[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wgA1Uku=ejwknv11ssNhz2pswhD=mJFBPEMQtCspz0YEQ@mail.gmail.com>
Date: Wed, 27 Apr 2022 18:45:25 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Song Liu <songliubraving@...com>
Cc: bpf <bpf@...r.kernel.org>, Networking <netdev@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Andrii Nakryiko <andrii@...nel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Kernel Team <Kernel-team@...com>,
Christoph Hellwig <hch@...radead.org>,
Peter Zijlstra <peterz@...radead.org>,
Song Liu <song@...nel.org>
Subject: Re: [PATCH bpf v2 0/3] bpf: invalidate unused part of bpf_prog_pack
On Wed, Apr 27, 2022 at 3:24 PM Song Liu <songliubraving@...com> wrote:
>
> Could you please share your suggestions on this set? Shall we ship it
> with 5.18?
I'd personally prefer to just not do the prog_pack thing at all, since
I don't think it was actually in a "ready to ship" state for this
merge window, and the hugepage mapping protection games I'm still
leery of.
Yes, the hugepage protection things probably do work from what I saw
when I looked through them, but that x86 vmalloc hugepage code was
really designed for another use (non-refcounted device pages), so the
fact that it all actually seems surprisingly ok certainly wasn't
because the code was designed to do that new case.
Does the prog_pack thing work with small pages?
Yes. But that wasn't what it was designed for or its selling point, so
it all is a bit suspect to me.
And I'm looking at those set_memory_xyz() calls, and I'm going "yeah,
I think it works on x86, but on ppc I look at it and I see
static inline int set_memory_ro(unsigned long addr, int numpages)
{
return change_memory_attr(addr, numpages, SET_MEMORY_RO);
}
and then in change_memory_attr() it does
if (WARN_ON_ONCE(is_vmalloc_or_module_addr((void *)addr) &&
is_vm_area_hugepages((void *)addr)))
return -EINVAL;
and I'm "this is all supposedly generic code, but I'm not seeing how
it works cross-architecture".
I *think* it's actually because this is all basically x86-specific due
to x86 being the only architecture that implements
bpf_arch_text_copy(), and everybody else then ends up erroring out and
not using the prog_pack thing after all.
And then one of the two places that use bpf_arch_text_copy() doesn't
even check the returned error code.
So this all ends up just depending on "only x86 will actually succeed
in bpf_jit_binary_pack_finalize(), everybody else will fail after
having done all the common setup".
End result: it all seems a bit broken right now. The "generic" code
only works on x86, and on other architectures it goes through the
motions and then fails at the end. And even on x86 I worry about
actually enabling it fully.
I'm not having the warm and fuzzies about this all, in other words.
Maybe people can convince me otherwise, but I think you need to work at it.
And even for 5.19+ kind of timeframes, I'd actually like the x86
people who maintain arch/x86/mm/pat/set_memory.c also sign off on
using that code for hugepage vmalloc mappings too.
I *think* it does. But that code has various subtle things going on.
I see PeterZ is cc'd (presumably because of the text_poke() stuff, not
because of the whole "call set_memory_ro() on virtually mapped kernel
largepage memory".
Did people even talk to x86 people about this, or did the whole "it
works, except it turns out set_vm_flush_reset_perms() doesn't work"
mean that the authors of that code never got involved?
Linus
Powered by blists - more mailing lists