Message-ID: <20230612213411.GP52412@kernel.org>
Date: Tue, 13 Jun 2023 00:34:11 +0300
From: Mike Rapoport <rppt@...nel.org>
To: Song Liu <song@...nel.org>
Cc: Mark Rutland <mark.rutland@....com>,
Kent Overstreet <kent.overstreet@...ux.dev>,
linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Catalin Marinas <catalin.marinas@....com>,
Christophe Leroy <christophe.leroy@...roup.eu>,
"David S. Miller" <davem@...emloft.net>,
Dinh Nguyen <dinguyen@...nel.org>,
Heiko Carstens <hca@...ux.ibm.com>,
Helge Deller <deller@....de>,
Huacai Chen <chenhuacai@...nel.org>,
Luis Chamberlain <mcgrof@...nel.org>,
Michael Ellerman <mpe@...erman.id.au>,
"Naveen N. Rao" <naveen.n.rao@...ux.ibm.com>,
Palmer Dabbelt <palmer@...belt.com>,
Russell King <linux@...linux.org.uk>,
Steven Rostedt <rostedt@...dmis.org>,
Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
Thomas Gleixner <tglx@...utronix.de>,
Will Deacon <will@...nel.org>, bpf@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linux-mips@...r.kernel.org,
linux-mm@...ck.org, linux-modules@...r.kernel.org,
linux-parisc@...r.kernel.org, linux-riscv@...ts.infradead.org,
linux-s390@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
linuxppc-dev@...ts.ozlabs.org, loongarch@...ts.linux.dev,
netdev@...r.kernel.org, sparclinux@...r.kernel.org, x86@...nel.org
Subject: Re: [PATCH 00/13] mm: jit/text allocator
On Fri, Jun 09, 2023 at 10:02:16AM -0700, Song Liu wrote:
> On Thu, Jun 8, 2023 at 11:41 AM Mike Rapoport <rppt@...nel.org> wrote:
> >
> > On Tue, Jun 06, 2023 at 11:21:59AM -0700, Song Liu wrote:
> > > On Mon, Jun 5, 2023 at 3:09 AM Mark Rutland <mark.rutland@....com> wrote:
> > >
> > > [...]
> > >
> > > > > > > Can you give more detail on what parameters you need? If the only extra
> > > > > > > parameter is just "does this allocation need to live close to kernel
> > > > > > > text", that's not that big of a deal.
> > > > > >
> > > > > > My thinking was that we at least need the start + end for each caller. That
> > > > > > might be it, tbh.
> > > > >
> > > > > Do you mean that modules will have something like
> > > > >
> > > > > jit_text_alloc(size, MODULES_START, MODULES_END);
> > > > >
> > > > > and kprobes will have
> > > > >
> > > > > jit_text_alloc(size, KPROBES_START, KPROBES_END);
> > > > > ?
> > > >
> > > > Yes.
> > >
> > > How about we start with two APIs:
> > > jit_text_alloc(size);
> > > jit_text_alloc_range(size, start, end);
> > >
> > > AFAICT, arm64 is the only arch that requires the latter API. And TBH, I am
> > > not quite convinced it is needed.
> >
> > Right now arm64 and riscv override bpf and kprobes allocations to use the
> > entire vmalloc address space, but having the ability to allocate generated
> > code outside of modules area may be useful for other architectures.
> >
> > Still the start + end for the callers feels backwards to me because the
> > callers do not define the ranges, but rather the architectures, so we still
> > need a way for architectures to define how they want to allocate memory for
> > the generated code.
>
> Yeah, this makes sense.
>
> >
> > > > > It still can be achieved with a single jit_alloc_arch_params(), just by
> > > > > adding enum jit_type parameter to jit_text_alloc().
> > > >
> > > > That feels backwards to me; it centralizes a bunch of information about
> > > > distinct users to be able to shove that into a static array, when the callsites
> > > > can pass that information.
> > >
> > > I think we only have two types of users: module and everything else (ftrace, kprobe,
> > > bpf stuff). The key differences are:
> > >
> > > 1. module uses text and data; while everything else only uses text.
> > > 2. module code is generated by the compiler, and thus has stronger
> > > requirements in address ranges; everything else are generated via some
> > > JIT or manual written assembly, so they are more flexible with address
> > > ranges (in JIT, we can avoid using instructions that requires a specific
> > > address range).
> > >
> > > The next question is, can we have the two types of users share the same
> > > address ranges? If not, we can reserve the preferred range for modules,
> > > and let everything else use the other range. I don't see reasons to further
> > > separate users in the "everything else" group.
> >
> > I agree that we can define only two types: modules and everything else and
> > let the architectures define if they need different ranges for these two
> > types, or want the same range for everything.
> >
> > With only two types we can have two API calls for alloc, and a single
> > structure that defines the ranges etc from the architecture side rather
> > than spread all over.
> >
> > Like something along these lines:
> >
> > struct execmem_range {
> > 	unsigned long start;
> > 	unsigned long end;
> > 	unsigned long fallback_start;
> > 	unsigned long fallback_end;
> > 	pgprot_t pgprot;
> > 	unsigned int alignment;
> > };
> >
> > struct execmem_modules_range {
> > 	enum execmem_module_flags flags;
> > 	struct execmem_range text;
> > 	struct execmem_range data;
> > };
> >
> > struct execmem_jit_range {
> > 	struct execmem_range text;
> > };
> >
> > struct execmem_params {
> > 	struct execmem_modules_range modules;
> > 	struct execmem_jit_range jit;
> > };
> >
> > struct execmem_params *execmem_arch_params(void);
> >
> > void *execmem_text_alloc(size_t size);
> > void *execmem_data_alloc(size_t size);
> > void execmem_free(void *ptr);
>
> With the jit variation, maybe we can just call these
> module_[text|data]_alloc()?
I was thinking about using execmem_*_alloc() for allocations that must be
close to the kernel image, like modules, ftrace on x86 and s390, and maybe
something else in the future, and jit_text_alloc() for allocations that can
reside anywhere.
I tried to find a different name for 'struct execmem_modules_range' but
couldn't think of anything better than 'struct execmem_close_to_kernel', so
I've left modules in the name.
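
To make the split a bit more concrete, here is a rough userspace sketch of
how an architecture might fill in these parameters. It only illustrates the
idea: the addresses, the pgprot_t stand-in, the enum value, and the helpers
execmem_text_range(), jit_text_range() and range_has_fallback() are all made
up for this example and are not part of the proposal.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's pgprot_t (assumption for this sketch). */
typedef struct { unsigned long pgprot; } pgprot_t;

struct execmem_range {
	unsigned long start;
	unsigned long end;
	unsigned long fallback_start;
	unsigned long fallback_end;
	pgprot_t pgprot;
	unsigned int alignment;
};

/* Hypothetical member; the thread names the enum but not its values. */
enum execmem_module_flags { EXECMEM_MODULE_FLAGS_NONE = 0 };

struct execmem_modules_range {
	enum execmem_module_flags flags;
	struct execmem_range text;
	struct execmem_range data;
};

struct execmem_jit_range {
	struct execmem_range text;
};

struct execmem_params {
	struct execmem_modules_range modules;
	struct execmem_jit_range jit;
};

/* Made-up layout: modules get a window with a fallback, JIT gets a wider one. */
static struct execmem_params example_params = {
	.modules = {
		.flags = EXECMEM_MODULE_FLAGS_NONE,
		.text = { .start = 0x1000, .end = 0x2000,
			  .fallback_start = 0x8000, .fallback_end = 0x9000,
			  .alignment = 16 },
		.data = { .start = 0x3000, .end = 0x4000, .alignment = 16 },
	},
	.jit = {
		.text = { .start = 0x10000, .end = 0x20000, .alignment = 4 },
	},
};

/* The architecture, not the caller, decides the ranges. */
struct execmem_params *execmem_arch_params(void)
{
	return &example_params;
}

/* Hypothetical helper: range consulted for close-to-kernel text. */
const struct execmem_range *execmem_text_range(void)
{
	return &execmem_arch_params()->modules.text;
}

/* Hypothetical helper: range consulted for JIT text. */
const struct execmem_range *jit_text_range(void)
{
	return &execmem_arch_params()->jit.text;
}

/* Hypothetical helper: a non-empty fallback window means a fallback exists. */
int range_has_fallback(const struct execmem_range *r)
{
	assert(r != NULL);
	return r->fallback_start < r->fallback_end;
}
```

With a layout like this the callers never pass start/end themselves; they
only pick the API, and the range comes from the architecture.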
> btw: Depending on the implementation of the allocator, we may also
> need separate free()s for text and data.
>
> >
> > void *jit_text_alloc(size_t size);
> > void jit_free(void *ptr);
> >
Let's just add jit_free() for completeness even if it will be the same as
execmem_free() for now.
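
A minimal userspace sketch of what "the same as execmem_free() for now"
could look like. malloc()/free() stand in for the real allocator, and the
execmem_free_calls counter exists only to make the forwarding visible; both
are assumptions of this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts calls into execmem_free(); illustration only. */
static unsigned int execmem_free_calls;

/* Stand-in allocator: the real one would hand out executable memory. */
void *execmem_text_alloc(size_t size)
{
	return malloc(size);
}

void execmem_free(void *ptr)
{
	execmem_free_calls++;
	free(ptr);
}

/* Stand-in for allocations that can reside anywhere. */
void *jit_text_alloc(size_t size)
{
	return malloc(size);
}

/*
 * Thin wrapper for now. Keeping a separate entry point means the JIT
 * users would not need to change if the two free paths diverge later.
 */
void jit_free(void *ptr)
{
	execmem_free(ptr);
}
```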
> [...]
>
> How should we move ahead from here?
>
> AFAICT, all these changes can be easily extended and refactored
> in the future, so we don't have to make it perfect the first time.
> OTOH, having the interface committed (either this set or my
> module_alloc_type version) can unblock works in the binpack
> allocator and the users side. Therefore, I think we can move
> relatively fast here?
Once the interface and architecture abstraction are ready we can work on the
allocator and the users. We also need to update text_poking/alternatives on
architectures that would allocate executable memory as ROX. I did some
quick tests and with these patches 'modprobe xfs' takes tens of times longer
than before.
> Thanks,
> Song
--
Sincerely yours,
Mike.