[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7112B8B4-B593-45AA-A9AD-2ABEEE96223E@fb.com>
Date: Fri, 14 Oct 2022 18:26:04 +0000
From: Song Liu <songliubraving@...a.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"song@...nel.org" <song@...nel.org>, "hch@....de" <hch@....de>,
Kernel Team <Kernel-team@...com>,
"peterz@...radead.org" <peterz@...radead.org>,
"urezki@...il.com" <urezki@...il.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"x86@...nel.org" <x86@...nel.org>,
"Hansen, Dave" <dave.hansen@...el.com>
Subject: Re: [RFC v2 3/4] modules, x86: use vmalloc_exec for module core
> On Oct 14, 2022, at 8:42 AM, Edgecombe, Rick P <rick.p.edgecombe@...el.com> wrote:
>
> On Fri, 2022-10-07 at 16:43 -0700, Song Liu wrote:
>> diff --git a/kernel/module/main.c b/kernel/module/main.c
>> index a4e4d84b6f4e..b44806e31a56 100644
>> --- a/kernel/module/main.c
>> +++ b/kernel/module/main.c
>> @@ -53,6 +53,7 @@
>> #include <linux/bsearch.h>
>> #include <linux/dynamic_debug.h>
>> #include <linux/audit.h>
>> +#include <linux/bpf.h>
>> #include <uapi/linux/module.h>
>> #include "internal.h"
>>
>> @@ -1203,7 +1204,7 @@ static void free_module(struct module *mod)
>> lockdep_free_key_range(mod->data_layout.base, mod-
>>> data_layout.size);
>>
>> /* Finally, free the core (containing the module structure)
>> */
>> - module_memfree(mod->core_layout.base);
>> + vfree_exec(mod->core_layout.base);
>> #ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC
>> vfree(mod->data_layout.base);
>> #endif
>> @@ -1321,7 +1322,8 @@ static int simplify_symbols(struct module *mod,
>> const struct load_info *info)
>> ksym = resolve_symbol_wait(mod, info, name);
>> /* Ok if resolved. */
>> if (ksym && !IS_ERR(ksym)) {
>> - sym[i].st_value =
>> kernel_symbol_value(ksym);
>> + unsigned long val =
>> kernel_symbol_value(ksym);
>> + bpf_arch_text_copy(&sym[i].st_value,
>> &val, sizeof(val));
>
> Why bpf_arch_text_copy()? This of course won't work for other
> architectures. So there needs to be fallback method. That RFC broke the
> operation into two stages: Loading and finalized. When loading, on non-
> x86 the writes would simply be to the allocation mapped as writable.
> When it was finalized it changed it to it's final permission (RO, etc).
> Then for x86 it does text_pokes() for the writes and has it RO from the
> beginning.
Yeah, this one (3/4) is really a prototype to show vmalloc_exec could
work for modules (with a lot more work of course). And something to
replace bpf_arch_text_copy() is one of the issues we need to address in
the future.
>
> I ended up needing a staging buffer for modules too, so that the code
> could operate on it directly. I can't remember why that was, it might
> be unneeded now since you moved data out of the core allocation.
Both bpf_jit and bpf_dispather uses a staging buffer with bpf_prog_pack.
The benefit of this approach is that it minimizes the number of
text_poke/copy() calls. OTOH, it is quite a pain to make all the
relative calls correct, as the staging buffer has different address to
the final allocation.
I think we may not need the staging buffer for modules, as module
load/unload happens less often than BPF program JITs (so it is ok for
it to be slightly slower).
btw: I cannot take credit for split module data out of core allocation,
Christophe Leroy did the work. :)
Thanks,
Song
Powered by blists - more mailing lists