[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <8C89EED6-6531-4144-BDA8-B6519396F67C@fb.com>
Date: Thu, 19 May 2022 18:25:39 +0000
From: Song Liu <songliubraving@...com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"daniel@...earbox.net" <daniel@...earbox.net>,
"peterz@...radead.org" <peterz@...radead.org>,
"ast@...nel.org" <ast@...nel.org>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>,
"Torvalds, Linus" <torvalds@...ux-foundation.org>,
Kernel Team <Kernel-team@...com>,
"song@...nel.org" <song@...nel.org>,
"mcgrof@...nel.org" <mcgrof@...nel.org>
Subject: Re: [PATCH bpf-next 5/5] bpf: use module_alloc_huge for bpf_prog_pack
> On May 19, 2022, at 9:56 AM, Edgecombe, Rick P <rick.p.edgecombe@...el.com> wrote:
>
> On Thu, 2022-05-19 at 06:42 +0000, Song Liu wrote:
>> Thinking more on this. Even huge page is not supported, we can
>> allocate
>> 2MB worth of 4kB pages and keep using it. This would help direct map
>> fragmentation. And the code would also be simpler.
>>
>> Rick, I guess this is inline with some of your ideas?
>
> Yea, that is what I wondering. Potential benefits are just speculative
> though. There is a memory overhead cost, so it's not free.
Yeah, I had the same concern with memory overhead. The benefit should
not exceed 0.2% (when 2MB page is supported), so I think we can use
4kB pages for now.
>
> As for the other question of whether to fix VM_FLUSH_RESET_PERMS. If
> there really is an intention to create a more general module_alloc()
> replacement soon, then I think it is ok to side step it. An optimal
> replacement might not need it and it could be removed in that case.
> Let's at least add a WARN about it not working with huge pages though.
IIUC, it will take some effort to let kernel modules use a solution
like this. But there are some low hanging fruits, like ftrace and BPF
trampolines. Maybe that's enough to promote a more general solution.
For the WARN, I guess we need something like this?
diff --git i/include/linux/vmalloc.h w/include/linux/vmalloc.h
index b159c2789961..5e0d0a60d9d5 100644
--- i/include/linux/vmalloc.h
+++ w/include/linux/vmalloc.h
@@ -238,6 +238,7 @@ static inline void set_vm_flush_reset_perms(void *addr)
{
struct vm_struct *vm = find_vm_area(addr);
+ WARN_ON_ONCE(is_vm_area_hugepages(addr));
if (vm)
vm->flags |= VM_FLUSH_RESET_PERMS;
}
Thanks,
Song
>
> I also think the benchmarking so far is not sufficient to make the case
> that huge page mappings help your workload since the direct map splits
> were also different between the tests. I was expecting it to help
> though. Others were the ones that asked for that, so just commenting my
> analysis here.
Powered by blists - more mailing lists