[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <68615225-D09D-465A-8EEC-6F81EF074854@fb.com>
Date: Tue, 17 May 2022 21:08:16 +0000
From: Song Liu <songliubraving@...com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"song@...nel.org" <song@...nel.org>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>,
"daniel@...earbox.net" <daniel@...earbox.net>,
"Torvalds, Linus" <torvalds@...ux-foundation.org>,
"peterz@...radead.org" <peterz@...radead.org>,
Kernel Team <Kernel-team@...com>,
"ast@...nel.org" <ast@...nel.org>,
"mcgrof@...nel.org" <mcgrof@...nel.org>
Subject: Re: [PATCH bpf-next 5/5] bpf: use module_alloc_huge for bpf_prog_pack
> On May 17, 2022, at 12:15 PM, Edgecombe, Rick P <rick.p.edgecombe@...el.com> wrote:
>
> On Sun, 2022-05-15 at 22:40 -0700, Song Liu wrote:
>> Use module_alloc_huge for bpf_prog_pack so that BPF programs sit on
>> PMD_SIZE pages. This benefits system performance by reducing iTLB
>> miss
>> rate. Benchmark of a real web service workload shows this change
>> gives
>> another ~0.2% performance boost on top of PAGE_SIZE bpf_prog_pack
>> (which improve system throughput by ~0.5%).
>
> 0.7% sounds good as a whole. How sure are you of that +0.2%? Was this a
> big averaged test?
Yes, this was a test between two tiers with 10+ servers on each tier.
We took the average performance over a few hours of shadow workload.
>
>>
>> Also, remove set_vm_flush_reset_perms() from alloc_new_pack() and use
>> set_memory_[nx|rw] in bpf_prog_pack_free(). This is because
>> VM_FLUSH_RESET_PERMS does not work with huge pages yet. [1]
>>
>> [1]
>> https://lore.kernel.org/bpf/aeeeaf0b7ec63fdba55d4834d2f524d8bf05b71b.camel@intel.com/
>> Suggested-by: Rick Edgecombe <rick.p.edgecombe@...el.com>
>
> As I said before, I think this will work functionally. But I meant it
> as a quick fix when we were talking about patching this up to keep it
> enabled upstream.
>
> So now, should we make VM_FLUSH_RESET_PERMS work properly with huge
> pages? The main benefit would be to keep the tear down of these types
> of allocations consistent for correctness reasons. The TLB flush
> minimizing differences are probably less impactful given the caching
> introduced here. At the very least though, we should have (or have
> already had) some WARN if people try to use it with huge pages.
I am not quite sure the exact work needed here. Rick, would you have
time to enable VM_FLUSH_RESET_PERMS for huge pages? Given the merge
window is coming soon, I guess we need current work around in 5.19.
>
>> Signed-off-by: Song Liu <song@...nel.org>
>> ---
>> kernel/bpf/core.c | 12 +++++++-----
>> 1 file changed, 7 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>> index cacd8684c3c4..b64d91fcb0ba 100644
>> --- a/kernel/bpf/core.c
>> +++ b/kernel/bpf/core.c
>> @@ -857,7 +857,7 @@ static size_t select_bpf_prog_pack_size(void)
>> void *ptr;
>>
>> size = BPF_HPAGE_SIZE * num_online_nodes();
>> - ptr = module_alloc(size);
>> + ptr = module_alloc_huge(size);
>
> This select_bpf_prog_pack_size() function always seemed weird - doing a
> big allocation and then immediately freeing. Can't it check a config
> for vmalloc huge page support?
Yes, it is weird. Checking a config is not enough here. We also need to
check vmap_allow_huge, which is controlled by boot parameter nohugeiomap.
I haven’t got a better solution for this.
Thanks,
Song
>
>>
>> /* Test whether we can get huge pages. If not just use
>> PAGE_SIZE
>> * packs.
>> @@ -881,7 +881,7 @@ static struct bpf_prog_pack
>> *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_ins
>> GFP_KERNEL);
>> if (!pack)
>> return NULL;
>> - pack->ptr = module_alloc(bpf_prog_pack_size);
>> + pack->ptr = module_alloc_huge(bpf_prog_pack_size);
>> if (!pack->ptr) {
>> kfree(pack);
>> return NULL;
>> @@ -890,7 +890,6 @@ static struct bpf_prog_pack
>> *alloc_new_pack(bpf_jit_fill_hole_t bpf_fill_ill_ins
>> bitmap_zero(pack->bitmap, bpf_prog_pack_size /
>> BPF_PROG_CHUNK_SIZE);
>> list_add_tail(&pack->list, &pack_list);
>>
>> - set_vm_flush_reset_perms(pack->ptr);
>> set_memory_ro((unsigned long)pack->ptr, bpf_prog_pack_size /
>> PAGE_SIZE);
>> set_memory_x((unsigned long)pack->ptr, bpf_prog_pack_size /
>> PAGE_SIZE);
>> return pack;
>> @@ -909,10 +908,9 @@ static void *bpf_prog_pack_alloc(u32 size,
>> bpf_jit_fill_hole_t bpf_fill_ill_insn
>>
>> if (size > bpf_prog_pack_size) {
>> size = round_up(size, PAGE_SIZE);
>> - ptr = module_alloc(size);
>> + ptr = module_alloc_huge(size);
>> if (ptr) {
>> bpf_fill_ill_insns(ptr, size);
>> - set_vm_flush_reset_perms(ptr);
>> set_memory_ro((unsigned long)ptr, size /
>> PAGE_SIZE);
>> set_memory_x((unsigned long)ptr, size /
>> PAGE_SIZE);
>> }
>> @@ -949,6 +947,8 @@ static void bpf_prog_pack_free(struct
>> bpf_binary_header *hdr)
>>
>> mutex_lock(&pack_mutex);
>> if (hdr->size > bpf_prog_pack_size) {
>> + set_memory_nx((unsigned long)hdr, hdr->size /
>> PAGE_SIZE);
>> + set_memory_rw((unsigned long)hdr, hdr->size /
>> PAGE_SIZE);
>> module_memfree(hdr);
>> goto out;
>> }
>> @@ -975,6 +975,8 @@ static void bpf_prog_pack_free(struct
>> bpf_binary_header *hdr)
>> if (bitmap_find_next_zero_area(pack->bitmap,
>> bpf_prog_chunk_count(), 0,
>> bpf_prog_chunk_count(), 0) == 0)
>> {
>> list_del(&pack->list);
>> + set_memory_nx((unsigned long)pack->ptr,
>> bpf_prog_pack_size / PAGE_SIZE);
>> + set_memory_rw((unsigned long)pack->ptr,
>> bpf_prog_pack_size / PAGE_SIZE);
>> module_memfree(pack->ptr);
>> kfree(pack);
>> }
Powered by blists - more mailing lists