lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <dc23afb892846ef41d73a41d58c07f6620cb6312.camel@intel.com>
Date:   Tue, 17 May 2022 23:58:43 +0000
From:   "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To:     "songliubraving@...com" <songliubraving@...com>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "daniel@...earbox.net" <daniel@...earbox.net>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "ast@...nel.org" <ast@...nel.org>,
        "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
        "Torvalds, Linus" <torvalds@...ux-foundation.org>,
        "Kernel-team@...com" <Kernel-team@...com>,
        "song@...nel.org" <song@...nel.org>,
        "mcgrof@...nel.org" <mcgrof@...nel.org>
Subject: Re: [PATCH bpf-next 5/5] bpf: use module_alloc_huge for bpf_prog_pack

On Tue, 2022-05-17 at 21:08 +0000, Song Liu wrote:
> > On May 17, 2022, at 12:15 PM, Edgecombe, Rick P <
> > rick.p.edgecombe@...el.com> wrote:
> > 
> > On Sun, 2022-05-15 at 22:40 -0700, Song Liu wrote:
> > > Use module_alloc_huge for bpf_prog_pack so that BPF programs sit
> > > on
> > > PMD_SIZE pages. This benefits system performance by reducing iTLB
> > > miss
> > > rate. Benchmark of a real web service workload shows this change
> > > gives
> > > another ~0.2% performance boost on top of PAGE_SIZE bpf_prog_pack
> > > (which improve system throughput by ~0.5%).
> > 
> > 0.7% sounds good as a whole. How sure are you of that +0.2%? Was
> > this a
> > big averaged test?
> 
> Yes, this was a test between two tiers with 10+ servers on each
> tier.  
> We took the average performance over a few hours of shadow workload. 

Awesome. Sounds great.

> 
> > 
> > > 
> > > Also, remove set_vm_flush_reset_perms() from alloc_new_pack() and
> > > use
> > > set_memory_[nx|rw] in bpf_prog_pack_free(). This is because
> > > VM_FLUSH_RESET_PERMS does not work with huge pages yet. [1]
> > > 
> > > [1] 
> > > 
https://lore.kernel.org/bpf/aeeeaf0b7ec63fdba55d4834d2f524d8bf05b71b.camel@intel.com/
> > > Suggested-by: Rick Edgecombe <rick.p.edgecombe@...el.com>
> > 
> > As I said before, I think this will work functionally. But I meant
> > it
> > as a quick fix when we were talking about patching this up to keep
> > it
> > enabled upstream.
> > 
> > So now, should we make VM_FLUSH_RESET_PERMS work properly with huge
> > pages? The main benefit would be to keep the tear down of these
> > types
> > of allocations consistent for correctness reasons. The TLB flush
> > minimizing differences are probably less impactful given the
> > caching
> > introduced here. At the very least though, we should have (or have
> > already had) some WARN if people try to use it with huge pages.
> 
> I am not quite sure the exact work needed here. Rick, would you have
> time to enable VM_FLUSH_RESET_PERMS for huge pages? Given the merge 
> window is coming soon, I guess we need current work around in 5.19. 

I would have hard time squeezing that in now. The vmalloc part is easy,
I think I already posted a diff. But first hibernate needs to be
changed to not care about direct map page sizes.

> 
> > 
> > > Signed-off-by: Song Liu <song@...nel.org>
> > > ---
> > > kernel/bpf/core.c | 12 +++++++-----
> > > 1 file changed, 7 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > > index cacd8684c3c4..b64d91fcb0ba 100644
> > > --- a/kernel/bpf/core.c
> > > +++ b/kernel/bpf/core.c
> > > @@ -857,7 +857,7 @@ static size_t select_bpf_prog_pack_size(void)
> > >       void *ptr;
> > > 
> > >       size = BPF_HPAGE_SIZE * num_online_nodes();
> > > -    ptr = module_alloc(size);
> > > +    ptr = module_alloc_huge(size);
> > 
> > This select_bpf_prog_pack_size() function always seemed weird -
> > doing a
> > big allocation and then immediately freeing. Can't it check a
> > config
> > for vmalloc huge page support?
> 
> Yes, it is weird. Checking a config is not enough here. We also need
> to 
> check vmap_allow_huge, which is controlled by boot parameter
> nohugeiomap. 
> I haven’t got a better solution for this. 

It's too weird. We should expose whats needed in vmalloc.
huge_vmalloc_supported() or something.

I'm also not clear why we wouldn't want to use the prog pack allocator
even if vmalloc huge pages was disabled. Doesn't it improve performance
even with small page sizes, per your benchmarks? What is the downside
to just always using it?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ