Date:   Tue, 19 Apr 2022 19:03:11 -0700
From:   Alexei Starovoitov <alexei.starovoitov@...il.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Mike Rapoport <rppt@...nel.org>, Song Liu <songliubraving@...com>,
        "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>,
        "mcgrof@...nel.org" <mcgrof@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
        "hch@...radead.org" <hch@...radead.org>,
        "ast@...nel.org" <ast@...nel.org>,
        "daniel@...earbox.net" <daniel@...earbox.net>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "song@...nel.org" <song@...nel.org>,
        Kernel Team <Kernel-team@...com>,
        "pmladek@...e.com" <pmladek@...e.com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "hpa@...or.com" <hpa@...or.com>,
        "dborkman@...hat.com" <dborkman@...hat.com>,
        "edumazet@...gle.com" <edumazet@...gle.com>,
        "bp@...en8.de" <bp@...en8.de>, "mbenes@...e.cz" <mbenes@...e.cz>,
        "imbrenda@...ux.ibm.com" <imbrenda@...ux.ibm.com>
Subject: Re: [PATCH v4 bpf 0/4] vmalloc: bpf: introduce VM_ALLOW_HUGE_VMAP

On Tue, Apr 19, 2022 at 12:20:39PM -0700, Linus Torvalds wrote:
> On Tue, Apr 19, 2022 at 11:42 AM Mike Rapoport <rppt@...nel.org> wrote:
> >
> > I'd say that bpf_prog_pack was a cure for symptoms and this project tries
> > to address more general problem.
> > But you are right, it'll take some time and won't land in 5.19.
> 
> Just to update people: I've just applied Song's [1/4] patch, which
> means that the whole current hugepage vmalloc thing is effectively
> disabled (because nothing opts in).
> 
> And I suspect that will be the status for 5.18, unless somebody comes
> up with some very strong arguments for (re-)starting using huge pages.

Here is the quote from Song's cover letter for bpf_prog_pack series:

  Most BPF programs are small, but they consume a page each. For systems
  with busy traffic and many BPF programs, this could also add significant
  pressure to instruction TLB. High iTLB pressure usually causes slow down
  for the whole system, which includes visible performance degradation for
  production workloads.

The last sentence is the key. We added this feature not because of the bpf
programs themselves, so calling it an optimization is not quite correct.
The number of bpf programs on the production server doesn't matter.
The programs come and go all the time. That is the key here.  The 4k
module_alloc() plus set_memory_ro/x done by the JIT break down huge pages and
increase TLB pressure on the kernel code. That creates visible performance
degradation for normal user space workloads that are not doing anything bpf
related. mm folks can fill in the details here. My understanding is that it
has something to do with the identity mapping.
So we're not trying to improve bpf performance. We're trying to make
sure that bpf program load/unload doesn't affect the speed of the kernel.
Generalizing bpf_prog_alloc to modules would be nice, but it's not clear
what benefits such an optimization might have. It's orthogonal here.
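
A toy model of the huge page breakdown described above, for readers who want
rough numbers. It assumes 4 KiB base pages, 2 MiB huge mappings, and that
changing permissions on any 4 KiB page inside a 2 MiB mapping splits that
whole mapping into 512 small entries. The function names and the address
layout are hypothetical; this is not the kernel's page-attribute code.

```python
PAGE_SIZE = 4096                               # assumed 4 KiB base page
HUGE_PAGE_SIZE = 2 * 1024 * 1024               # assumed 2 MiB huge mapping
PAGES_PER_HUGE = HUGE_PAGE_SIZE // PAGE_SIZE   # 512

def split_regions(prog_page_addrs):
    """Indices of the 2 MiB regions that contain at least one program page."""
    return {addr // HUGE_PAGE_SIZE for addr in prog_page_addrs}

def leaf_entries(total_regions, prog_page_addrs):
    """Leaf mapping entries needed to cover total_regions 2 MiB regions.

    Toy assumption: a region stays one huge entry unless a program page
    lands in it, in which case it is mapped with 512 small entries.
    """
    split = len(split_regions(prog_page_addrs))
    return (total_regions - split) + split * PAGES_PER_HUGE

# Hypothetical layouts: 8 program pages spread over 8 different regions,
# versus the same 8 pages sitting next to each other in one region.
scattered = [i * HUGE_PAGE_SIZE for i in range(8)]
packed = [i * PAGE_SIZE for i in range(8)]
```

Under these assumptions, leaf_entries(64, scattered) is far larger than
leaf_entries(64, packed), even though both layouts hold the same number of
programs. That matches the point that the count of programs is not what
matters; where their pages land is.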

So I argue that all 4 of Song's fixes are necessary in 5.18.
We need an additional zeroing patch too, of course, to make sure the huge page
doesn't have garbage at alloc time and is cleaned after a prog is unloaded.

Regarding JIT spraying and other concerns. Short answer: nothing changed.
JIT spraying was mitigated with start address randomization and invalid
instruction padding. Both features are still present.
Constant blinding is also fully functional.
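
The two mitigations named above can be sketched in user space as follows.
This is a simplified illustration, not the kernel's allocator: the function
name, the 4-byte alignment, and the single-page minimum are assumptions, and
0xCC (x86 int3) stands in for the invalid-instruction padding.

```python
import secrets

PAGE_SIZE = 4096
PAD_BYTE = 0xCC      # x86 int3, used here as the trapping filler byte
ALIGNMENT = 4        # hypothetical start alignment, must be a power of two

def place_program(code, alignment=ALIGNMENT):
    """Return (buffer, start): code placed at a random aligned offset.

    The buffer is rounded up to whole pages and pre-filled with PAD_BYTE,
    so every byte outside the program traps if executed.
    """
    size = max(PAGE_SIZE, (len(code) + PAGE_SIZE - 1) // PAGE_SIZE * PAGE_SIZE)
    buf = bytearray([PAD_BYTE]) * size
    hole = size - len(code)                     # free space left in the buffer
    # Random offset inside the hole, rounded down to the alignment.
    start = secrets.randbelow(hole + 1) & ~(alignment - 1)
    buf[start:start + len(code)] = code
    return buf, start
```

A caller cannot predict start, and a jump that misses the program lands on
padding rather than on attacker-influenced bytes.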

Any kind of generalization of bpf_prog_pack into a general mm feature would be
nice, but it cannot be done as an opportunistic cache. We need a guarantee that
bpf prog load/unload won't recreate the issue with kernel performance
degradation. I suspect we would need bpf_prog_pack in its current form for the
foreseeable future.
