lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250327145159.99799-1-alexei.starovoitov@gmail.com>
Date: Thu, 27 Mar 2025 10:51:59 -0400
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: torvalds@...ux-foundation.org
Cc: bpf@...r.kernel.org,
	daniel@...earbox.net,
	andrii@...nel.org,
	martin.lau@...nel.org,
	akpm@...ux-foundation.org,
	peterz@...radead.org,
	vbabka@...e.cz,
	bigeasy@...utronix.de,
	rostedt@...dmis.org,
	mhocko@...e.com,
	shakeel.butt@...ux.dev,
	linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: [GIT PULL] Introduce try_alloc_pages for 6.15

Hi Linus,

The following changes since commit 2014c95afecee3e76ca4a56956a936e23283f05b:

  Linux 6.14-rc1 (2025-02-02 15:39:26 -0800)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git tags/bpf_try_alloc_pages

for you to fetch changes up to f90b474a35744b5d43009e4fab232e74a3024cae:

  mm: Fix the flipped condition in gfpflags_allow_spinning() (2025-03-15 11:18:19 -0700)

----------------------------------------------------------------
Please pull after main MM changes.

The pull includes work from Sebastian, Vlastimil and myself
with a lot of help from Michal and Shakeel.
This is a first step towards making kmalloc reentrant to get rid
of slab wrappers: bpf_mem_alloc, kretprobe's objpool, etc.
These patches make page allocator safe from any context.

Vlastimil kicked off this effort at LSFMM 2024:
https://lwn.net/Articles/974138/
and we continued at LSFMM 2025:
https://lore.kernel.org/all/CAADnVQKfkGxudNUkcPJgwe3nTZ=xohnRshx9kLZBTmR_E1DFEg@mail.gmail.com/

Why:
SLAB wrappers bind memory to a particular subsystem
making it unavailable to the rest of the kernel.
Some BPF maps in production consume Gbytes of preallocated
memory. Top 5 in Meta: 1.5G, 1.2G, 1.1G, 300M, 200M.
Once we have kmalloc that works in any context BPF map
preallocation won't be necessary.

How:
Synchronous kmalloc/page alloc stack has multiple
stages going from fast to slow:
cmpxchg16 -> slab_alloc -> new_slab -> alloc_pages ->
rmqueue_pcplist -> __rmqueue.
rmqueue_pcplist was already relying on trylock.
This set changes rmqueue_bulk/rmqueue_buddy to attempt
a trylock and return ENOMEM if alloc_flags & ALLOC_TRYLOCK.
Then it wraps this functionality into try_alloc_pages() helper.
We make sure that the logic is sane in PREEMPT_RT.
End result: try_alloc_pages()/free_pages_nolock() are
safe to call from any context.
try_kmalloc() for any context with similar trylock
approach will follow. It will use try_alloc_pages()
when slab needs a new page.
Though such try_kmalloc/page_alloc() is an opportunistic
allocator, this design ensures that the probability of
successful allocation of small objects (up to one
page in size) is high.

Even before we have try_kmalloc(), we already use
try_alloc_pages() in BPF arena implementation and it's
going to be used more extensively in BPF.

Once the set was applied to bpf-next we ran into two
two small conflicts with MM tree as reported by Stephen:
https://lore.kernel.org/bpf/20250311120422.1d9a8f80@canb.auug.org.au/
https://lore.kernel.org/bpf/20250312145247.380c2aa5@canb.auug.org.au/

So Andrew suggested to keep thing as-is instead of moving
patchset between the trees before merge window:
https://lore.kernel.org/all/20250317132710.fbcde1c8bb66f91f36e78c89@linux-foundation.org/

Note "locking/local_lock: Introduce localtry_lock_t" patch is
later used in Vlastimil's sheaves and in Shakeel's changes.

Signed-off-by: Alexei Starovoitov <ast@...nel.org>
----------------------------------------------------------------
Alexei Starovoitov (6):
      mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation
      mm, bpf: Introduce free_pages_nolock()
      memcg: Use trylock to access memcg stock_lock.
      mm, bpf: Use memcg in try_alloc_pages().
      bpf: Use try_alloc_pages() to allocate pages for bpf needs.
      Merge branch 'bpf-mm-introduce-try_alloc_pages'

Sebastian Andrzej Siewior (1):
      locking/local_lock: Introduce localtry_lock_t

Vlastimil Babka (1):
      mm: Fix the flipped condition in gfpflags_allow_spinning()

 include/linux/bpf.h                 |   2 +-
 include/linux/gfp.h                 |  23 ++++
 include/linux/local_lock.h          |  70 +++++++++++++
 include/linux/local_lock_internal.h | 146 ++++++++++++++++++++++++++
 include/linux/mm_types.h            |   4 +
 include/linux/mmzone.h              |   3 +
 kernel/bpf/arena.c                  |   5 +-
 kernel/bpf/syscall.c                |  23 +++-
 lib/stackdepot.c                    |  10 +-
 mm/internal.h                       |   1 +
 mm/memcontrol.c                     |  53 +++++++---
 mm/page_alloc.c                     | 203 +++++++++++++++++++++++++++++++++---
 mm/page_owner.c                     |   8 +-
 13 files changed, 509 insertions(+), 42 deletions(-)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ