Message-ID: <CAADnVQJ9e3Sf_kAh1LNqqeVvs7dwOC-AY_KEj5eRGGLGyC1F5A@mail.gmail.com>
Date: Wed, 18 Jun 2025 06:55:26 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Anton Protopopov <a.s.protopopov@...il.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@...il.com>, bpf <bpf@...r.kernel.org>, 
	Network Development <netdev@...r.kernel.org>, Alexei Starovoitov <ast@...nel.org>, 
	Daniel Borkmann <daniel@...earbox.net>, John Fastabend <john.fastabend@...il.com>, 
	Martin KaFai Lau <martin.lau@...ux.dev>, Willem de Bruijn <willemb@...gle.com>
Subject: Re: [PATCH bpf-next] bpf: lru: adjust free target to avoid global
 table starvation

On Wed, Jun 18, 2025 at 6:50 AM Anton Protopopov
<a.s.protopopov@...il.com> wrote:
>
> On 25/06/16 10:38AM, Willem de Bruijn wrote:
> > From: Willem de Bruijn <willemb@...gle.com>
> >
> > BPF_MAP_TYPE_LRU_HASH can recycle the most recent elements well
> > before the map is full, due to percpu reservations and force shrink
> > before neighbor stealing. Once a CPU is unable to borrow from the
> > global map, it steals one element from a neighbor once, and from
> > then on each allocation flushes that same element to the global
> > list and immediately recycles it.
> >
> > With batch value LOCAL_FREE_TARGET (128), 79 CPUs will exhaust a
> > 10K-element map. CPU 79 will observe this behavior even while its
> > neighbors hold 78 * 127 + 1 * 15 == 9921 free elements (99%).
> >
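For concreteness, a map hitting these numbers could be declared as
below. This is a minimal libbpf-style sketch (the name lru_map and the
key/value types are illustrative, not taken from the patch):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* 10K entries; 79 CPUs * 128 == 10112 reservable elements > 10000 */
struct {
	__uint(type, BPF_MAP_TYPE_LRU_HASH);
	__uint(max_entries, 10000);
	__type(key, __u32);
	__type(value, __u64);
} lru_map SEC(".maps");
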
> > CPUs need not be active concurrently. The issue can appear with
> > affinity migration, e.g., irqbalance. Each CPU can reserve and then
> > hold onto its 128 elements indefinitely.
> >
> > Avoid global list exhaustion by limiting aggregate percpu caches to
> > half of the map size, adjusting LOCAL_FREE_TARGET based on CPU
> > count. This change has no effect on sufficiently large tables.
> >
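The rule described above amounts to something like the following. A
sketch only: the function name, parameters, and rounding are
assumptions for illustration, not the patch's literal code:

#define LOCAL_FREE_TARGET 128

/* Cap aggregate percpu reservations at half of the map, split evenly
 * across CPUs; large maps keep the default target unchanged. */
static unsigned int lru_free_target(unsigned int map_size,
				    unsigned int nr_cpus)
{
	unsigned int cap = map_size / 2 / nr_cpus;

	if (cap < 1)
		cap = 1;
	return cap < LOCAL_FREE_TARGET ? cap : LOCAL_FREE_TARGET;
}

With the 10K/79-CPU example this yields 10000 / 2 / 79 == 63 per CPU,
keeping at most ~5K elements in percpu caches; any map of at least
256 * nr_cpus entries still gets the full target of 128.
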
> > Similar to LOCAL_NR_SCANS and lru->nr_scans, introduce a map variable
> > lru->free_target. The extra field fits in a hole in struct bpf_lru,
> > and the cacheline is already warm where it is read in the hot path.
> > The field is only accessed with the lru lock held.
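Schematically (neighboring fields elided; this is not the kernel's
actual struct bpf_lru layout):

struct bpf_lru {
	/* ... existing fields ... */
	unsigned int nr_scans;    /* existing, from LOCAL_NR_SCANS */
	unsigned int free_target; /* new, adjusted LOCAL_FREE_TARGET */
	/* ... */
};
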
>
> Hi Willem! The patch looks very reasonable. I've bumped into this
> issue before (see https://lore.kernel.org/bpf/ZJwy478jHkxYNVMc@zh-lab-node-5/)
> but didn't follow up, as we typically have large enough LRU maps.
>
> I've tested your patch (with a patched map_tests/map_percpu_stats.c
> selftest); it works as expected for small maps. E.g., before your
> patch, a map of size 4096 that was updated 2176 times from 32 threads
> on 32 CPUs contains around 150 elements; after your patch it contains
> around 2100 elements, as expected.
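Those counts line up with the halving rule; a back-of-envelope check,
assuming the capping sketch above (illustration only, not kernel code):

#include <stdio.h>

/* Reproduce the expected counts for the 4096-entry / 32-CPU test. */
int main(void)
{
	unsigned int map_size = 4096, nr_cpus = 32;
	unsigned int per_cpu = map_size / 2 / nr_cpus;	/* 64 */

	printf("aggregate percpu cache: %u of %u entries\n",
	       per_cpu * nr_cpus, map_size);		/* 2048 of 4096 */
	/* Before the patch, 32 * 128 == 4096 covers the whole map, so
	 * updates keep recycling elements and only ~150 stay visible;
	 * with the cap, at most half the map is reserved and ~2100 of
	 * the 2176 updated elements remain, matching the report. */
	return 0;
}
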
>
> Tested-by: Anton Protopopov <a.s.protopopov@...il.com>

Looks like we have consensus.

Willem,
please target bpf tree when you respin.
