Message-ID: <aFGoUWgo09Gfk-Dt@mini-arch>
Date: Tue, 17 Jun 2025 10:39:29 -0700
From: Stanislav Fomichev <stfomichev@...il.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: bpf@...r.kernel.org, netdev@...r.kernel.org, ast@...nel.org,
daniel@...earbox.net, john.fastabend@...il.com,
martin.lau@...ux.dev, Willem de Bruijn <willemb@...gle.com>
Subject: Re: [PATCH bpf-next] bpf: lru: adjust free target to avoid global
table starvation
On 06/16, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@...gle.com>
>
> BPF_MAP_TYPE_LRU_HASH can recycle most recent elements well before the
> map is full, due to percpu reservations and force shrink before
> neighbor stealing. Once a CPU is unable to borrow from the global map,
> it steals one element from a neighbor once, and from then on each
> allocation flushes that one element back to the global list and
> immediately recycles it.
>
> Batch value LOCAL_FREE_TARGET (128) will exhaust a 10K element map
> with 79 CPUs. CPU 79 will observe this behavior even while its
> neighbors hold 78 * 127 + 1 * 15 == 9921 free elements (99%).
>
> CPUs need not be active concurrently. The issue can appear with
> affinity migration, e.g., irqbalance. Each CPU can reserve and then
> hold onto its 128 elements indefinitely.
>
> Avoid global list exhaustion by limiting aggregate percpu caches to
> half of map size, by adjusting LOCAL_FREE_TARGET based on cpu count.
> This change has no effect on sufficiently large tables.
The code and rationale look good to me! There is also
Documentation/bpf/map_lru_hash_update.dot which mentions
LOCAL_FREE_TARGET; not sure whether it's easy to convey these clamping
details there. Or, alternatively, maybe expand on it in
Documentation/bpf/map_hash.rst? This <size>/<nrcpu>/2 is a heuristic,
so maybe we can give some guidance on the recommended fill level for
small (size/nrcpu < 128) maps?