Message-ID: <4c125283-b5a4-47f2-be84-a932b50312ab@linux.dev>
Date: Wed, 4 Feb 2026 17:08:23 -0800
From: Martin KaFai Lau <martin.lau@...ux.dev>
To: Amery Hung <ameryhung@...il.com>
Cc: netdev@...r.kernel.org, alexei.starovoitov@...il.com, andrii@...nel.org,
daniel@...earbox.net, memxor@...il.com, martin.lau@...nel.org,
kpsingh@...nel.org, yonghong.song@...ux.dev, song@...nel.org,
haoluo@...gle.com, kernel-team@...a.com, bpf@...r.kernel.org
Subject: Re: [PATCH bpf-next v5 10/16] bpf: Support lockless unlink when
freeing map or local storage
On 2/4/26 3:14 PM, Amery Hung wrote:
> On Tue, Feb 3, 2026 at 9:39 PM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
>>
>> On 2/1/26 9:50 AM, Amery Hung wrote:
>>> +/*
>>> + * Unlink an selem from map and local storage with lockless fallback if callers
>>> + * are racing or rqspinlock returns error. It should only be called by
>>> + * bpf_local_storage_destroy() or bpf_local_storage_map_free().
>>> + */
>>> +static void bpf_selem_unlink_nofail(struct bpf_local_storage_elem *selem,
>>> +				    struct bpf_local_storage_map_bucket *b)
>>> +{
>>> +	struct bpf_local_storage *local_storage;
>>> +	struct bpf_local_storage_map *smap;
>>> +	bool in_map_free = !!b;
>>> +	unsigned long flags;
>>> +	int err, unlink = 0;
>>> +
>>> +	local_storage = rcu_dereference_check(selem->local_storage, bpf_rcu_lock_held());
>>> +	smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
>>> +
>>> +	/*
>>> +	 * Prevent being called twice from the same caller on the same selem.
>>> +	 * map_free() and destroy() each holds a link_cnt on an selem.
>>> +	 */
>>> +	if ((!smap && in_map_free) || (!local_storage && !in_map_free))
>>> +		return;
>>> +
>>> +	if (smap) {
>>> +		b = b ? : select_bucket(smap, local_storage);
>>> +		err = raw_res_spin_lock_irqsave(&b->lock, flags);
>>> +		if (!err) {
>>> +			/*
>>> +			 * Call bpf_obj_free_fields() under b->lock to make sure it is done
>>> +			 * exactly once for an selem. Safe to free special fields immediately
>>> +			 * as no BPF program should be referencing the selem.
>>> +			 */
>>> +			if (likely(selem_linked_to_map(selem))) {
>>> +				hlist_del_init_rcu(&selem->map_node);
>>> +				bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
>>> +				unlink++;
>>> +			}
>>> +			raw_res_spin_unlock_irqrestore(&b->lock, flags);
>>> +		}
>>> +		/*
>>> +		 * Highly unlikely scenario: resource leak
>>> +		 *
>>> +		 * When map_free(selem1), destroy(selem1) and destroy(selem2) are racing
>>> +		 * and both selem belong to the same bucket, if destroy(selem2) acquired
>>> +		 * b->lock and block for too long, neither map_free(selem1) and
>>> +		 * destroy(selem1) will be able to free the special field associated
>>> +		 * with selem1 as raw_res_spin_lock_irqsave() returns -ETIMEDOUT.
>>> +		 */
>>> +		WARN_ON_ONCE(err && in_map_free);
>>> +		if (!err || in_map_free)
>>> +			RCU_INIT_POINTER(SDATA(selem)->smap, NULL);
>>> +	}
>>> +
>>> +	if (local_storage) {
>>> +		err = raw_res_spin_lock_irqsave(&local_storage->lock, flags);
>>> +		if (!err) {
>>> +			/*
>>> +			 * Normally, map_free() can call mem_uncharge() if destroy() is
>>> +			 * not about to return to the owner, which can then go away
>>> +			 * immediately. Otherwise, the charge of the selem will stay
>>> +			 * accounted in local_storage->selems_size and uncharged during
>>> +			 * destroy().
>>> +			 */
>>> +			if (likely(selem_linked_to_storage(selem))) {
>>> +				hlist_del_init_rcu(&selem->snode);
>>> +				if (smap && in_map_free &&
>>
>> I think the smap non-null check is not needed.
>
> While smap is still valid in map_free(), SDATA(selem)->smap could have
> been init to NULL, and then mem_uncharge() will dereference a null
> pointer.

Hmm... there is an "if ((!smap && in_map_free) || ...) return;" at the
beginning of the function, so the local smap cannot be NULL here when
in_map_free is true. The next revision may still need this check, though,
if it no longer depends on "!smap" to detect the second visit.
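
To make the argument about the early return concrete, here is a minimal
userspace sketch. It models only the three booleans tested by the guard
(the local smap pointer, local_storage, and in_map_free) and enumerates
all eight combinations. The helper names are hypothetical and not part of
the patch, and the model deliberately ignores concurrent updates to
SDATA(selem)->smap, which is the concern raised in the quoted reply.

```c
#include <stdbool.h>

/*
 * Model of the guard at the top of bpf_selem_unlink_nofail() as quoted
 * above. Each pointer is reduced to "is it non-NULL". Returns true when
 * the function would return before doing any work.
 */
static bool returns_early(bool smap, bool local_storage, bool in_map_free)
{
	return (!smap && in_map_free) || (!local_storage && !in_map_free);
}

/*
 * Hypothetical helper: count the states that get past the guard and in
 * which "smap && in_map_free" evaluates differently from "in_map_free"
 * alone. A result of zero means the extra smap test is redundant for
 * the local variable, given this guard.
 */
static int count_states_where_smap_check_matters(void)
{
	int n = 0;

	for (int i = 0; i < 8; i++) {
		bool smap = i & 1;
		bool local_storage = i & 2;
		bool in_map_free = i & 4;

		if (returns_early(smap, local_storage, in_map_free))
			continue;
		if ((smap && in_map_free) != in_map_free)
			n++;
	}
	return n;
}
```

If a later revision stops using "!smap" in the guard to detect the second
visit, the same enumeration would no longer come out to zero, which is
why the check may be needed again.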
>>
>>