netdev - Re: [PATCH bpf-next v5 10/16] bpf: Support lockless unlink when freeing map or local storage

Message-ID: <CAMB2axPMJe6hGeyaAXvzHKyap+9uoH=q66dtCTD-C4zioJa5DA@mail.gmail.com>
Date: Wed, 4 Feb 2026 15:14:21 -0800
From: Amery Hung <ameryhung@...il.com>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: netdev@...r.kernel.org, alexei.starovoitov@...il.com, andrii@...nel.org, 
	daniel@...earbox.net, memxor@...il.com, martin.lau@...nel.org, 
	kpsingh@...nel.org, yonghong.song@...ux.dev, song@...nel.org, 
	haoluo@...gle.com, kernel-team@...a.com, bpf@...r.kernel.org
Subject: Re: [PATCH bpf-next v5 10/16] bpf: Support lockless unlink when
 freeing map or local storage

On Tue, Feb 3, 2026 at 9:39 PM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
>
> On 2/1/26 9:50 AM, Amery Hung wrote:
> > +/*
> > + * Unlink an selem from map and local storage with lockless fallback if callers
> > + * are racing or rqspinlock returns error. It should only be called by
> > + * bpf_local_storage_destroy() or bpf_local_storage_map_free().
> > + */
> > +static void bpf_selem_unlink_nofail(struct bpf_local_storage_elem *selem,
> > +                                 struct bpf_local_storage_map_bucket *b)
> > +{
> > +     struct bpf_local_storage *local_storage;
> > +     struct bpf_local_storage_map *smap;
> > +     bool in_map_free = !!b;
> > +     unsigned long flags;
> > +     int err, unlink = 0;
> > +
> > +     local_storage = rcu_dereference_check(selem->local_storage, bpf_rcu_lock_held());
> > +     smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
> > +
> > +     /*
> > +      * Prevent being called twice from the same caller on the same selem.
> > +      * map_free() and destroy() each holds a link_cnt on an selem.
> > +      */
> > +     if ((!smap && in_map_free) || (!local_storage && !in_map_free))
>
> There is a chance that map_free() sees "!smap" in its very first call
> to bpf_selem_unlink_nofail(). For example, destroy() may grab the
> b->lock and do the hlist_del_init_rcu(&selem->map_node). In the unlikely
> case that destroy() cannot grab the local_storage->lock, it does
> atomic_dec_and_test(&selem->link_cnt). If map_free() hits the !smap check
> the very first time, it cannot move on to do
> atomic_dec_and_test(&selem->link_cnt), and the selem will be leaked. This
> is unlikely if we can assume destroy() is always able to hold its own
> local_storage->lock (no bpf prog should be holding it and no ETIMEDOUT).
>
> I think the same goes for the "!local_storage" check calling from destroy().
>

Will fix it by using bits to track whether map_free() or destroy() has
already seen this selem, instead of inferring that from the NULL pointers.
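
To make the idea concrete, here is a rough userspace sketch of what the
tracking could look like. It is not the actual change for the next revision:
the flag names, struct selem_model and selem_mark_seen() are all made up for
illustration, and C11 atomics stand in for the kernel primitives. It only
models the lockless fallback, where whichever path visits the selem last
frees it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag names, not taken from the patch. */
enum {
	SELEM_SEEN_MAP_FREE = 1u << 0,
	SELEM_SEEN_DESTROY  = 1u << 1,
	SELEM_SEEN_BOTH     = SELEM_SEEN_MAP_FREE | SELEM_SEEN_DESTROY,
};

/* Userspace stand-in for an selem; only the tracking state is modeled. */
struct selem_model {
	atomic_uint seen;	/* which teardown paths have visited the selem */
	bool freed;		/* set where the real code would free the selem */
};

/*
 * Record that map_free() (in_map_free == true) or destroy() has visited
 * the selem. Returns true only for the call that completes the pair,
 * i.e. the one that should free the selem.
 */
static bool selem_mark_seen(struct selem_model *s, bool in_map_free)
{
	unsigned int bit = in_map_free ? SELEM_SEEN_MAP_FREE : SELEM_SEEN_DESTROY;
	unsigned int old = atomic_fetch_or(&s->seen, bit);

	/* Same caller visiting the same selem again: nothing to do. */
	if (old & bit)
		return false;

	/* The other path has not been here yet, so it will free later. */
	if ((old | bit) != SELEM_SEEN_BOTH)
		return false;

	assert(!s->freed);	/* exactly one caller can get here */
	s->freed = true;
	return true;
}
```

With this, a first call from map_free() is told apart from a repeated call
even when destroy() has already cleared SDATA(selem)->smap, because the
decision no longer depends on the pointer being NULL.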

>
> > +             return;
> > +
> > +     if (smap) {
> > +             b = b ? : select_bucket(smap, local_storage);
> > +             err = raw_res_spin_lock_irqsave(&b->lock, flags);
> > +             if (!err) {
> > +                     /*
> > +                      * Call bpf_obj_free_fields() under b->lock to make sure it is done
> > +                      * exactly once for an selem. Safe to free special fields immediately
> > +                      * as no BPF program should be referencing the selem.
> > +                      */
> > +                     if (likely(selem_linked_to_map(selem))) {
> > +                             hlist_del_init_rcu(&selem->map_node);
> > +                             bpf_obj_free_fields(smap->map.record, SDATA(selem)->data);
> > +                             unlink++;
> > +                     }
> > +                     raw_res_spin_unlock_irqrestore(&b->lock, flags);
> > +             }
> > +             /*
> > +              * Highly unlikely scenario: resource leak
> > +              *
> > +              * When map_free(selem1), destroy(selem1) and destroy(selem2) are racing
> > +              * and both selem belong to the same bucket, if destroy(selem2) acquired
> > +              * b->lock and block for too long, neither map_free(selem1) and
> > +              * destroy(selem1) will be able to free the special field associated
> > +              * with selem1 as raw_res_spin_lock_irqsave() returns -ETIMEDOUT.
> > +              */
> > +             WARN_ON_ONCE(err && in_map_free);
> > +             if (!err || in_map_free)
> > +                     RCU_INIT_POINTER(SDATA(selem)->smap, NULL);
> > +     }
> > +
> > +     if (local_storage) {
> > +             err = raw_res_spin_lock_irqsave(&local_storage->lock, flags);
> > +             if (!err) {
> > +                     /*
> > +                      * Normally, map_free() can call mem_uncharge() if destroy() is
> > +                      * not about to return to the owner, which can then go away
> > +                      * immediately. Otherwise, the charge of the selem will stay
> > +                      * accounted in local_storage->selems_size and uncharged during
> > +                      * destroy().
> > +                      */
> > +                     if (likely(selem_linked_to_storage(selem))) {
> > +                             hlist_del_init_rcu(&selem->snode);
> > +                             if (smap && in_map_free &&
>
> I think the smap non-null check is not needed.

While the map itself is still valid in map_free(), SDATA(selem)->smap
could already have been set to NULL, in which case the local smap is NULL
and mem_uncharge() would dereference a NULL pointer.
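
A small userspace model of why the check stays, under the assumption that
the smap pointer is read once up front and may have been cleared by the
other path. The struct and function names are hypothetical and the owner
refcount is reduced to a boolean:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the fields needed for the example. */
struct smap_model {
	unsigned int elem_size;
};

struct selem_ptr_model {
	struct smap_model *smap;	/* may have been reset to NULL */
};

/*
 * Returns how many bytes map_free() would uncharge for this selem.
 * The snapshot of selem->smap must be checked before it is used,
 * since the other teardown path may have cleared it already.
 */
static unsigned int selem_uncharge_size(const struct selem_ptr_model *selem,
					bool in_map_free, bool owner_alive)
{
	const struct smap_model *smap;

	assert(selem != NULL);
	smap = selem->smap;	/* models the single read of the pointer */

	if (smap && in_map_free && owner_alive)
		return smap->elem_size;

	return 0;	/* leave the charge for destroy() to handle */
}
```

Dropping the smap test in this model would turn the cleared-pointer case
into a NULL dereference at smap->elem_size.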

>
> > +                                 refcount_inc_not_zero(&local_storage->owner_refcnt)) {
> > +                                     mem_uncharge(smap, local_storage->owner, smap->elem_size);
> > +                                     local_storage->selems_size -= smap->elem_size;
> > +                                     refcount_dec(&local_storage->owner_refcnt);
> > +                             }
> > +                             unlink++;
> > +                     }
> > +                     raw_res_spin_unlock_irqrestore(&local_storage->lock, flags);
> > +             }
> > +             if (!err || !in_map_free)
> > +                     RCU_INIT_POINTER(selem->local_storage, NULL);
> > +     }
> > +
> > +     /*
> > +      * Normally, an selem can be unlinked under local_storage->lock and b->lock, and
> > +      * then freed after an RCU grace period. However, if destroy() and map_free() are
> > +      * racing or rqspinlock returns errors in unlikely situations (unlink != 2), free
> > +      * the selem only after both map_free() and destroy() drop their link_cnt.
> > +      */
> > +     if (unlink == 2 || atomic_dec_and_test(&selem->link_cnt))
> > +             bpf_selem_free(selem, false);
>
> This can be bpf_selem_free(..., true) here.

Ack.

>
>