Message-ID: <CAMB2axP5OvZKhHDnW9UD95S+2nTYaR4xLRHdg+oeXtpRJOfKrA@mail.gmail.com>
Date: Fri, 9 Jan 2026 10:39:32 -0800
From: Amery Hung <ameryhung@...il.com>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: bpf@...r.kernel.org, netdev@...r.kernel.org, alexei.starovoitov@...il.com,
andrii@...nel.org, daniel@...earbox.net, memxor@...il.com,
martin.lau@...nel.org, kpsingh@...nel.org, yonghong.song@...ux.dev,
song@...nel.org, haoluo@...gle.com, kernel-team@...a.com
Subject: Re: [PATCH bpf-next v3 01/16] bpf: Convert bpf_selem_unlink_map to failable
On Thu, Jan 8, 2026 at 12:29 PM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
>
> On 12/18/25 9:56 AM, Amery Hung wrote:
> > To prepare for changing bpf_local_storage_map_bucket::lock to rqspinlock,
> > convert bpf_selem_unlink_map() to be failable. It still always succeeds
> > and returns 0 for now.
> >
> > Since some operations updating local storage cannot fail in the middle,
> > open-code bpf_selem_unlink_map() to take the b->lock before the
> > operation. There are two such locations:
> >
> > - bpf_local_storage_alloc()
> >
> > The first selem will be unlinked from smap if the cmpxchg of
> > owner_storage_ptr fails, which should not fail. Therefore, hold b->lock
> > from linking until the allocation completes. Helpers that assume b->lock
> > is held by the caller are introduced: bpf_selem_link_map_nolock() and
> > bpf_selem_unlink_map_nolock().
> >
> > - bpf_local_storage_update()
> >
> > The three step update process: link_map(new_selem),
> > link_storage(new_selem), and unlink_map(old_selem) should not fail in
> > the middle.
> >
> > In bpf_selem_unlink(), bpf_selem_unlink_map() and
> > bpf_selem_unlink_storage() should either all succeed or all fail as a
> > whole instead of failing in the middle. So, return early if unlink_map()
> > fails.
> >
> > In bpf_local_storage_destroy(), since it cannot deadlock with itself or
> > with bpf_local_storage_map_free(), which the function might be racing
> > with, retry if bpf_selem_unlink_map() fails due to rqspinlock returning
> > errors.
> >
> > Signed-off-by: Amery Hung <ameryhung@...il.com>
> > ---
> > kernel/bpf/bpf_local_storage.c | 64 +++++++++++++++++++++++++++++-----
> > 1 file changed, 55 insertions(+), 9 deletions(-)
> >
> > diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
> > index e2fe6c32822b..4e3f227fd634 100644
> > --- a/kernel/bpf/bpf_local_storage.c
> > +++ b/kernel/bpf/bpf_local_storage.c
> > @@ -347,7 +347,7 @@ void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage,
> > hlist_add_head_rcu(&selem->snode, &local_storage->list);
> > }
> >
> > -static void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
> > +static int bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
>
> This will end up only being used by bpf_selem_unlink(). It may as well
> be removed and open-coded in bpf_selem_unlink(). I think it may depend
> on how patch 10 goes, and also on whether it makes sense to also remove
> bpf_selem_"link"_map and bpf_selem_unlink_map_nolock, so treat it as a
> nit note for now.
Noted
>
> > {
> > struct bpf_local_storage_map *smap;
> > struct bpf_local_storage_map_bucket *b;
> > @@ -355,7 +355,7 @@ static void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
> >
> > if (unlikely(!selem_linked_to_map_lockless(selem)))
>
> In the later patch where both local_storage's and map-bucket's locks
> must be acquired, will this check still be needed if there is an earlier
> check that ensures the selem is still linked to the local_storage? It
> does not matter in terms of perf, but I think it will help code reading
> in the future for the common code path (i.e. the code paths other than
> bpf_local_storage_destroy and bpf_local_storage_map_free).
Makes sense to remove it. The common code paths still follow the unlink
order and do not partially unlink.
>
> > /* selem has already be unlinked from smap */
> > - return;
> > + return 0;
> >
> > smap = rcu_dereference_check(SDATA(selem)->smap, bpf_rcu_lock_held());
> > b = select_bucket(smap, selem);
> > @@ -363,6 +363,14 @@ static void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem)
> > if (likely(selem_linked_to_map(selem)))
> > hlist_del_init_rcu(&selem->map_node);
> > raw_spin_unlock_irqrestore(&b->lock, flags);
> > +
> > + return 0;
> > +}
> > +
> > +static void bpf_selem_unlink_map_nolock(struct bpf_local_storage_elem *selem)
> > +{
> > + if (likely(selem_linked_to_map(selem)))
>
> Take this chance to remove the selem_linked_to_map() check.
> hlist_del_init_rcu has the same check.
Noted
>
> > + hlist_del_init_rcu(&selem->map_node);
> > }
> >
> > void bpf_selem_link_map(struct bpf_local_storage_map *smap,
> > @@ -376,13 +384,26 @@ void bpf_selem_link_map(struct bpf_local_storage_map *smap,
> > raw_spin_unlock_irqrestore(&b->lock, flags);
> > }
> >
> > +static void bpf_selem_link_map_nolock(struct bpf_local_storage_map *smap,
> > + struct bpf_local_storage_elem *selem,
> > + struct bpf_local_storage_map_bucket *b)
> > +{
> > + RCU_INIT_POINTER(SDATA(selem)->smap, smap);
>
> Is it needed? bpf_selem_alloc should have init the SDATA(selem)->smap.
Good catch. I forgot to remove it when rebasing. It is redundant since
we already do this in bpf_selem_alloc().
>
> > + hlist_add_head_rcu(&selem->map_node, &b->list);
> > +}
> > +
>
> [ ... ]
>
> > @@ -574,20 +603,37 @@ bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap,
> > goto unlock;
> > }
> >
> > + b = select_bucket(smap, selem);
> > +
> > + if (old_sdata) {
> > + old_b = select_bucket(smap, SELEM(old_sdata));
> > + old_b = old_b == b ? NULL : old_b;
> > + }
> > +
> > + raw_spin_lock_irqsave(&b->lock, b_flags);
> > +
> > + if (old_b)
> > + raw_spin_lock_irqsave(&old_b->lock, old_b_flags);
>
> This will deadlock because of the lock ordering of b and old_b.
> Replacing it with res_spin_lock in the later patch can detect it and
> break it more gracefully. imo, we should not introduce known-deadlock
> logic into the kernel in the syscall code path and ask the current user
> to retry the map_update_elem syscall.
>
> What happened to the patch in the earlier revision that uses the
> local_storage (or owner) for select_bucket?
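(As an aside, the classic fix for the AB-BA pattern flagged above is to
acquire the two bucket locks in a single global order, e.g. by address.
A userspace sketch, with toy structs standing in for the kernel's bucket
and lock; all names here are illustrative, not the kernel API:)

```c
#include <assert.h>
#include <stdint.h>

/* Toy bucket with a flag standing in for b->lock, to make the
 * acquisition order observable. */
struct bucket { int lock; };

/* Canonicalize a pair of bucket pointers: lower address first. Both
 * lock_pair(b, old_b) and lock_pair(old_b, b) then take the locks in
 * the same global order, so the AB-BA deadlock cannot form. */
static void sort_pair(struct bucket **first, struct bucket **second)
{
	if ((uintptr_t)*first > (uintptr_t)*second) {
		struct bucket *t = *first;
		*first = *second;
		*second = t;
	}
}

static void lock_pair(struct bucket *a, struct bucket *b)
{
	struct bucket *first = a, *second = b;

	if (a == b) {		/* same bucket: only one lock to take */
		a->lock = 1;
		return;
	}
	sort_pair(&first, &second);
	first->lock = 1;	/* always taken first, whichever role it has */
	second->lock = 1;
}
```

The same idea applies whether the locks are raw spinlocks or rqspinlock;
ordering by address just guarantees all acquirers agree on who goes first.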
Thanks for reviewing!
I decided to revert it because it introduces a dependency of the selem
on the local_storage when unlinking. bpf_selem_unlink_lockless() cannot
assume the map or local_storage associated with a selem is still alive.
If the local_storage has already been destroyed, we won't be able to
figure out the bucket when select_bucket() uses local_storage for
hashing.
A middle ground is to use local_storage for hashing but save the bucket
index in the selem, so that the local_storage pointer won't be needed
later. WDYT?
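(To make the middle ground concrete, here is a minimal userspace sketch
of the idea: hash by the owner/local_storage pointer at link time, but
cache the resulting bucket index in the selem so unlink never needs the
owner again. The structs, the hash, and all function names are
simplified stand-ins, not the actual kernel code:)

```c
#include <assert.h>
#include <stdint.h>

#define NR_BUCKETS 16

/* Hypothetical, simplified stand-in for bpf_local_storage_elem. */
struct selem {
	uint32_t bucket_idx;	/* cached at link time; stays valid even
				 * after the owner/local_storage is gone */
};

/* Hash by the owner (local_storage) pointer, as in the proposal. */
static uint32_t select_bucket_idx(const void *owner)
{
	return (uint32_t)(((uintptr_t)owner >> 4) % NR_BUCKETS);
}

/* At link time, remember the bucket so unlink never needs the owner. */
static void link_selem(struct selem *s, const void *owner)
{
	s->bucket_idx = select_bucket_idx(owner);
}

/* Unlink can run after the owner is destroyed: only the cached index
 * is consulted, never the (possibly freed) local_storage pointer. */
static uint32_t unlink_bucket(const struct selem *s)
{
	return s->bucket_idx;
}
```

The trade-off is one extra field per selem in exchange for removing the
unlink-time dependency on the local_storage's lifetime.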
>
> [ will continue with the rest of the patches a bit later ]
>
> > +
> > alloc_selem = NULL;
> > /* First, link the new selem to the map */
> > - bpf_selem_link_map(smap, selem);
> > + bpf_selem_link_map_nolock(smap, selem, b);
> >
> > /* Second, link (and publish) the new selem to local_storage */
> > bpf_selem_link_storage_nolock(local_storage, selem);
> >
> > /* Third, remove old selem, SELEM(old_sdata) */
> > if (old_sdata) {
> > - bpf_selem_unlink_map(SELEM(old_sdata));
> > + bpf_selem_unlink_map_nolock(SELEM(old_sdata));
> > bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata),
> > &old_selem_free_list);
> > }
> >
> > + if (old_b)
> > + raw_spin_unlock_irqrestore(&old_b->lock, old_b_flags);
> > +
> > + raw_spin_unlock_irqrestore(&b->lock, b_flags);
> > +
> > unlock:
> > raw_spin_unlock_irqrestore(&local_storage->lock, flags);
> > bpf_selem_free_list(&old_selem_free_list, false);
> > @@ -679,7 +725,7 @@ void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
> > /* Always unlink from map before unlinking from
> > * local_storage.
> > */
> > - bpf_selem_unlink_map(selem);
> > + WARN_ON(bpf_selem_unlink_map(selem));
> > /* If local_storage list has only one element, the
> > * bpf_selem_unlink_storage_nolock() will return true.
> > * Otherwise, it will return false. The current loop iteration
>