Message-ID: <CAADnVQ+=F5SkoRA4LU+JE+u87TLFp-mTS4bv+u9MUST2+CX8AA@mail.gmail.com>
Date: Mon, 6 Oct 2025 10:58:10 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Amery Hung <ameryhung@...il.com>
Cc: bpf <bpf@...r.kernel.org>, Network Development <netdev@...r.kernel.org>, 
	Andrii Nakryiko <andrii@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, 
	Kumar Kartikeya Dwivedi <memxor@...il.com>, Martin KaFai Lau <martin.lau@...nel.org>, KP Singh <kpsingh@...nel.org>, 
	Yonghong Song <yonghong.song@...ux.dev>, Song Liu <song@...nel.org>, Hao Luo <haoluo@...gle.com>, 
	Kernel Team <kernel-team@...a.com>
Subject: Re: [RFC PATCH bpf-next v2 06/12] bpf: Change local_storage->lock and
 b->lock to rqspinlock

On Fri, Oct 3, 2025 at 3:03 PM Amery Hung <ameryhung@...il.com> wrote:
>
> On Thu, Oct 2, 2025 at 4:37 PM Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> >
> > On Thu, Oct 2, 2025 at 3:54 PM Amery Hung <ameryhung@...il.com> wrote:
> > >
> > >         bpf_selem_free_list(&old_selem_free_list, false);
> > >         if (alloc_selem) {
> > >                 mem_uncharge(smap, owner, smap->elem_size);
> > > @@ -791,7 +812,7 @@ void bpf_local_storage_destroy(struct bpf_local_storage *local_storage)
> > >          * when unlinking elem from the local_storage->list and
> > >          * the map's bucket->list.
> > >          */
> > > -       raw_spin_lock_irqsave(&local_storage->lock, flags);
> > > +       while (raw_res_spin_lock_irqsave(&local_storage->lock, flags));
> >
> > This pattern and the other while(foo) loops don't make sense to me.
> > res_spin_lock will fail only on deadlock or timeout.
> > We should not spin, since a retry will likely produce the same
> > result. So the above pattern just enters an infinite spin.
>
> I only spin in destroy() and map_free(), which cannot deadlock with
> themselves or with each other. However, IIUC, a head waiter that detects
> a deadlock will cause other queued waiters to also return -EDEADLK. I
> think they should be able to make progress with a retry.

If it's in map_free() path then why are we taking the lock at all?
There are supposed to be no active users of it.
If there are users and we actually need that lock then the deadlock
is possible and retrying will deadlock the same way.
I feel AI explained it better:
"
raw_res_spin_lock_irqsave() can return -ETIMEDOUT (after 250ms) or
-EDEADLK. Both are non-zero, so the while() loop continues. The commit
message says "it cannot deadlock with itself or
bpf_local_storage_map_free", but:

1. If -ETIMEDOUT is returned because the lock holder is taking too long,
   retrying immediately won't help. The timeout means progress isn't
   being made, and spinning in a retry loop without any backoff or
   limit will prevent other work from proceeding.

2. If -EDEADLK is returned, it means the deadlock detector found a
   cycle. Retrying immediately without any state change won't break the
   deadlock cycle.
"

> Or better, rqspinlock could avoid forcing queued waiters to exit the
> queue when it is a deadlock rather than a timeout.

If a deadlock is detected, it's the same issue for all waiters.
I don't see the difference between timeout and deadlock.
Both are in the "do-not-retry" category.
Both mean that there is a bug somewhere.

> >
> > If it should never fail in practice then pr_warn_once and goto out
> > leaking memory. Better yet defer to irq_work and cleanup there.
>
> Hmm, both functions are already called in some deferred callbacks.
> Even if we defer the cleanup again, they still need to grab locks and
> still might fail, no?

If it's a map destroy path and we waited for RCU GP, there shouldn't be
a need to take a lock.
The css_free_rwork_fn() -> bpf_cgrp_storage_free() path
is currently implemented as if it were similar to
bpf_cgrp_storage_delete(), which needs the lock,
but bpf_cgrp_storage_free() doesn't have to take it.
In css_free_rwork_fn() no prog or user space
should see the 'cgrp' pointer, since we're about to kfree(cgrp).
I could certainly be missing something.
