linux-kernel - Re: [PATCH bpf-next v2 09/26] rqspinlock: Protect waiters in queue from stalls

Message-ID: <CAP01T74hRYCkrqz4JKqXH7ya0ykBfX4_6611q-TO52o1TZsfjg@mail.gmail.com>
Date: Thu, 13 Feb 2025 07:20:46 +0100
From: Kumar Kartikeya Dwivedi <memxor@...il.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: bpf@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Barret Rhoden <brho@...gle.com>, Linus Torvalds <torvalds@...ux-foundation.org>, 
	Will Deacon <will@...nel.org>, Waiman Long <llong@...hat.com>, Alexei Starovoitov <ast@...nel.org>, 
	Andrii Nakryiko <andrii@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, 
	Martin KaFai Lau <martin.lau@...nel.org>, Eduard Zingerman <eddyz87@...il.com>, 
	"Paul E. McKenney" <paulmck@...nel.org>, Tejun Heo <tj@...nel.org>, Josh Don <joshdon@...gle.com>, 
	Dohyun Kim <dohyunkim@...gle.com>, linux-arm-kernel@...ts.infradead.org, 
	kernel-team@...a.com
Subject: Re: [PATCH bpf-next v2 09/26] rqspinlock: Protect waiters in queue
 from stalls

On Mon, 10 Feb 2025 at 11:17, Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Thu, Feb 06, 2025 at 02:54:17AM -0800, Kumar Kartikeya Dwivedi wrote:
> > Implement the wait queue cleanup algorithm for rqspinlock. There are
> > three forms of waiters in the original queued spin lock algorithm. The
> > first is the waiter which acquires the pending bit and spins on the lock
> > word without forming a wait queue. The second is the head waiter that is
> > the first waiter heading the wait queue. The third form is of all the
> > non-head waiters queued behind the head, waiting to be signalled through
> > their MCS node to overtake the responsibility of the head.
> >
> > In this commit, we are concerned with the second and third kind. First,
> > we augment the waiting loop of the head of the wait queue with a
> > timeout. When this timeout happens, all waiters part of the wait queue
> > will abort their lock acquisition attempts.
>
> Why? Why terminate the whole wait-queue?
>
> I *think* I understand, but it would be good to spell out. Also, in the
> comment.

Ack. The main reason is that we eschew per-waiter timeouts in favor of a
single timeout applied at the head of the wait queue.
This lets every waiter break out faster once the head has seen the owner /
pending waiter not respond for the timeout duration.
Secondly, it avoids complicated synchronization: if waiters could leave
out of FIFO order, the previous node's next pointer would have to be
fixed up, etc.
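
To make the policy concrete, here is a minimal single-threaded sketch of it.
This is not the rqspinlock code from the patch: every name in it
(struct waiter, head_wait, queued_wait, simulate_queue, release_at,
timeout_spins) is hypothetical, the "timeout" is a spin count rather than a
clock, and the atomics and memory ordering a real MCS queue needs are
omitted. It only illustrates that a single timeout at the head is enough to
abort the whole queue in FIFO order, without anyone unlinking themselves
from the middle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not the kernel's MCS node layout. */
enum wait_state { WAITING, IS_HEAD, TIMED_OUT };

struct waiter {
	struct waiter *next;
	enum wait_state state;
};

/*
 * Head waiter: the only one that carries a timeout. The owner is modelled
 * as releasing the lock at spin number release_at.
 * Returns 0 on acquisition, -1 on timeout.
 */
static int head_wait(struct waiter *head, int release_at, int timeout_spins)
{
	for (int spin = 0; spin < timeout_spins; spin++) {
		if (spin >= release_at) {
			/* Lock acquired: promote the successor to head. */
			if (head->next)
				head->next->state = IS_HEAD;
			return 0;
		}
	}
	/* Owner did not respond in time: tell the successor to abort. */
	if (head->next)
		head->next->state = TIMED_OUT;
	return -1;
}

/*
 * Non-head waiter: no timer of its own. It only reacts to the signal
 * written into its node, and on abort passes the signal down the queue.
 */
static int queued_wait(struct waiter *w)
{
	if (w->state != TIMED_OUT)
		return 0; /* would carry on as the new head */
	if (w->next)
		w->next->state = TIMED_OUT;
	return -1;
}

/* Links q[0..n-1] into a queue, runs it, returns how many waiters aborted. */
int simulate_queue(struct waiter *q, size_t n, int release_at, int timeout_spins)
{
	int aborted = 0;

	if (n == 0)
		return 0;
	assert(q != NULL);

	for (size_t i = 0; i < n; i++) {
		q[i].next = (i + 1 < n) ? &q[i + 1] : NULL;
		q[i].state = WAITING;
	}
	q[0].state = IS_HEAD;

	if (head_wait(&q[0], release_at, timeout_spins) == 0)
		return 0;
	aborted = 1;

	/* Waiters leave strictly in queue order, so no pointer fix-up. */
	for (size_t i = 1; i < n; i++)
		if (queued_wait(&q[i]) < 0)
			aborted++;
	return aborted;
}
```

With four waiters, simulate_queue(q, 4, 100, 10) models an owner that never
releases within the budget, so all four abort; simulate_queue(q, 4, 3, 10)
models a timely release, so nobody aborts and q[1] is promoted to head.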

Let me know if this explanation differs from your understanding.