[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wiHbN+_LCmSj2sHswDRJ0yG3kkjptMvCXcMwk7jWK1F=Q@mail.gmail.com>
Date: Thu, 9 Oct 2025 00:04:37 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Oleg Nesterov <oleg@...hat.com>
Cc: Alexander Viro <viro@...iv.linux.org.uk>, Boqun Feng <boqun.feng@...il.com>,
David Howells <dhowells@...hat.com>, Ingo Molnar <mingo@...hat.com>,
Li RongQing <lirongqing@...du.com>, Peter Zijlstra <peterz@...radead.org>,
Waiman Long <longman@...hat.com>, Will Deacon <will@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and scoped_seqlock_read_irqsave()
On Wed, 8 Oct 2025 at 22:31, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> But that hackery shouldn't even exist since it's all handled naturally
> and much more cleanly by the surrounding for-loop.
I'm batting zero. I still think it's true that this logic should be
handled by the for-loop, but the *seq=1" hackery did actually do
something.
Because it, together with the 'lockless' flag, enumerated three
states, not two. It's just that the first state didn't _do_ anything,
so it was kind of easy to miss it when I read through the code.
So the three states are
(a) before anything has been done (lockless, seq is even)
(b) the "we've done the lockless, we need to end the loop or take the
lock" (!lockless, seq is even)
(c) we've done the locked, we need to unlock and end (seq is odd)
and so relying on just a 'bool lockless' in the loop isn't enough.
I do still think this could be done at the for-loop level, so that the
compiler can statically see the stages and unroll it all to not be a
loop at all.
But I think it means that "lockless" would have to be a "phase", and go 0/1/2.
And I did get it to work that way, and did get gcc to unroll it to
generate really nice code. IOW, I can make this code
scoped_seqlock_read(lock) {
asm volatile("TEST");
}
generate this assembly:
.L12:
movl (%rdi), %eax #* lock, _25
testb $1, %al #, _25
jne .L14 #,
TEST
movl (%rdi), %edx # MEM[(const volatile unsigned int
*)lock_6(D)], _70
cmpl %edx, %eax # _70, _25
jne .L15 #,
ret
.L15:
subq $8, %rsp #,
addq $4, %rdi #, _74
movq %rdi, (%rsp) # _74, %sfp
call _raw_spin_lock #
TEST
movq (%rsp), %rdi # %sfp, _74
addq $8, %rsp #,
jmp _raw_spin_unlock #
.L14:
pause
jmp .L12 #
which is basically optimal, but I have to say, the hoops I had to jump
through to get there makes me suspect it's not worth it.
(Note above how there is no sign of 'phase' left in the code, and the
only conditionals are on the actual sequence count state, and the nice
default action of no sequence count problems is all linear
fall-through code).
The irqsave version looks the same, just (obviously) calling a
different set of spinlock/unlock functions and having one extra
register argument for that.
But to get there, I had to use not only linux/unroll.h, but also make
that for-loop explicitly have that s.phase < 3 test just to get the
unrolling to actually trigger.
I'm attaching the test-case I wrote that does this. It's not
*horrific*, but the extra unrolling hackery makes me doubt that it's
really worth it.
So I suspect your patch is fine as-is, and I was not just wrong on
relying on the (insufficient) "locked" boolean, but the more complete
solution is just too damn ugly.
But hey, I *did* get gcc to generate nice code. So you might look at
my test-case below and decide that maybe there's something to be said
for this.
Linus
View attachment "t.c" of type "text/x-csrc" (1357 bytes)
Powered by blists - more mailing lists