Message-ID: <20251009232008.GM1386988@noisy.programming.kicks-ass.net>
Date: Fri, 10 Oct 2025 01:20:08 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Oleg Nesterov <oleg@...hat.com>,
Alexander Viro <viro@...iv.linux.org.uk>,
Boqun Feng <boqun.feng@...il.com>,
David Howells <dhowells@...hat.com>, Ingo Molnar <mingo@...hat.com>,
Li RongQing <lirongqing@...du.com>,
Waiman Long <longman@...hat.com>, Will Deacon <will@...nel.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and
scoped_seqlock_read_irqsave()
On Fri, Oct 10, 2025 at 12:12:42AM +0200, Peter Zijlstra wrote:
> On Thu, Oct 09, 2025 at 01:24:51PM -0700, Linus Torvalds wrote:
> > On Thu, 9 Oct 2025 at 13:12, Peter Zijlstra <peterz@...radead.org> wrote:
> > >
> > > Slightly nicer version that's actually compiled :-)
> >
> > I assume that "target of ss_lockless" is an intentional extension to
> > just make the loop never take a lock at all?
>
> Yep. Almost came for free, so might as well do it.
>
> > I do like it, but I think you'll find that having a separate 'seq' and
> > 'flags' like this:
> >
> > > +struct ss_tmp {
> > > + enum ss_state state;
> > > + int seq;
> > > + unsigned long flags;
> > > + spinlock_t *lock;
> > > +};
> >
> > makes it unnecessarily waste a register.
> >
> > You never need both seq and flags at the same time, since if you take
> > the spinlock the sequence number is pointless.
> >
> > So please make that a union, and I think it will help avoid wasting a
> > register in the loop.
>
> Sure; otoh compiler should be able to tell the same using liveness
> analysis I suppose, but perhaps they're not *that* clever.
The moment I use a union, gcc-14 creates the whole structure on the
stack, while without the union it mostly manages to keep things in
registers without too much spilling.
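For reference, the union variant was along these lines (just a sketch;
the field names match the struct quoted above, with seq and flags
overlaid):

struct ss_tmp {
        enum ss_state state;
        union {
                int seq;                /* lockless pass: seqcount snapshot */
                unsigned long flags;    /* locked pass: saved IRQ state */
        };
        spinlock_t *lock;
};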
Also, there's a rather silly bug in the code: the __cleanup function
looks at ->state to determine which spin_unlock variant to pick, but by
the time it runs ->state will be ss_done. Additionally, when someone
does 'break' or 'goto' out of the scope, the state isn't reliable
anyway.
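That is, a __cleanup along these lines cannot work, because the scope
body has already moved ->state on by the time the cleanup runs (sketch
only; the ss_lock / ss_lock_irqsave names are purely illustrative):

static inline void __scoped_seqlock_cleanup(struct ss_tmp *sst)
{
        switch (sst->state) {
        case ss_lock:                   /* never seen here; already ss_done */
                spin_unlock(sst->lock);
                break;
        case ss_lock_irqsave:
                spin_unlock_irqrestore(sst->lock, sst->flags);
                break;
        default:
                break;
        }
}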
The sanest code gen happens when I do:
struct ss_tmp {
        enum ss_state state;
        int seq;
        unsigned long flags;
        spinlock_t *lock;
        spinlock_t *lock_irqsave;
};
static inline void __scoped_seqlock_cleanup(struct ss_tmp *sst)
{
        /* at most one of the two lock pointers is ever set */
        if (sst->lock)
                spin_unlock(sst->lock);
        if (sst->lock_irqsave)
                spin_unlock_irqrestore(sst->lock_irqsave, sst->flags);
}
Then liveness analysis seems to work well: the compiler is able to
determine which fields are actually live, track only those in registers,
and discard the rest entirely. Specifically, in the thread_group_cputime
code only the _raw_spin_lock_irqsave and _raw_spin_unlock_irqrestore
calls are left; all the other stuff is just optimized out.
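The setup side then only has to record which lock it actually took;
something like this hand-waved helper (not the actual macro body, the
name and the ss_* targets are illustrative):

static inline void __scoped_seqlock_take(struct ss_tmp *sst, spinlock_t *lock,
                                         enum ss_state target)
{
        if (target == ss_lock) {
                spin_lock(lock);
                sst->lock = lock;               /* cleanup: spin_unlock() */
        } else if (target == ss_lock_irqsave) {
                spin_lock_irqsave(lock, sst->flags);
                sst->lock_irqsave = lock;       /* cleanup: spin_unlock_irqrestore() */
        }
        /* ss_lockless: leave both pointers NULL, cleanup is a no-op */
}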
I'll look at what clang does ... but that's really tomorrow, or rather,
somewhat later this morning :/