lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20251013090313.GI4067720@noisy.programming.kicks-ass.net>
Date: Mon, 13 Oct 2025 11:03:13 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Oleg Nesterov <oleg@...hat.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
	Alexander Viro <viro@...iv.linux.org.uk>,
	Boqun Feng <boqun.feng@...il.com>,
	David Howells <dhowells@...hat.com>, Ingo Molnar <mingo@...hat.com>,
	Li RongQing <lirongqing@...du.com>,
	Waiman Long <longman@...hat.com>, Will Deacon <will@...nel.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 1/4] seqlock: introduce scoped_seqlock_read() and
 scoped_seqlock_read_irqsave()

On Fri, Oct 10, 2025 at 03:14:39PM +0200, Oleg Nesterov wrote:
> On 10/10, Oleg Nesterov wrote:
> >
> > On 10/10, Peter Zijlstra wrote:
> > >
> > > I reordered the code, it is happier now.
> > >
> > > Anyway, the below seems to generate decent code for
> > > {-O2,-Os}x{gcc-14,clang-22}. Yay for optimizing compilers I suppose :-)
> >
> > Another approach which looks better than mine ;)
> >
> > Linus's version is simpler, but yours can handle break/return and
> > the "only lockless" case, good.
> >
> > I leave this patch to you and Linus, he seems to like your code too.
> >
> > Reviewed-by: Oleg Nesterov <oleg@...hat.com>
> >
> >
> > But... perhaps we should not "export" the _target names and instead
> > add the additional defines, something like
> >
> > 	scoped_seqlock_read()
> > 	scoped_seqlock_read_or_lock()
> > 	scoped_seqlock_read_or_lock_irqsave()
> >
> > ?
> 
> And... perhaps we can simplify this code a little bit? I mean
> 
> 	enum ss_state {
> 		ss_lockless	= 0,
> 		ss_lock		= 1,
> 		ss_lock_irqsave	= 2,
> 		ss_done		= 4,
> 	};
> 
> 	struct ss_tmp {
> 		enum ss_state	state;
> 		unsigned long	data;
> 		seqlock_t	*lock;
> 	};
> 
> 	static inline void __scoped_seqlock_cleanup(struct ss_tmp *sst)
> 	{
> 		if (sst->state & ss_lock)
> 			spin_unlock(&sst->lock.lock);
> 		if (sst->state & ss_lock_irqsave)
> 			spin_unlock_irqrestore(&sst->lock.lock, sst->data);
> 	}
> 
> 	static inline void
> 	__scoped_seqlock_next(struct ss_tmp *sst, enum ss_state target)
> 	{
> 		switch (sst->state) {
> 		case ss_lock:
> 		case ss_lock_irqsave:
> 			sst->state |= ss_done;
> 			return;
> 
> 		case ss_lockless:
> 			if (!read_seqretry(sst->lock, sst->data)) {
> 				sst->state = ss_done;
> 				return;
> 			}
> 			break;
> 		}
> 
> 		switch (target) {
> 		case ss_lock:
> 			spin_lock(&sst->lock.lock);
> 			sst->state = ss_lock;
> 			return;
> 
> 		case ss_lock_irqsave:
> 			spin_lock_irqsave(&sst->lock.lock, sst->data);
> 			sst->state = ss_lock_irqsave;
> 			return;
> 
> 		case ss_lockless:
> 			sst->data = read_seqbegin(sst->lock);
> 			return;
> 		}
> 	}
> 
> 	#define __scoped_seqlock_read(_seqlock, _target, _s)			\
> 		for (struct ss_tmp _s __cleanup(__scoped_seqlock_cleanup) =				\
> 		     { .state = ss_lockless, .data = read_seqbegin(_seqlock), .lock = __seqlock };	\
> 		     !(_s.state & ss_done);								\
> 		     __scoped_seqlock_next(&_s, _target))
> 
> 
> (I removed __scoped_seqlock_invalid_target/__scoped_seqlock_bug to lessen the code).
> 
> Not sure this makes sense. Plus I didn't even try to compile this code and I have
> no idea how this change can affect the code generation. But let me ask anyway...

So GCC is clever enough to see through this scheme, but Clang gets
confused and generates worse code. Specifically it emits the whole
__scoped_seqlock_cleanup() sequence, testing both bits and both unlock
options.

Where previously it would only have to discover which field was written
to and could delete all code for the unwritten field, it now has to
track the state and discover ss_lock|ss_done is not possible while
ss_lock_irqsave|ss_done is.

So while that additional pointer might seem wasteful, it actually makes
the state tracking easier and allows the compiler to more easily throw
away stuff.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ