linux-kernel - Re: [PATCH] seqlock: Use WRITE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b7ccb5b8-4f19-4cfc-a94b-02bca7c4561f@paulmck-laptop>
Date: Fri, 20 Dec 2024 09:40:32 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Will Deacon <will@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Daniel Xu <dxu@...uu.xyz>,
	mingo@...hat.com, longman@...hat.com, boqun.feng@...il.com,
	linux-kernel@...r.kernel.org, linux-toolchains@...r.kernel.org
Subject: Re: [PATCH] seqlock: Use WRITE_ONCE() when updating sequence

On Thu, Dec 19, 2024 at 06:31:15PM +0000, Will Deacon wrote:
> On Thu, Dec 19, 2024 at 10:06:23AM -0800, Paul E. McKenney wrote:
> > On Thu, Dec 19, 2024 at 05:58:03PM +0000, Will Deacon wrote:
> > > On Thu, Dec 19, 2024 at 09:53:48AM -0800, Paul E. McKenney wrote:
> > > > On Thu, Dec 19, 2024 at 04:34:55PM +0000, Will Deacon wrote:
> > > > > On Wed, Dec 18, 2024 at 06:12:41PM +0100, Peter Zijlstra wrote:
> > > > > > diff --git a/arch/s390/kernel/idle.c b/arch/s390/kernel/idle.c
> > > > > > index 39cb8d0ae348..8fb7cd75fe62 100644
> > > > > > --- a/arch/s390/kernel/idle.c
> > > > > > +++ b/arch/s390/kernel/idle.c
> > > > > > @@ -45,7 +45,7 @@ void account_idle_time_irq(void)
> > > > > >  
> > > > > >  	/* Account time spent with enabled wait psw loaded as idle time. */
> > > > > >  	WRITE_ONCE(idle->idle_time, READ_ONCE(idle->idle_time) + idle_time);
> > > > > > -	WRITE_ONCE(idle->idle_count, READ_ONCE(idle->idle_count) + 1);
> > > > > > +	INC_ONC(idle->idle_count);
> > > > > 
> > > > > (nit: drive-by typo)
> > > > 
> > > > Heh!  I was clearly building with GCC.  ;-)
> > > 
> > > And not for S390 :p
> > 
> > True enough!  ;-)
> > 
> > > > > > diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> > > > > > index c9b58188ec61..77c253e29758 100644
> > > > > > --- a/include/linux/compiler-gcc.h
> > > > > > +++ b/include/linux/compiler-gcc.h
> > > > > > @@ -137,3 +137,8 @@
> > > > > >  #if GCC_VERSION < 90100
> > > > > >  #undef __alloc_size__
> > > > > >  #endif
> > > > > > +
> > > > > > +/*
> > > > > > + * GCC can't properly optimize the real one with volatile on.
> > > > > > + */
> > > > > > +#define INC_ONCE(v) (v)++
> > > > > 
> > > > > I think I'm missing what we're trying to do here, but how is this a
> > > > > safe fallback? I'd have thought we'd need the whole READ_ONCE() +
> > > > > WRITE_ONCE() sequence if the compiler doesn't give us what we need...
> > > > > 
> > > > > > diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> > > > > > index efd43df3a99a..b1b13dac1b9e 100644
> > > > > > --- a/include/linux/compiler.h
> > > > > > +++ b/include/linux/compiler.h
> > > > > > @@ -8,6 +8,10 @@
> > > > > >  
> > > > > >  #ifdef __KERNEL__
> > > > > >  
> > > > > > +#ifndef INC_ONCE
> > > > > > +#define INC_ONCE(v)	(*(volatile typeof(v) *)&(v))++
> > > > > > +#endif
> > > > > 
> > > > > ... but I'm honestly not sure what this is aiming to elide. Is this about
> > > > > having the compiler emit an increment instruction with memory operands
> > > > > on architectures that support it? If so, what about architectures that
> > > > > _don't_ support that? We still need to guarantee that the load/store
> > > > > instructions are single-copy atomic for the type in that case.
> > > > 
> > > > Architectures whose increment instructions lack memory operands can still
> > > > load, increment, then store.  So yes, if two of these run concurrently,
> > > > you can lose counts when incrementing normal memory.  (Of course, with
> > > > device memory, you get what you get.)
> > > 
> > > ... but can we guarantee that those load and store instructions won't
> > > now be torn?
> > 
> > The volatile type qualifier must take care of that, or device drivers
> > no longer work.
> > 
> > Or are you worried about the C-language ++ fallback?  In that case,
> > as I understand it, this fallback is only there until GCC learns to
> > generate a increment-to-memory instruction for x86.
> 
> As I understand it, both the clang and the GCC implementations in the
> proposed diff are written using the C-language ++ operator, no?
> 
> The GCC one doesn't have 'volatile' at all:
> 
> 	#define INC_ONCE(v) (v)++
> 
> so I don't understand why it's a correct replacement for e.g.:
> 
> 	WRITE_ONCE(idle->idle_time, READ_ONCE(idle->idle_time) + idle_time);
> 
> If it was, then we wouldn't need any of these macros in the first place!

You are right, it is not a correct replacement.  It is instead a
behavior-preserving fallback until GCC does the optimizations.
Here the behavior is the code generation.

> The clang version does have 'volatile':
> 
> 	#define INC_ONCE(v)  (*(volatile typeof(v) *)&(v))++
> 
> but then I don't understand why this is enough for the compiler to
> generate an 'inc' with memory operands on x86 if it cannot do that for
> the READ_ONCE + WRITE_ONCE implementation. If somehow the above is more
> "optimisable" then I worry about the atomicity being preserved as well.
> 
> So, basically, I don't understand what's going on here :)

Peter learned that clang does in fact optimize the original READ_ONCE +
WRITE_ONCE implementation.

> I think I was expecting to see arch-specific implementations of
> INC_ONCE() written in assembly where they have the relevant instructions
> but that doesn't seem to be the case.

That is yet another fallback, though with quite a bit more code.  :-(

							Thanx, Paul