linux-kernel - Re: [PATCH] seqlock: Use WRITE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20241219183115.GA25652@willie-the-truck>
Date: Thu, 19 Dec 2024 18:31:15 +0000
From: Will Deacon <will@...nel.org>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Daniel Xu <dxu@...uu.xyz>,
	mingo@...hat.com, longman@...hat.com, boqun.feng@...il.com,
	linux-kernel@...r.kernel.org, linux-toolchains@...r.kernel.org
Subject: Re: [PATCH] seqlock: Use WRITE_ONCE() when updating sequence

On Thu, Dec 19, 2024 at 10:06:23AM -0800, Paul E. McKenney wrote:
> On Thu, Dec 19, 2024 at 05:58:03PM +0000, Will Deacon wrote:
> > On Thu, Dec 19, 2024 at 09:53:48AM -0800, Paul E. McKenney wrote:
> > > On Thu, Dec 19, 2024 at 04:34:55PM +0000, Will Deacon wrote:
> > > > On Wed, Dec 18, 2024 at 06:12:41PM +0100, Peter Zijlstra wrote:
> > > > > diff --git a/arch/s390/kernel/idle.c b/arch/s390/kernel/idle.c
> > > > > index 39cb8d0ae348..8fb7cd75fe62 100644
> > > > > --- a/arch/s390/kernel/idle.c
> > > > > +++ b/arch/s390/kernel/idle.c
> > > > > @@ -45,7 +45,7 @@ void account_idle_time_irq(void)
> > > > >  
> > > > >  	/* Account time spent with enabled wait psw loaded as idle time. */
> > > > >  	WRITE_ONCE(idle->idle_time, READ_ONCE(idle->idle_time) + idle_time);
> > > > > -	WRITE_ONCE(idle->idle_count, READ_ONCE(idle->idle_count) + 1);
> > > > > +	INC_ONC(idle->idle_count);
> > > > 
> > > > (nit: drive-by typo)
> > > 
> > > Heh!  I was clearly building with GCC.  ;-)
> > 
> > And not for S390 :p
> 
> True enough!  ;-)
> 
> > > > > diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
> > > > > index c9b58188ec61..77c253e29758 100644
> > > > > --- a/include/linux/compiler-gcc.h
> > > > > +++ b/include/linux/compiler-gcc.h
> > > > > @@ -137,3 +137,8 @@
> > > > >  #if GCC_VERSION < 90100
> > > > >  #undef __alloc_size__
> > > > >  #endif
> > > > > +
> > > > > +/*
> > > > > + * GCC can't properly optimize the real one with volatile on.
> > > > > + */
> > > > > +#define INC_ONCE(v) (v)++
> > > > 
> > > > I think I'm missing what we're trying to do here, but how is this a
> > > > safe fallback? I'd have thought we'd need the whole READ_ONCE() +
> > > > WRITE_ONCE() sequence if the compiler doesn't give us what we need...
> > > > 
> > > > > diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> > > > > index efd43df3a99a..b1b13dac1b9e 100644
> > > > > --- a/include/linux/compiler.h
> > > > > +++ b/include/linux/compiler.h
> > > > > @@ -8,6 +8,10 @@
> > > > >  
> > > > >  #ifdef __KERNEL__
> > > > >  
> > > > > +#ifndef INC_ONCE
> > > > > +#define INC_ONCE(v)	(*(volatile typeof(v) *)&(v))++
> > > > > +#endif
> > > > 
> > > > ... but I'm honestly not sure what this is aiming to elide. Is this about
> > > > having the compiler emit an increment instruction with memory operands
> > > > on architectures that support it? If so, what about architectures that
> > > > _don't_ support that? We still need to guarantee that the load/store
> > > > instructions are single-copy atomic for the type in that case.
> > > 
> > > Architectures whose increment instructions lack memory operands can still
> > > load, increment, then store.  So yes, if two of these run concurrently,
> > > you can lose counts when incrementing normal memory.  (Of course, with
> > > device memory, you get what you get.)
> > 
> > ... but can we guarantee that those load and store instructions won't
> > now be torn?
> 
> The volatile type qualifier must take care of that, or device drivers
> no longer work.
> 
> Or are you worried about the C-language ++ fallback?  In that case,
> as I understand it, this fallback is only there until GCC learns to
> generate a increment-to-memory instruction for x86.

As I understand it, both the clang and the GCC implementations in the
proposed diff are written using the C-language ++ operator, no?

The GCC one doesn't have 'volatile' at all:

	#define INC_ONCE(v) (v)++

so I don't understand why it's a correct replacement for e.g.:

	WRITE_ONCE(idle->idle_time, READ_ONCE(idle->idle_time) + idle_time);

If it was, then we wouldn't need any of these macros in the first place!

The clang version does have 'volatile':

	#define INC_ONCE(v)  (*(volatile typeof(v) *)&(v))++

but then I don't understand why this is enough for the compiler to
generate an 'inc' with memory operands on x86 if it cannot do that for
the READ_ONCE + WRITE_ONCE implementation. If somehow the above is more
"optimisable" then I worry about the atomicity being preserved as well.

So, basically, I don't understand what's going on here :)

I think I was expecting to see arch-specific implementations of
INC_ONCE() written in assembly where they have the relevant instructions
but that doesn't seem to be the case.

Will