lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sun, 31 Dec 2023 11:56:05 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Linus Torvalds' <torvalds@...ux-foundation.org>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"peterz@...radead.org" <peterz@...radead.org>, "longman@...hat.com"
	<longman@...hat.com>, "mingo@...hat.com" <mingo@...hat.com>,
	"will@...nel.org" <will@...nel.org>, "boqun.feng@...il.com"
	<boqun.feng@...il.com>, "xinhui.pan@...ux.vnet.ibm.com"
	<xinhui.pan@...ux.vnet.ibm.com>, "virtualization@...ts.linux-foundation.org"
	<virtualization@...ts.linux-foundation.org>, Zeng Heng <zengheng4@...wei.com>
Subject: RE: [PATCH next 4/5] locking/osq_lock: Optimise per-cpu data
 accesses.

From: Linus Torvalds
> Sent: 30 December 2023 20:59
> 
> On Sat, 30 Dec 2023 at 12:41, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > UNTESTED patch to just do the "this_cpu_write()" parts attached.
> > Again, note how we do end up doing that this_cpu_ptr conversion later
> > anyway, but at least it's off the critical path.
> 
> Also note that while 'this_cpu_ptr()' doesn't exactly generate lovely
> code, it really is still better than caching a value in memory.
> 
> At least the memory location that 'this_cpu_ptr()' accesses is
> slightly more likely to be hot (and is right next to the cpu number,
> iirc).

I was only going to access the 'self' field in code that required
the 'node' cache line be present.

> 
> That said, I think we should fix this_cpu_ptr() to not ever generate
> that disgusting cltq just because the cpu pointer has the wrong
> signedness. I don't quite know how to do it, but this:
> 
>   -#define per_cpu_offset(x) (__per_cpu_offset[x])
>   +#define per_cpu_offset(x) (__per_cpu_offset[(unsigned)(x)])
> 
> at least helps a *bit*. It gets rid of the cltq, at least, but if
> somebody actually passes in an 'unsigned long' cpuid, it would cause
> an unnecessary truncation.

Doing the conversion using arithmetic might help, so:
		__per_cpu_offset[(x) + 0u]

> And gcc still generates
> 
>         subl    $1, %eax        #, cpu_nr
>         addq    __per_cpu_offset(,%rax,8), %rcx
> 
> instead of just doing
> 
>         addq    __per_cpu_offset-8(,%rax,8), %rcx
> 
> because it still needs to clear the upper 32 bits and doesn't know
> that the 'xchg()' already did that.

Not only that, you need to do the 'subl' after converting to 64 bits.
Otherwise the wrong location is read were cpu_nr to be zero.
I've tried that - but it still failed.

> Oh well. I guess even without the -1/+1 games by the OSQ code, we
> would still end up with a "movl" just to do that upper bits clearing
> that the compiler doesn't know is unnecessary.
> 
> I don't think we have any reasonable way to tell the compiler that the
> register output of our xchg() inline asm has the upper 32 bits clear.

It could be done for a 32bit unsigned xchg() - just make the return
type unsigned 64bit.
But that won't work for the signed exchange - and 'atomic_t' is signed.
OTOH I'd guess this code could use 'unsigned int' instead of atomic_t?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ