lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 3 May 2024 16:16:21 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Waiman Long' <longman@...hat.com>, "'linux-kernel@...r.kernel.org'"
	<linux-kernel@...r.kernel.org>, "'peterz@...radead.org'"
	<peterz@...radead.org>
CC: "'mingo@...hat.com'" <mingo@...hat.com>, "'will@...nel.org'"
	<will@...nel.org>, "'boqun.feng@...il.com'" <boqun.feng@...il.com>, "'Linus
 Torvalds'" <torvalds@...ux-foundation.org>,
	"'virtualization@...ts.linux-foundation.org'"
	<virtualization@...ts.linux-foundation.org>, 'Zeng Heng'
	<zengheng4@...wei.com>
Subject: RE: [PATCH next v2 5/5] locking/osq_lock: Optimise decode_cpu() and
 per_cpu_ptr().

From: Waiman Long
> Sent: 03 May 2024 17:00
> To: David Laight <David.Laight@...LAB.COM>; 'linux-kernel@...r.kernel.org' <linux-
> kernel@...r.kernel.org>; 'peterz@...radead.org' <peterz@...radead.org>
> Cc: 'mingo@...hat.com' <mingo@...hat.com>; 'will@...nel.org' <will@...nel.org>; 'boqun.feng@...il.com'
> <boqun.feng@...il.com>; 'Linus Torvalds' <torvalds@...ux-foundation.org>; 'virtualization@...ts.linux-
> foundation.org' <virtualization@...ts.linux-foundation.org>; 'Zeng Heng' <zengheng4@...wei.com>
> Subject: Re: [PATCH next v2 5/5] locking/osq_lock: Optimise decode_cpu() and per_cpu_ptr().
> 
> 
> On 12/31/23 23:14, Waiman Long wrote:
> >
> > On 12/31/23 16:55, David Laight wrote:
> >> per_cpu_ptr() indexes __per_cpu_offset[] with the cpu number.
> >> This requires the cpu number be 64bit.
> >> However the value is osq_lock() comes from a 32bit xchg() and there
> >> isn't a way of telling gcc the high bits are zero (they are) so
> >> there will always be an instruction to clear the high bits.
> >>
> >> The cpu number is also offset by one (to make the initialiser 0)
> >> It seems to be impossible to get gcc to convert
> >> __per_cpu_offset[cpu_p1 - 1]
> >> into (__per_cpu_offset - 1)[cpu_p1] (transferring the offset to the
> >> address).
> >>
> >> Converting the cpu number to 32bit unsigned prior to the decrement means
> >> that gcc knows the decrement has set the high bits to zero and doesn't
> >> add a register-register move (or cltq) to zero/sign extend the value.
> >>
> >> Not massive but saves two instructions.
> >>
> >> Signed-off-by: David Laight <david.laight@...lab.com>
> >> ---
> >>   kernel/locking/osq_lock.c | 6 ++----
> >>   1 file changed, 2 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> >> index 35bb99e96697..37a4fa872989 100644
> >> --- a/kernel/locking/osq_lock.c
> >> +++ b/kernel/locking/osq_lock.c
> >> @@ -29,11 +29,9 @@ static inline int encode_cpu(int cpu_nr)
> >>       return cpu_nr + 1;
> >>   }
> >>   -static inline struct optimistic_spin_node *decode_cpu(int
> >> encoded_cpu_val)
> >> +static inline struct optimistic_spin_node *decode_cpu(unsigned int
> >> encoded_cpu_val)
> >>   {
> >> -    int cpu_nr = encoded_cpu_val - 1;
> >> -
> >> -    return per_cpu_ptr(&osq_node, cpu_nr);
> >> +    return per_cpu_ptr(&osq_node, encoded_cpu_val - 1);
> >>   }
> >>     /*
> >
> > You really like micro-optimization.
> >
> > Anyway,
> >
> > Reviewed-by: Waiman Long <longman@...hat.com>
> >
> David,
> 
> Could you respin the series based on the latest upstream code?

Looks like a wet bank holiday weekend.....

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ