lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 17 Apr 2024 21:21:29 +0200
From: Uros Bizjak <ubizjak@...il.com>
To: Borislav Petkov <bp@...en8.de>
Cc: linux-kernel@...r.kernel.org, linux-tip-commits@...r.kernel.org, 
	Ingo Molnar <mingo@...nel.org>, Linus Torvalds <torvalds@...ux-foundation.org>, 
	Waiman Long <longman@...hat.com>, x86@...nel.org
Subject: Re: [tip: locking/core] locking/atomic/x86: Introduce arch_try_cmpxchg64_local()

On Wed, Apr 17, 2024 at 8:41 PM Borislav Petkov <bp@...en8.de> wrote:

> > If the line count is the problem, I can easily parametrize new and
> > existing big macro descriptions in a follow-up patch. However, I was
> > advised to not mix everything together in one patch, but rest assured,
> > the creation and testing of the follow-up patch would take me less
> > time than writing the message you are reading.
>
> I'm simply making sure we're not going off the rails with
> micro-optimizing for no apparent reason.
>
> Saving a
>
>         test   %rax,%rax
>
> doesn't need fixing in my book. Because I don't think you'll be able to
> even measure it.

The above is perhaps a little unfortunate example taken from

if (cmpxchg64(...))

where the check is against zero. The compiler can optimize the check
to a TEST insn in this particular case, but otherwise CMP will be
emitted for different usages. Not a big difference, but a register has
to be kept live across cmpxchg8b.

> > It brings no future maintenance burden, but it perhaps improves
> > someone's life a tiny bit.
>
> This is where you and I disagree: touching that alternative in
> __arch_try_cmpxchg64_emu_local() does as we tend to change them from
> time to time, especially in recent times.
>
> And I wouldn't mind touching it but if it is there to save 10 insns on
> 32-bit - which doesn't matter - then why bother?
>
> Or do you have a relevant 32-bit workload which brings any improvement
> by this change?

There is one important issue. When a register (or two for double-word
values) has to be kept live for a compare, the register pressure on
32bit targets around cmpxchg8b goes through the roof, and when using
the frame pointer (and maybe some fixed register, e.g. PIC), the
register allocator runs out of available registers. The number of
spills around cmpxchg8b signals the troubles register allocator goes
through to "fix" everything, so from the compiler PoV any relief is
more than welcome here. Even in GCC internal libraries, we have had to
take a special approach with this insn to avoid internal compiler
errors. The kernel was quite lucky here ;)

Thanks and best regards,
Uros.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ