lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAFULd4YR-VkAOKiS5yxSUYi0YMzY1p=pkYe4dOkgFs+A=9AFFA@mail.gmail.com>
Date: Sun, 23 Jun 2024 19:43:31 +0200
From: Uros Bizjak <ubizjak@...il.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: kernel test robot <lkp@...el.com>, oe-kbuild-all@...ts.linux.dev, 
	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...nel.org>, 
	Borislav Petkov <bp@...en8.de>, Peter Zijlstra <peterz@...radead.org>
Subject: Re: arch/x86/include/asm/cmpxchg_32.h:149:9: error: inline assembly
 requires more registers than available

On Sun, Jun 23, 2024 at 5:42 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Sun, 23 Jun 2024 at 06:14, Uros Bizjak <ubizjak@...il.com> wrote:
> >
> > The referred patch actually mitigates the issue, as explained in
> > details here [1].
>
> Clearly not.
>
> It might help things in theory, but in practice it clearly just makes
> things worse, and causes problems for clang.
>
> And the thing about practice is that it always trumps theory.
>
> > So, without some interest from x86 maintainers, the issue will remain
> > unfixed. If they want x86_32 arch to die then inline locking
> > primitives involving cmpxchg8b are one of the nails in its coffin.
>
> What? No.
>
> That commit will get reverted. It caused problems, and now you are
> blaming others.
>
> You can't just go "I caused issues, but the x86 maintainers aren't
> interested in this area, so it's their fault".

I disagree with the above. There is no one to blame, let alone blaming others.

The cmpxchg8b is a complex beast, It needs 4 input registers (a pair
of them inout) and optionally 2 address registers. When the frame
pointer is active, the instruction can consume all available registers
on x86_32, so the compiler has to work extremely hard to satisfy the
required insn constraints (and also pass live values over the
instruction). The two examples I quoted in my previous mail just
illustrate this.

> And no, "improved code generation with one compiler version" does not
> then mean "other compilers don't matter".

Using try_cmpxchg *improves* the situation, as there is a double-word
value that does not have to be live over the insn. So, leaving the
situation as it is will just trigger the issue later. We had exactly
these problems in GCC with __atomic builtins, and GCC learned how to
handle this extreme register pressure.

> Now, I do suspect that the issue is some very random thing that could
> be fixed by massaging the inline asm a bit.
>
> For example, why does that 32-bit __arch_try_cmpxchg64() do this:
>
>         if (unlikely(!ret))                                             \
>                 *(_oldp) = o.full;                                      \
>
> when I think it would be simpler and more straightforward to just do
> that *(_oldp) = o.full unconditionally? The "dereference" is purely
> syntactic, the intent is that the compiler will just use a register
> for this all (well, two registers on 32-bit), and making it
> conditional makes things worse, because now that "oldp" register
> cannot be the same as "o" itself.

The  primitive is written in this way because we would like to use it
in a cmpxchg loop, as well as as a condition in an if clause. Without
the above referred condition, the compiler will emit a dead store in
the later case. I have tested this macro extensively, and the above is
really the best we can do, it results in a tight cmpxchg loop and
without unneeded moves/stores in code without loops.

> I didn't look very closely into this, but maybe that would just fix
> code generation. And maybe it wouldn't. If there's no other fix
> somebody comes up with, we just have to revert.

As said in my previous message, if the compiler can't handle
__try_cmpxchg implementation, the most straightforward and reliable
solution is to implement atomic64_{and,or,xor} as outline functions,

Uros.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ