linux-kernel - Re: [PATCH] x86/AMD: Fix ASM constraints in amd_clear

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1e3c8c24-6d94-d325-6661-ee44be073e46@citrix.com>
Date:   Thu, 10 Aug 2023 00:11:45 +0100
From:   Andrew Cooper <andrew.cooper3@...rix.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Borislav Petkov <bp@...en8.de>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "H. Peter Anvin" <hpa@...or.com>, x86@...nel.org
Subject: Re: [PATCH] x86/AMD: Fix ASM constraints in amd_clear_divider()

On 09/08/2023 10:33 pm, Linus Torvalds wrote:
> On Wed, 9 Aug 2023 at 13:24, Andrew Cooper <andrew.cooper3@...rix.com> wrote:
>> DIV writes its results into %eax and %edx, meaning that they need to be output
>> constraints too.  It happens to be benign in this case as the registers don't
>> change value, but the compiler should still know.
>>
>> Fixes: 77245f1c3c64 ("x86/CPU/AMD: Do not leak quotient data after a division by 0")
> As mentioned earlier (html, not on list), I think it was intentional
> and this "fix" doesn't really fix anything.
>
> A comment might be good, of course, if this really bothers somebody.
>
> That said, if the code wanted to be *really* clever, it could have done
>
>         asm volatile(ALTERNATIVE("", "div %0", X86_BUG_DIV0)
>                      :: "a" (1), "d" (0));
>
> instead. One less register used, and the same "no change to register
> state" approach.

Yeah, I spotted that as an option, and it does save one whole zeroing
idiom...

But IMO, the risk of someone copy&pasting this as if it were a good
example, and the debugging thereafter ought to be enough of a reason to
avoid klever tricks to save 1 line of C.

> Of course, who knows what early-out algorithm the divider uses, and
> maybe it's cheaper to do 0/1 than it is to do 1/1. Not that I think we
> should care. The main reason to be cute here would be just to be cute.

AMD said "any non-faulting divide".  Which still isn't as helpful as it
could be, because according to Agner Fogh:

             Uops Latency
DIV  r8/m8    1    13-16
DIV  r16/m16  2    14-21
DIV  r32/m32  2    14-30
DIV  r64/m64  2    14-46
IDIV r8/m8    1    13-16
IDIV r16/m16  2    13-21
IDIV r32/m32  2    14-30
IDIV r64/m64  2    14-47

DIV %al looks to be the firm favourite choice.

Assuming the one extra cycle is just for the double-pumped uop, then the
best latency for a divide is 13 cycles across the board.

It doesn't make sense to optimise this as a fastpath.  After all, what
fool would put a real divide-by-1 in their code...

> That said, if you were to use this pattern in *real* code (as opposed
> to in that function that will never be called in reality because
> nobody divides by zero in real code), the register clobbers might be
> useful just to make sure the compiler doesn't re-use a zero register
> content that is then behind the latency of the dummy divide. But
> again, this particular code really doesn't _matter_ in that sense.

Well - that's a different question.

An attacker skilled in the art can easily hide #DE in the transient
shadow of something else, and plenty of people got very skilled in this
particular art trying to make better Meltdown exploits.

So I don't think putting any scrubbing in the #DE handler is going to
stop a real attack.  But I'm just speculating.

~Andrew

P.S. https://www.amd.com/system/files/documents/security-whitepaper.pdf
currently says

"The divide by zero (#DE) fault is signaled[sic] on the integer divide
instructions. No data is forwarded to younger, dependent operations for
speculative execution on this fault."

which needs to be revisited.  Zen1 was the latest-and-greatest when that
whitepaper was written.