[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1e3c8c24-6d94-d325-6661-ee44be073e46@citrix.com>
Date:   Thu, 10 Aug 2023 00:11:45 +0100
From:   Andrew Cooper <andrew.cooper3@...rix.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Borislav Petkov <bp@...en8.de>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "H. Peter Anvin" <hpa@...or.com>, x86@...nel.org
Subject: Re: [PATCH] x86/AMD: Fix ASM constraints in amd_clear_divider()
On 09/08/2023 10:33 pm, Linus Torvalds wrote:
> On Wed, 9 Aug 2023 at 13:24, Andrew Cooper <andrew.cooper3@...rix.com> wrote:
>> DIV writes its results into %eax and %edx, meaning that they need to be output
>> constraints too.  It happens to be benign in this case as the registers don't
>> change value, but the compiler should still know.
>>
>> Fixes: 77245f1c3c64 ("x86/CPU/AMD: Do not leak quotient data after a division by 0")
> As mentioned earlier (html, not on list), I think it was intentional
> and this "fix" doesn't really fix anything.
>
> A comment might be good, of course, if this really bothers somebody.
>
> That said, if the code wanted to be *really* clever, it could have done
>
>         asm volatile(ALTERNATIVE("", "div %0", X86_BUG_DIV0)
>                      :: "a" (1), "d" (0));
>
> instead. One less register used, and the same "no change to register
> state" approach.
Yeah, I spotted that as an option, and it does save one whole zeroing
idiom...
But IMO, the risk of someone copy&pasting this as if it were a good
example, and the debugging thereafter ought to be enough of a reason to
avoid klever tricks to save 1 line of C.
> Of course, who knows what early-out algorithm the divider uses, and
> maybe it's cheaper to do 0/1 than it is to do 1/1. Not that I think we
> should care. The main reason to be cute here would be just to be cute.
AMD said "any non-faulting divide".  Which still isn't as helpful as it
could be, because according to Agner Fogh:
             Uops Latency
DIV  r8/m8    1    13-16
DIV  r16/m16  2    14-21
DIV  r32/m32  2    14-30
DIV  r64/m64  2    14-46
IDIV r8/m8    1    13-16
IDIV r16/m16  2    13-21
IDIV r32/m32  2    14-30
IDIV r64/m64  2    14-47
DIV %al looks to be the firm favourite choice.
Assuming the one extra cycle is just for the double-pumped uop, then the
best latency for a divide is 13 cycles across the board.
It doesn't make sense to optimise this as a fastpath.  After all, what
fool would put a real divide-by-1 in their code...
> That said, if you were to use this pattern in *real* code (as opposed
> to in that function that will never be called in reality because
> nobody divides by zero in real code), the register clobbers might be
> useful just to make sure the compiler doesn't re-use a zero register
> content that is then behind the latency of the dummy divide. But
> again, this particular code really doesn't _matter_ in that sense.
Well - that's a different question.
An attacker skilled in the art can easily hide #DE in the transient
shadow of something else, and plenty of people got very skilled in this
particular art trying to make better Meltdown exploits.
So I don't think putting any scrubbing in the #DE handler is going to
stop a real attack.  But I'm just speculating.
~Andrew
P.S. https://www.amd.com/system/files/documents/security-whitepaper.pdf
currently says
"The divide by zero (#DE) fault is signaled[sic] on the integer divide
instructions. No data is forwarded to younger, dependent operations for
speculative execution on this fault."
which needs to be revisited.  Zen1 was the latest-and-greatest when that
whitepaper was written.
Powered by blists - more mailing lists
 
