Message-ID: <6215f764fffc41c39c74a871124aa4ed@AcuMS.aculab.com>
Date:   Fri, 29 Oct 2021 09:33:24 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Peter Zijlstra' <peterz@...radead.org>,
        "x86@...nel.org" <x86@...nel.org>,
        "andrew.cooper3@...rix.com" <andrew.cooper3@...rix.com>,
        Josh Poimboeuf <jpoimboe@...hat.com>
CC:     "keescook@...omium.org" <keescook@...omium.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Mark Rutland <mark.rutland@....com>,
        Will Deacon <will@...nel.org>,
        "hjl.tools@...il.com" <hjl.tools@...il.com>
Subject: RE: [RFC][PATCH] x86: Add straight-line-speculation mitigation

From: Peter Zijlstra
> Sent: 28 October 2021 12:44
> 
> This little patch makes use of an upcoming GCC feature to mitigate
> straight-line-speculation for x86:
...

This all generates the instruction sequence:
	ret
	int3
because there are (apparently) times when the cpu will speculatively
execute the instruction following a 'ret'.

I suspect this will have a small performance impact on at least
some cpus, which nobody has mentioned so far.
As well as the slight increase in code size, I can think of
two more problems.

1) The cpu may not be able to quickly 'abort' the speculative
   execution of the 'int3' instruction.
   Since that is a slow instruction (not as slow as 'tan'!)
   this might add quite a few clocks.
   ISTR there have always been warnings about the problem
   of speculative execution of trig functions - eg if non-code
   follows a 'ret'.

2) int3 is almost certainly slow to decode.
   Plausibly this might block the decoders from decoding
   from the branch/return target.
   Although I suspect the I-cache fetch will take longer
   unless the decode time is really horrid.
   The tables I have don't give execution times for int3.

While slightly longer, 'jmp .' may actually be a better
instruction than 'int3', since it will block speculative
execution while still being fast to decode and (not) execute.
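
To put a number on "slightly longer", here is a minimal sketch in C that
just writes out the opcode bytes of the two candidate sequences. The byte
values are the standard x86 encodings (ret = c3, int3 = cc, short
'jmp .' = eb fe); the array names and the helper sls_extra_bytes() are
made up for illustration and are not part of the patch.

```c
#include <stddef.h>

/* Standard x86 encodings of the two sequences discussed above. */
static const unsigned char ret_int3[]     = { 0xc3, 0xcc };       /* ret ; int3  */
static const unsigned char ret_jmp_self[] = { 0xc3, 0xeb, 0xfe }; /* ret ; jmp . */

/*
 * Hypothetical helper: bytes added after each 'ret' by a sequence of
 * seq_len bytes. The 'ret' itself accounts for one byte.
 */
static size_t sls_extra_bytes(size_t seq_len)
{
	return seq_len - 1;
}
```

So 'int3' adds one byte per 'ret' and 'jmp .' adds two, i.e.
sls_extra_bytes(sizeof(ret_jmp_self)) is one more than
sls_extra_bytes(sizeof(ret_int3)). This says nothing about decode or
execution cost, only about code size.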

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
