Message-ID: <d9df6c99-9fc7-f47d-5486-5787503177b5@suse.com>
Date: Wed, 23 Aug 2023 09:08:13 +0300
From: Nikolay Borisov <nik.borisov@...e.com>
To: Josh Poimboeuf <jpoimboe@...nel.org>
Cc: Andrew Cooper <andrew.cooper3@...rix.com>,
LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
Borislav Petkov <bp@...en8.de>,
Peter Zijlstra <peterz@...radead.org>,
Babu Moger <babu.moger@....com>, David.Kaplan@....com,
gregkh@...uxfoundation.org, Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH RFC 4/4] x86/srso: Use CALL-based return thunks to reduce
overhead
On 23.08.23 at 1:18, Josh Poimboeuf wrote:
> On Tue, Aug 22, 2023 at 09:45:07AM +0300, Nikolay Borisov wrote:
>>
>>
>>> On 22.08.23 at 5:22, Josh Poimboeuf wrote:
>>> On Tue, Aug 22, 2023 at 12:01:29AM +0100, Andrew Cooper wrote:
>>>> On 21/08/2023 4:16 pm, Josh Poimboeuf wrote:
>>>>> On Mon, Aug 21, 2023 at 12:27:23PM +0100, Andrew Cooper wrote:
>>>>>> The SRSO safety depends on having a CALL to an {ADD,LEA}/RET sequence which
>>>>>> has been made safe in the BTB. Specifically, there needs to be no perturbation
>>>>>> to the RAS between a correctly predicted CALL and the subsequent RET.
>>>>>>
>>>>>> Use the new infrastructure to CALL to a return thunk. Remove the
>>>>>> srso_fam1?_safe_ret() symbols and point srso_fam1?_return_thunk() at the
>>>>>> {ADD,LEA}/RET sequence directly.
>>>>>>
>>>>>> This removes one taken branch from every function return, which will reduce
>>>>>> the overhead of the mitigation. It also removes one of three moving pieces
>>>>>> from the SRSO mess.
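
[For reference, a minimal sketch of the before/after at a return site,
assuming the current upstream thunk shape; the fam19 label is made up
here, not the exact symbol from the patch:]

	# before: RET is patched to a JMP, and the thunk does the CALL
	jmp	srso_return_thunk	# thunk body: call srso_safe_ret; ud2

	# after: the return site CALLs the thunk directly
	call	srso_fam19_return_thunk
	int3				# speculation trap, see below

srso_fam19_return_thunk:
	lea	8(%rsp), %rsp		# or: add $8, %rsp; discard the
					# address the CALL just pushed
	ret				# consumes the original return address
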
>>>>> So, the address of whatever instruction comes after the 'CALL
>>>>> srso_*_return_thunk' is added to the RSB/RAS, and that might be
>>>>> speculated to when the thunk returns. Is that a concern?
>>>>
>>>> That is very intentional, and key to the safety.
>>>>
>>>> Replacing a RET with a CALL/{ADD,LEA}/RET sequence is a form of
>>>> retpoline thunk. The only difference with regular retpolines is that
>>>> the intended target is already on the stack, and not in a GPR.
>>>>
>>>>
>>>> If the CALL mispredicts, it doesn't matter. When decode catches up
>>>> (allegedly either instantaneously on Fam19h, or a few cycles late on
>>>> Fam17h), the top of the RAS is corrected and will point at the INT3
>>>> following the CALL instruction.
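
[A sketch of the speculation flow just described, with a hypothetical
local label:]

	call	srso_fam19_return_thunk	# pushes &1f; RAS top := &1f
1:	int3				# harmless speculative target

srso_fam19_return_thunk:
	lea	8(%rsp), %rsp		# drop &1f; real return address on top
	ret				# RAS predicts 1b, so speculation lands
					# on the INT3 until the real target
					# resolves
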
>>>
>>> That's the thing though, at least with my kernel/compiler combo there's
>>> no INT3 after the JMP __x86_return_thunk, and there's no room to patch
>>> one in after the CALL, as the JMP and CALL are both 5 bytes.
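
[Byte-for-byte, that looks like:]

	e9 xx xx xx xx    jmp  __x86_return_thunk      # last 5 bytes of the function
	e8 xx xx xx xx    call srso_..._return_thunk   # same 5 bytes after patching;
	                                               # no spare byte for the 0xcc (INT3)
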
>>
>> FWIW gcc's -mfunction-return=thunk-extern only ever generates a jmp;
>> thunk/thunk-inline, OTOH, generate a "full-fledged" thunk with all the
>> necessary speculation-catching tricks.
>>
>> For reference:
>>
>> https://godbolt.org/z/M1avYc63b
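
[Roughly what the two flavours emit, per the godbolt link above;
-mfunction-return=thunk produces the full local thunk, thunk-extern
just the bare JMP:]

	# -mfunction-return=thunk:
	jmp	__x86_return_thunk
__x86_return_thunk:
	call	.LIND1
.LIND0:
	pause
	lfence
	jmp	.LIND0			# capture bad speculation
.LIND1:
	lea	8(%rsp), %rsp
	ret

	# -mfunction-return=thunk-extern:
	jmp	__x86_return_thunk	# thunk body supplied by the kernel
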
>
> The problem is the call-site, not the thunk. Ideally we'd have an
> option which adds an INT3 after the 'JMP __x86_return_thunk'.
The way I see it, the int3/ud2 (or whatever) sequence belongs to the
thunk and not the call site, as you said. However, Andrew's solution
depends on the call site effectively acting as part of the thunk.
It seems something like that has already been done for the indirect
thunk, but not for the return thunk:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952
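
[By analogy with that PR, the ideal compiler output for the return-thunk
case would simply be:]

	jmp	__x86_return_thunk
	int3				# gives the kernel room to patch the
					# JMP to a CALL and keep a trap byte
					# after it
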
>