Message-ID: <87y1wotayr.ffs@tglx>
Date: Wed, 20 Jul 2022 11:00:44 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Peter Zijlstra <peterz@...radead.org>,
Sami Tolvanen <samitolvanen@...gle.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>,
the arch/x86 maintainers <x86@...nel.org>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Josh Poimboeuf <jpoimboe@...nel.org>,
Andrew Cooper <Andrew.Cooper3@...rix.com>,
Pawan Gupta <pawan.kumar.gupta@...ux.intel.com>,
Johannes Wikner <kwikner@...z.ch>,
Alyssa Milburn <alyssa.milburn@...ux.intel.com>,
Jann Horn <jannh@...gle.com>, "H.J. Lu" <hjl.tools@...il.com>,
Joao Moreira <joao.moreira@...el.com>,
Joseph Nuzman <joseph.nuzman@...el.com>,
Steven Rostedt <rostedt@...dmis.org>,
Juergen Gross <jgross@...e.com>,
Masami Hiramatsu <mhiramat@...nel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>
Subject: Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation
On Tue, Jul 19 2022 at 01:51, Peter Zijlstra wrote:
> On Mon, Jul 18, 2022 at 03:48:04PM -0700, Sami Tolvanen wrote:
>> On Mon, Jul 18, 2022 at 2:18 PM Peter Zijlstra <peterz@...radead.org> wrote:
>> > Ofc, we can still put the whole:
>> >
>> > sarq $5, PER_CPU_VAR(__x86_call_depth);
>> > jmp \func_direct
>> >
>> > thing in front of that.
>>
>> Sure, that would work.
>
> So if we assume \func starts with ENDBR, and further assume we've fixed
> up every direct jmp/call to land at +4, we can overwrite the ENDBR with
> part of the SARQ, that leaves us 6 more byte, placing the immediate at
> -10 if I'm not mis-counting.
>
> Now, the call sites are:
>
> 41 81 7b fa 78 56 34 12 cmpl $0x12345678, -6(%r11)
> 74 02 je 1f
> 0f 0b ud2
> e8 00 00 00 00 1: call __x86_indirect_thunk_r11
>
> That means the offset of +10 lands in the middle of the CALL
> instruction, and since we only have 16 thunks there is a limited number
> of byte patterns available there.
>
> This really isn't as nice as the -6 but might just work well enough,
> hmm?

So I added a 32-byte padding and put the thunk at the start:

	sarq $5, PER_CPU_VAR(__x86_call_depth);
	jmp \func_direct

For sockperf that costs about 1% performance vs. the 16-byte
variant. For mitigations=off it's a ~0.5% drop.

That's on a SKL. I have not checked other systems yet.
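
For anyone who wants to re-check the byte counting in the quoted call
site, here is a rough sketch, not part of any patch. It takes the
instruction bytes exactly as quoted and reports which instruction a
given forward offset from the cmpl immediate falls into. The helper
names (instruction_at, landing) are made up for illustration, and it
assumes the 32-bit immediate is the last four bytes of the cmpl
encoding.

```python
# Call-site bytes exactly as quoted above: (mnemonic, encoded bytes).
CALL_SITE = [
    ("cmpl", bytes.fromhex("41817bfa78563412")),
    ("je",   bytes.fromhex("7402")),
    ("ud2",  bytes.fromhex("0f0b")),
    ("call", bytes.fromhex("e800000000")),
]

# Assumption: the 32-bit immediate is the last 4 bytes of the cmpl.
IMM_OFFSET = len(CALL_SITE[0][1]) - 4


def instruction_at(offset):
    """Return (mnemonic, byte index inside that instruction) for an offset
    counted from the first byte of the call site."""
    start = 0
    for name, encoding in CALL_SITE:
        if start <= offset < start + len(encoding):
            return name, offset - start
        start += len(encoding)
    raise ValueError("offset outside the call site")


def landing(distance):
    """Where immediate + distance lands, i.e. the spot that would be taken
    as a function entry if the immediate were a hash stored `distance`
    bytes in front of a function."""
    return instruction_at(IMM_OFFSET + distance)


if __name__ == "__main__":
    for distance in (6, 10):
        name, index = landing(distance)
        print(f"immediate +{distance}: lands in {name}, byte {index}")
```

By this counting +6 lands on the first byte of the UD2, while +10 lands
inside the rel32 of the CALL, which matches the "middle of the CALL
instruction" remark in the quote.
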
Thanks,
tglx