[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5def7e28-3949-9685-7ddf-19b550847ef0@zytor.com>
Date: Fri, 6 Oct 2023 11:59:22 -0700
From: "H. Peter Anvin" <hpa@...or.com>
To: Ingo Molnar <mingo@...nel.org>, Brian Gerst <brgerst@...il.com>
Cc: linux-kernel@...r.kernel.org, x86@...nel.org,
Thomas Gleixner <tglx@...utronix.de>,
Borislav Petkov <bp@...en8.de>,
Andy Lutomirski <luto@...nel.org>,
Mika Penttilä <mpenttil@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Uros Bizjak <ubizjak@...il.com>,
Denys Vlasenko <dvlasenk@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Josh Poimboeuf <jpoimboe@...hat.com>
Subject: Re: [PATCH v2 0/6] x86: Clean up fast syscall return validation
On 10/5/23 13:20, Ingo Molnar wrote:
>
> * Brian Gerst <brgerst@...il.com> wrote:
>
>> Looking at the compiled output, the only suboptimal code appears to be
>> the canonical address test, where the C code uses the CL register for
>> the shifts instead of immediates.
>>
>> 180: e9 00 00 00 00 jmp 185 <do_syscall_64+0x85>
>> 181: R_X86_64_PC32 .altinstr_aux-0x4
>> 185: b9 07 00 00 00 mov $0x7,%ecx
>> 18a: eb 05 jmp 191 <do_syscall_64+0x91>
>> 18c: b9 10 00 00 00 mov $0x10,%ecx
>> 191: 48 89 c2 mov %rax,%rdx
>> 194: 48 d3 e2 shl %cl,%rdx
>> 197: 48 d3 fa sar %cl,%rdx
>> 19a: 48 39 d0 cmp %rdx,%rax
>> 19d: 75 39 jne 1d8 <do_syscall_64+0xd8>
>
> Yeah, it didn't look equivalent - so I guess we want a C equivalent for:
>
> - ALTERNATIVE "shl $(64 - 48), %rcx; sar $(64 - 48), %rcx", \
> - "shl $(64 - 57), %rcx; sar $(64 - 57), %rcx", X86_FEATURE_LA57
>
> instead of the pgtable_l5_enabled() runtime test that
> __is_canonical_address() uses?
>
I don't think such a thing (without simply duplicate the above as an
alternative asm, which is obviously easy enough, and still allows the
compiler to pick the register used) would be possible without immediate
patching support[*].
Incidentally, this is a question for Uros: is there a reason this is a
mov to %ecx and not just %cl, which would save 3 bytes?
Incidentally, it is possible to save one instruction and use only *one*
alternative immediate:
leaq (%rax,%rax),%rdx
xorq %rax,%rdx
shrq $(63 - LA),%rdx # Yes, 63, not 64
# ZF=1 if canonical
This works because if bit [x] is set in the output, then bit [x] and
[x-1] in the input are different (bit [-1] considered to be zero); and
by definition a bit is canonical if and only if all the bits [63:LA] are
identical, thus bits [63:LA+1] in the output must all be zero.
The first two instructions are pure arithmetic and can thus be done in C:
bar = foo ^ (foo << 1);
... leaving only one instruction needing to be patched at runtime.
-hpa
[*] which is a whole bit of infrastructure that I know first-hand is
extremely complex[**] -- I had an almost-complete patchset done at one
time, but it has severely bitrotted. The good part is that it is
probably a lot easier to do today, with the alternatives-patching
callback routines available.
[**] the absolute best would of course be if such infrastructure could
be compiler-supported (probably as part as some really fancy support for
alternatives/static branch), but that would probably be a nightmare to
do in the compiler or a plugin.
Powered by blists - more mailing lists