lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 6 Oct 2023 11:59:22 -0700
From:   "H. Peter Anvin" <hpa@...or.com>
To:     Ingo Molnar <mingo@...nel.org>, Brian Gerst <brgerst@...il.com>
Cc:     linux-kernel@...r.kernel.org, x86@...nel.org,
        Thomas Gleixner <tglx@...utronix.de>,
        Borislav Petkov <bp@...en8.de>,
        Andy Lutomirski <luto@...nel.org>,
        Mika Penttilä <mpenttil@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Uros Bizjak <ubizjak@...il.com>,
        Denys Vlasenko <dvlasenk@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Josh Poimboeuf <jpoimboe@...hat.com>
Subject: Re: [PATCH v2 0/6] x86: Clean up fast syscall return validation

On 10/5/23 13:20, Ingo Molnar wrote:
> 
> * Brian Gerst <brgerst@...il.com> wrote:
> 
>> Looking at the compiled output, the only suboptimal code appears to be
>> the canonical address test, where the C code uses the CL register for
>> the shifts instead of immediates.
>>
>>   180:   e9 00 00 00 00          jmp    185 <do_syscall_64+0x85>
>>                          181: R_X86_64_PC32      .altinstr_aux-0x4
>>   185:   b9 07 00 00 00          mov    $0x7,%ecx
>>   18a:   eb 05                   jmp    191 <do_syscall_64+0x91>
>>   18c:   b9 10 00 00 00          mov    $0x10,%ecx
>>   191:   48 89 c2                mov    %rax,%rdx
>>   194:   48 d3 e2                shl    %cl,%rdx
>>   197:   48 d3 fa                sar    %cl,%rdx
>>   19a:   48 39 d0                cmp    %rdx,%rax
>>   19d:   75 39                   jne    1d8 <do_syscall_64+0xd8>
> 
> Yeah, it didn't look equivalent - so I guess we want a C equivalent for:
> 
> -       ALTERNATIVE "shl $(64 - 48), %rcx; sar $(64 - 48), %rcx", \
> -               "shl $(64 - 57), %rcx; sar $(64 - 57), %rcx", X86_FEATURE_LA57
> 
> instead of the pgtable_l5_enabled() runtime test that
> __is_canonical_address() uses?
> 

I don't think such a thing (without simply duplicate the above as an 
alternative asm, which is obviously easy enough, and still allows the 
compiler to pick the register used) would be possible without immediate 
patching support[*].

Incidentally, this is a question for Uros: is there a reason this is a 
mov to %ecx and not just %cl, which would save 3 bytes?

Incidentally, it is possible to save one instruction and use only *one* 
alternative immediate:

	leaq (%rax,%rax),%rdx
	xorq %rax,%rdx
	shrq $(63 - LA),%rdx		# Yes, 63, not 64
	# ZF=1 if canonical

This works because if bit [x] is set in the output, then bit [x] and 
[x-1] in the input are different (bit [-1] considered to be zero); and 
by definition a bit is canonical if and only if all the bits [63:LA] are 
identical, thus bits [63:LA+1] in the output must all be zero.

The first two instructions are pure arithmetic and can thus be done in C:

	bar = foo ^ (foo << 1);

... leaving only one instruction needing to be patched at runtime.

	-hpa



[*] which is a whole bit of infrastructure that I know first-hand is 
extremely complex[**] -- I had an almost-complete patchset done at one 
time, but it has severely bitrotted. The good part is that it is 
probably a lot easier to do today, with the alternatives-patching 
callback routines available.

[**] the absolute best would of course be if such infrastructure could 
be compiler-supported (probably as part as some really fancy support for 
alternatives/static branch), but that would probably be a nightmare to 
do in the compiler or a plugin.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ