lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 11 Apr 2019 09:06:37 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Paolo Bonzini' <pbonzini@...hat.com>,
        Sean Christopherson <sean.j.christopherson@...el.com>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>
Subject: RE: [PATCH] KVM: x86: optimize check for valid PAT value

From: Paolo Bonzini
> Sent: 10 April 2019 18:09
> On 10/04/19 16:57, Sean Christopherson wrote:
> > On Wed, Apr 10, 2019 at 12:55:53PM +0000, David Laight wrote:
> >> From: Paolo Bonzini
> >>> Sent: 10 April 2019 10:55
> >>>
> >>> This check will soon be done on every nested vmentry and vmexit,
> >>> "parallelize" it using bitwise operations.
...
> >> How about:
> >> 	/*
> >> 	 * Each byte must be 0, 1, 4, 5, 6 or 7.
> >> 	 * Convert 001x to 011x then 100x so 2 and 3 fail the test.
> >> 	 */
> >> 	data |= (data ^ 0x0404040404040404ULL)) + 0x0202020202020202ULL;
> >> 	if (data & 0xF8F8F8F8F8F8F8F8ULL)
> >> 		return false;
...
> > Really fancy:
> >    0x0000000000048447 <+247>:   movabs $0x404040404040404,%rcx
> >    0x0000000000048451 <+257>:   movabs $0x202020202020202,%rax
> >    0x000000000004845b <+267>:   xor    %rdx,%rcx
> >    0x000000000004845e <+270>:   add    %rax,%rcx
> >    0x0000000000048461 <+273>:   movabs $0xf8f8f8f8f8f8f8f8,%rax
> >    0x000000000004846b <+283>:   or     %rcx,%rdx
> >    0x000000000004846e <+286>:   test   %rax,%rdx
> >    0x0000000000048471 <+289>:   sete   %al
> >    0x0000000000048474 <+292>:   retq
> 
> Yeah, the three constants are expensive.  Too bad the really fancy
> version sums twos and xors fours; if it were the opposite, it could have
> used lea and then I would have chosen that one just for the coolness factor.

The constants are just big, if the code is inlined they are probably
otherwise free (they won't change the length of the dependency chain).
The twos can be generated from the fours without increasing the length
of the dependency chain and saving some bytes.

It may be possible to generate shorter code that executes just as
fast by generating a single constant and deriving the others from it.
- generate 4s - needed first
- shift right 2 to get 1s (in parallel with the xor)
- use lea to get 6s (in parallel with an lea to do the add)
- invert the 1s to get FEs (also in parallel with the add)
- xor the FEs with the 6s to get F8s (in parallel with the or)
- and/test for the result
I'm not going to try to persuade gcc to go that.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Powered by blists - more mailing lists