Message-ID: <fc39847185a8423c8827b054a4b788c3@AcuMS.aculab.com>
Date:   Mon, 15 Apr 2019 09:03:05 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Paolo Bonzini' <pbonzini@...hat.com>,
        Sean Christopherson <sean.j.christopherson@...el.com>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>
Subject: RE: [PATCH] KVM: x86: optimize check for valid PAT value

From: Paolo Bonzini
> Sent: 15 April 2019 09:12
> On 11/04/19 11:06, David Laight wrote:
> > It may be possible to generate shorter code that executes just as
> > fast by generating a single constant and deriving the others from it.
> > - generate 4s - needed first
> > - shift right 2 to get 1s (in parallel with the xor)
> > - use lea to get 6s (in parallel with an lea to do the add)
> > - invert the 1s to get FEs (also in parallel with the add)
> > - xor the FEs with the 6s to get F8s (in parallel with the or)
> > - and/test for the result

That version needs an extra register move I hadn't allowed for.
It is also impossible to stop gcc folding constant expressions
without an asm nop on a register.
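
For illustration, the derivation steps quoted above can be sketched in C
roughly as below. This is only a sketch under assumptions: HIDE(),
struct pat_consts, derive_pat_consts() and check_pat_consts() are names
made up here, the empty asm is the gcc/clang extended-asm form of the
"asm nop on a register", and nothing guarantees the compiler picks lea
for the add.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Empty asm that claims to read and write x in a register.
 * The compiler can no longer treat x as a known constant, so the
 * values derived from it are computed at run time instead of folded.
 * (gcc/clang extension; hypothetical helper for this sketch.)
 */
#define HIDE(x) __asm__("" : "+r"(x))

struct pat_consts {
	uint64_t fours, ones, sixes, fes, f8s;
};

static struct pat_consts derive_pat_consts(void)
{
	uint64_t fours = 0x0404040404040404ULL;	/* the only constant loaded */
	uint64_t ones, sixes, fes, f8s;

	HIDE(fours);
	ones = fours >> 2;		/* 0x01 in every byte */
	sixes = fours + ones * 2;	/* 0x06 in every byte; base + index*2 is a shape lea can encode */
	fes = ~ones;			/* 0xFE in every byte */
	f8s = fes ^ sixes;		/* 0xF8 in every byte */

	return (struct pat_consts){ fours, ones, sixes, fes, f8s };
}

/* Sanity check of the derived values. */
static void check_pat_consts(void)
{
	struct pat_consts c = derive_pat_consts();

	assert(c.ones  == 0x0101010101010101ULL);
	assert(c.sixes == 0x0606060606060606ULL);
	assert(c.fes   == 0xFEFEFEFEFEFEFEFEULL);
	assert(c.f8s   == 0xF8F8F8F8F8F8F8F8ULL);
}
```

Hiding only the 4s is enough here, since every other value depends on it.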

> FWIW, here is yet another way to do it:
> 
> /* Change 6/7 to 4/5 */
> data &= ~((data & 0x0404040404040404ULL) >> 1);
> /* Only allow 0/1/4/5 now */
> return !(data & 0xFAFAFAFAFAFAFAFAULL);
> 
> movabs $0x404040404040404, %rcx
> andq   %rdx, %rcx
> shrq   %rcx
> notq   %rcx
> movabs $0xFAFAFAFAFAFAFAFA, %rax
> andq   %rcx, %rdx
> test   %rax, %rdx

Fewer opcode bytes, but 5 dependent instructions
(assuming the first constant can be loaded in parallel
with an earlier instruction).
I think mine was only 4 dependent instructions.

All these are far faster than the loop...
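
A small userspace check that the branch-free version quoted above agrees
with a per-byte loop might look like the sketch below. It assumes the
loop accepts exactly 0, 1, 4, 5, 6 and 7 in each byte, which is what the
comments in the quoted code imply. pat_valid_loop(), pat_valid_bits()
and pat_mismatches() are names invented for this sketch, and it checks
equivalence only, not speed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reference: test each of the 8 bytes in turn (assumed valid set 0,1,4,5,6,7). */
static bool pat_valid_loop(uint64_t data)
{
	for (int i = 0; i < 8; i++) {
		unsigned int t = (data >> (i * 8)) & 0xff;

		/* 0xf3 has bits 0, 1, 4, 5, 6 and 7 set */
		if (t >= 8 || !((1u << t) & 0xf3))
			return false;
	}
	return true;
}

/* The branch-free version from the quoted mail. */
static bool pat_valid_bits(uint64_t data)
{
	/* Change 6/7 to 4/5 */
	data &= ~((data & 0x0404040404040404ULL) >> 1);
	/* Only allow 0/1/4/5 now */
	return !(data & 0xFAFAFAFAFAFAFAFAULL);
}

/*
 * Try every byte value in every byte lane, with the other lanes held
 * at the valid value 6, and count how often the two versions disagree.
 */
static unsigned int pat_mismatches(void)
{
	const uint64_t base = 0x0606060606060606ULL;
	unsigned int bad = 0;

	for (int lane = 0; lane < 8; lane++) {
		int shift = lane * 8;

		for (uint64_t v = 0; v < 256; v++) {
			uint64_t data = (base & ~(0xffULL << shift)) | (v << shift);

			bad += pat_valid_loop(data) != pat_valid_bits(data);
		}
	}
	return bad;
}
```

Neither operation moves bits between bytes, so testing one lane at a time
covers the per-byte logic.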

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
