Message-ID: <fc39847185a8423c8827b054a4b788c3@AcuMS.aculab.com>
Date: Mon, 15 Apr 2019 09:03:05 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Paolo Bonzini' <pbonzini@...hat.com>,
Sean Christopherson <sean.j.christopherson@...el.com>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>
Subject: RE: [PATCH] KVM: x86: optimize check for valid PAT value
From: Paolo Bonzini
> Sent: 15 April 2019 09:12
> On 11/04/19 11:06, David Laight wrote:
> > It may be possible to generate shorter code that executes just as
> > fast by generating a single constant and deriving the others from it.
> > - generate 4s - needed first
> > - shift right 2 to get 1s (in parallel with the xor)
> > - use lea to get 6s (in parallel with an lea to do the add)
> > - invert the 1s to get FEs (also in parallel with the add)
> > - xor the FEs with the 6s to get F8s (in parallel with the or)
> > - and/test for the result
That version needs an extra register move I hadn't allowed for.
It is also impossible to stop gcc folding constant expressions
without an asm nop on a register.
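A minimal sketch of the constant derivation listed above, in plain C, to check the arithmetic. The names (struct pat_masks, derive_masks, HIDE_VAR) are made up for illustration. The empty asm is the GCC/Clang idiom assumed here for keeping the value in a register so the compiler cannot fold the chain back into immediates; it assumes a 64-bit target.

```c
#include <stdint.h>

/* Assumption: GCC/Clang empty asm that makes the optimizer treat v as unknown. */
#if defined(__GNUC__)
#define HIDE_VAR(v) __asm__ __volatile__("" : "+r"(v))
#else
#define HIDE_VAR(v) ((void)0)
#endif

/* Hypothetical container for the per-byte masks derived from the 4s. */
struct pat_masks {
	uint64_t ones;	/* 0x0101... */
	uint64_t sixes;	/* 0x0606... */
	uint64_t fes;	/* 0xFEFE... */
	uint64_t f8s;	/* 0xF8F8... */
};

static struct pat_masks derive_masks(uint64_t fours)
{
	struct pat_masks m;

	HIDE_VAR(fours);		/* stop constant folding of the chain */
	m.ones = fours >> 2;		/* 0x04 >> 2 == 0x01 in every byte */
	m.sixes = fours + 2 * m.ones;	/* one lea: base + index * 2 */
	m.fes = ~m.ones;		/* ~0x01 == 0xFE */
	m.f8s = m.fes ^ m.sixes;	/* 0xFE ^ 0x06 == 0xF8 */
	return m;
}
```

Calling derive_masks(0x0404040404040404ULL) yields the 1s, 6s, FEs and F8s from the single movabs constant.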
> FWIW, here is yet another way to do it:
>
> /* Change 6/7 to 4/5 */
> data &= ~((data & 0x0404040404040404ULL) >> 1);
> /* Only allow 0/1/4/5 now */
> return !(data & 0xFAFAFAFAFAFAFAFAULL);
>
> movabs $0x404040404040404, %rcx
> andq %rdx, %rcx
> shrq %rcx
> notq %rcx
> movabs $0xFAFAFAFAFAFAFAFA, %rax
> andq %rcx, %rdx
> test %rax, %rdx
Fewer opcode bytes, but 5 dependent instructions
(assuming the first constant can be loaded in parallel
with an earlier instruction).
I think my version was only 4 dependent instructions.
All these are far faster than the loop...
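For anyone wanting to check the two-line version quoted above against a per-byte loop, here is a rough sketch. It assumes the valid PAT types are 0, 1, 4, 5, 6 and 7, as the comments in the quoted code imply. The function names are made up for illustration, and the loop is a stand-in reference rather than the actual kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reference: check each byte; assumed valid types are 0, 1, 4, 5, 6, 7. */
static bool pat_valid_loop(uint64_t data)
{
	for (int i = 0; i < 8; i++) {
		unsigned int t = (data >> (i * 8)) & 0xff;

		if (t >= 8 || !((1u << t) & 0xf3))
			return false;
	}
	return true;
}

/* The branch-free check quoted above. */
static bool pat_valid_bitwise(uint64_t data)
{
	/* Change 6/7 to 4/5 */
	data &= ~((data & 0x0404040404040404ULL) >> 1);
	/* Only allow 0/1/4/5 now */
	return !(data & 0xFAFAFAFAFAFAFAFAULL);
}

/* Try every byte value in every byte lane, with the other lanes zero. */
static bool pat_checks_agree(void)
{
	for (int lane = 0; lane < 8; lane++) {
		for (unsigned int b = 0; b < 256; b++) {
			uint64_t data = (uint64_t)b << (lane * 8);

			if (pat_valid_loop(data) != pat_valid_bitwise(data))
				return false;
		}
	}
	return true;
}
```

Neither the shift nor the masks move bits across a byte boundary, so testing one lane at a time covers every per-byte case.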
David