Open Source and information security mailing list archives
Date:   Wed, 10 Apr 2019 19:09:05 +0200
From:   Paolo Bonzini <pbonzini@...hat.com>
To:     Sean Christopherson <sean.j.christopherson@...el.com>,
        David Laight <David.Laight@...LAB.COM>
Cc:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>
Subject: Re: [PATCH] KVM: x86: optimize check for valid PAT value

On 10/04/19 16:57, Sean Christopherson wrote:
> On Wed, Apr 10, 2019 at 12:55:53PM +0000, David Laight wrote:
>> From: Paolo Bonzini
>>> Sent: 10 April 2019 10:55
>>>
>>> This check will soon be done on every nested vmentry and vmexit;
>>> "parallelize" it using bitwise operations.
>>>
>>> Signed-off-by: Paolo Bonzini <pbonzini@...hat.com>
>>> ---
>> ...
>>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>>> index 28406aa1136d..7bc7ac9d2a44 100644
>>> --- a/arch/x86/kvm/x86.h
>>> +++ b/arch/x86/kvm/x86.h
>>> @@ -347,4 +347,12 @@ static inline void kvm_after_interrupt(struct kvm_vcpu *vcpu)
>>>  	__this_cpu_write(current_vcpu, NULL);
>>>  }
>>>
>>> +static inline bool kvm_pat_valid(u64 data)
>>> +{
>>> +	if (data & 0xF8F8F8F8F8F8F8F8)
>>> +		return false;
>>> +	/* 0, 1, 4, 5, 6, 7 are valid values.  */
>>> +	return (data | ((data & 0x0202020202020202) << 1)) == data;
>>> +}
>>> +
>>
>> How about:
>> 	/*
>> 	 * Each byte must be 0, 1, 4, 5, 6 or 7.
>> 	 * Convert 001x to 011x then 100x so 2 and 3 fail the test.
>> 	 */
>> 	data |= (data ^ 0x0404040404040404ULL) + 0x0202020202020202ULL;
>> 	if (data & 0xF8F8F8F8F8F8F8F8ULL)
>> 		return false;
> 
> Woah.  My vote is for Paolo's version, as the separate checks let the
> reader walk through it step by step.  The generated assembly isn't much
> different from a performance perspective, since the TEST+JNE branch is
> not taken in the fast path.
> 
> Fancy:
>    0x000000000004844f <+255>:   movabs $0xf8f8f8f8f8f8f8f8,%rcx
>    0x0000000000048459 <+265>:   xor    %eax,%eax
>    0x000000000004845b <+267>:   test   %rcx,%rdx
>    0x000000000004845e <+270>:   jne    0x4848b <kvm_mtrr_valid+315>
>    0x0000000000048460 <+272>:   movabs $0x202020202020202,%rax
>    0x000000000004846a <+282>:   and    %rdx,%rax
>    0x000000000004846d <+285>:   add    %rax,%rax
>    0x0000000000048470 <+288>:   or     %rdx,%rax
>    0x0000000000048473 <+291>:   cmp    %rdx,%rax
>    0x0000000000048476 <+294>:   sete   %al
>    0x0000000000048479 <+297>:   retq
> 
> Really fancy:
>    0x0000000000048447 <+247>:   movabs $0x404040404040404,%rcx
>    0x0000000000048451 <+257>:   movabs $0x202020202020202,%rax
>    0x000000000004845b <+267>:   xor    %rdx,%rcx
>    0x000000000004845e <+270>:   add    %rax,%rcx
>    0x0000000000048461 <+273>:   movabs $0xf8f8f8f8f8f8f8f8,%rax
>    0x000000000004846b <+283>:   or     %rcx,%rdx
>    0x000000000004846e <+286>:   test   %rax,%rdx
>    0x0000000000048471 <+289>:   sete   %al
>    0x0000000000048474 <+292>:   retq

Yeah, the three constants are expensive.  Too bad the really fancy
version sums twos and xors fours; if it were the opposite, it could have
used lea and then I would have chosen that one just for the coolness factor.

(Quoting Avi: "mmu.c is designed around the fact that x86 has an
instruction to do 'x = 12 + 9*y'.")

Paolo
