lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 17 May 2021 09:46:52 +0200
From:   Paolo Bonzini <pbonzini@...hat.com>
To:     Andy Lutomirski <luto@...nel.org>, Jon Kohler <jon@...anix.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Sean Christopherson <seanjc@...gle.com>
Cc:     Babu Moger <babu.moger@....com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        X86 ML <x86@...nel.org>, "H. Peter Anvin" <hpa@...or.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        Fenghua Yu <fenghua.yu@...el.com>,
        Yu-cheng Yu <yu-cheng.yu@...el.com>,
        Tony Luck <tony.luck@...el.com>,
        Uros Bizjak <ubizjak@...il.com>,
        Petteri Aimonen <jpa@....mail.kapsi.fi>,
        Kan Liang <kan.liang@...ux.intel.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Mike Rapoport <rppt@...nel.org>,
        Benjamin Thiel <b.thiel@...teo.de>,
        Fan Yang <Fan_Yang@...u.edu.cn>,
        Juergen Gross <jgross@...e.com>,
        Dave Jiang <dave.jiang@...el.com>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Ricardo Neri <ricardo.neri-calderon@...ux.intel.com>,
        Arvind Sankar <nivedita@...m.mit.edu>,
        LKML <linux-kernel@...r.kernel.org>,
        kvm list <kvm@...r.kernel.org>
Subject: Re: [PATCH] KVM: x86: add hint to skip hidden rdpkru under
 kvm_load_host_xsave_state

On 14/05/21 07:11, Andy Lutomirski wrote:
> That's nice, but it fails to restore XINUSE[PKRU].  As far as I know,
> that bit is live, and the only way to restore it to 0 is with
> XRSTOR(S).

The manual says "It is possible for XINUSE[i] to be 1 even when state 
component i is in its initial configuration" so this is architecturally 
valid.  Does the XINUSE optimization matter for PKRU which is a single word?

>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index cebdaa1e3cf5..cd95adbd140c 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -912,10 +912,10 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
>>          }
>>
>>          if (static_cpu_has(X86_FEATURE_PKU) &&
>> -           (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) ||
>> -            (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU)) &&
>> -           vcpu->arch.pkru != vcpu->arch.host_pkru)
>> -               __write_pkru(vcpu->arch.pkru);
>> +           vcpu->arch.pkru != vcpu->arch.host_pkru &&
>> +           ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) ||
>> +            kvm_read_cr4_bits(vcpu, X86_CR4_PKE)))
>> +               __write_pkru(vcpu->arch.pkru, false);
> 
> Please tell me I'm missing something (e.g. KVM very cleverly managing
> the PKRU register using intercepts) that makes this reliably load the
> guest value.  An innocent or malicious guest could easily make that
> condition evaluate to false, thus allowing the host PKRU value to be
> live in guest mode.  (Or is something fancy going on here?)

RDPKRU/WRPKRU cannot be used unless CR4.PKE=1, but PKRU can still be 
accessed using XSAVE/XRSTOR.  However both CR4 and XCR0 have their 
writes trapped, so the guest will not use the host PKRU value before the 
next vmexit if CR4.PKE=0 and XCR0.PKRU=0.

> I don't even want to think about what happens if a perf NMI hits and
> accesses host user memory while the guest PKRU is live (on VMX -- I
> think this can't happen on SVM).

This is indeed a problem, which indeed cannot happen on SVM but is there 
on VMX.  Note that the function above is not handling all of the xstate, 
it's handling the *XSAVE state*, that is XCR0, XSS and PKRU.  Thus the 
window is small, but it's there.

Is it solvable at all, without having PKRU fields in the VMCS (and 
without masking NMIs in the LAPIC which would be too expensive)?  Dave, 
Sean, what do you think?

>>   }
>>   EXPORT_SYMBOL_GPL(kvm_load_guest_xsave_state);
>>
>> @@ -925,11 +925,11 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu)
>>                  return;
>>
>>          if (static_cpu_has(X86_FEATURE_PKU) &&
>> -           (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) ||
>> -            (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU))) {
>> +           ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) ||
>> +            kvm_read_cr4_bits(vcpu, X86_CR4_PKE))) {
>>                  vcpu->arch.pkru = rdpkru();
>>                  if (vcpu->arch.pkru != vcpu->arch.host_pkru)
>> -                       __write_pkru(vcpu->arch.host_pkru);
>> +                       __write_pkru(vcpu->arch.host_pkru, true);
>>          }
> 
> Suppose the guest writes to PKRU and then, without exiting, sets PKE =
> 0 and XCR0[PKRU] = 0.  (Or are the intercepts such that this can't
> happen except on SEV where maybe SEV magic makes the problem go away?)

Yes, see above.  KVM needs to trap CR4 and XCR0 anyway (CR4 because you 
don't want the guest to clear e.g. MCE, XCR0 to forbid setting bits that 
the host kernel does not have in its own xstate).

> I admit I'm fairly mystified as to why KVM doesn't handle PKRU like
> the rest of guest XSTATE.
Because the rest of the guest XSTATE is live too early.  The problem you 
mention above with respect to perf, where you access host memory with 
the guest PKRU, would be very much amplified.

It is basically the opposite problem of what you have in 
switch_fpu_finish, which loads PKRU eagerly while delaying the rest to 
the switch to userland.

Paolo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ