Message-ID: <1b4d4488-9afc-91e9-790d-5b669d00217b@grsecurity.net>
Date:   Wed, 18 Jan 2023 11:17:19 +0100
From:   Mathias Krause <minipli@...ecurity.net>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
        Paolo Bonzini <pbonzini@...hat.com>
Subject: Re: [PATCH 3/3] KVM: x86: do not unload MMU roots when only toggling
 CR0.WP

On 17.01.23 22:29, Sean Christopherson wrote:
> On Tue, Jan 17, 2023, Mathias Krause wrote:
>> There is no need to unload the MMU roots when only CR0.WP has changed --
>> the paging structures are still valid, only the permission bitmap needs
>> to be updated.
> 
> This doesn't hold true when KVM is using shadow paging, in which case CR0.WP
> affects the shadow page tables.  I believe that also holds true for nNPT :-(

Oh, I knew there would be a case I missed. Thank you for pointing it out!
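A short aside: the architectural rule CR0.WP implements is small enough to
sketch in a few lines, which also shows why, with TDP, a WP-only toggle
leaves the guest translations intact (with shadow paging the shadow PTEs
encode the result of this check, hence the caveat above). Illustration
only, not KVM code, and supervisor_write_allowed() is a made-up name:

#include <stdbool.h>

/*
 * Effect of CR0.WP on a supervisor-mode write to a page whose PTE has the
 * R/W bit clear: with CR0.WP=0 the write is allowed anyway, with CR0.WP=1
 * it faults.  The translation itself never changes, only the permission
 * check does.
 */
static bool supervisor_write_allowed(bool pte_writable, bool cr0_wp)
{
	return pte_writable || !cr0_wp;
}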

> nEPT doesn't consume CR0.WP so we could expedite that case as well, though
> identifying that case might be annoying.

I'm fine with starting by optimizing L1 only, as the performance gain for
this common case is already huge. But sure, if more is possible, I'm all
for it. It's just that I lack the knowledge of KVM internals to figure it
all out by myself.

>> Change kvm_mmu_reset_context() to take a flag indicating whether the MMU
>> roots need to be unloaded, and explicitly skip the unload when a VMEXIT
>> caused by a CR0 write only toggled CR0.WP.
> 
> One thing we should explore on top of this is not intercepting CR0.WP (on Intel)
> when TDP is enabled.  It could even trigger after toggling CR0.WP N times, e.g.
> to optimize the grsecurity use case without negatively impacting workloads with
> a static CR0.WP, as walking guest memory would require an "extra" VMREAD to get
> CR0.WP in that case.

That would be even better, agreed. I'll look into it and will try to
come up with something.
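
For reference, on VMX the interception granularity comes from the CR0
guest/host mask in the VMCS: bits cleared in the mask are guest-owned and
don't cause VM-exits. A rough, untested sketch of "stop intercepting
CR0.WP when TDP is enabled" could look like the snippet below -- the
helper name is made up and it assumes the usual arch/x86/kvm/vmx/vmx.c
context (enable_ept, is_guest_mode(), vmcs_writel(),
vcpu->arch.cr0_guest_owned_bits):

/*
 * Sketch only: let the guest own CR0.WP when EPT is in use so toggling
 * the bit no longer triggers a VM-exit.  KVM then has to read CR0.WP
 * from the VMCS instead of its cached value -- the "extra" VMREAD
 * mentioned above.
 */
static void vmx_stop_intercepting_cr0_wp(struct kvm_vcpu *vcpu)
{
	if (!enable_ept || is_guest_mode(vcpu))
		return;

	vcpu->arch.cr0_guest_owned_bits |= X86_CR0_WP;
	vmcs_writel(CR0_GUEST_HOST_MASK, ~vcpu->arch.cr0_guest_owned_bits);
}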

> Unfortunately, AMD doesn't provide per-bit controls.
> 
>> This change brings a huge performance gain as the following micro-
>> benchmark running 'ssdd 10 50000' from rt-tests[1] on a grsecurity L1 VM
>> shows (runtime in seconds, lower is better):
>>
>>                       legacy MMU   TDP MMU
>> kvm.git/queue             11.55s    13.91s
>> kvm.git/queue+patch        7.44s     7.94s
>>
>> For legacy MMU this is ~35% faster, for TDP MMU ~43% faster.
>>
>> [1] https://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git
>>
>> Signed-off-by: Mathias Krause <minipli@...ecurity.net>
>> ---
> 
> ...
> 
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 508074e47bc0..d7c326ab94de 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -902,7 +902,9 @@ EXPORT_SYMBOL_GPL(load_pdptrs);
>>  
>>  void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned long cr0)
>>  {
>> -	if ((cr0 ^ old_cr0) & X86_CR0_PG) {
>> +	unsigned long cr0_change = cr0 ^ old_cr0;
>> +
>> +	if (cr0_change & X86_CR0_PG) {
>>  		kvm_clear_async_pf_completion_queue(vcpu);
>>  		kvm_async_pf_hash_reset(vcpu);
>>  
>> @@ -914,10 +916,18 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon
>>  			kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
>>  	}
>>  
>> -	if ((cr0 ^ old_cr0) & KVM_MMU_CR0_ROLE_BITS)
>> -		kvm_mmu_reset_context(vcpu);
>> +	if (cr0_change & KVM_MMU_CR0_ROLE_BITS) {
>> +		bool unload_mmu =
>> +			cr0_change & (KVM_MMU_CR0_ROLE_BITS & ~X86_CR0_WP);
> 
> As above, this needs to be guarded with a check that the MMU is direct.  And rather
> than add a flag to kvm_mmu_reset_context(), just call kvm_init_mmu() directly.
> E.g. I think this would work?
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d07563d0e204..8f9fac6d81d2 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -927,6 +927,11 @@ EXPORT_SYMBOL_GPL(load_pdptrs);
>  
>  void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned long cr0)
>  {
> +       if (vcpu->arch.mmu->root_role.direct && (cr0 ^ old_cr0) == X86_CR0_WP) {
> +               kvm_init_mmu(vcpu);
> +               return;
> +       }
> +
>         if ((cr0 ^ old_cr0) & X86_CR0_PG) {
>                 kvm_clear_async_pf_completion_queue(vcpu);
>                 kvm_async_pf_hash_reset(vcpu);

Looks much simpler and more direct. Nice. :)
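
For context: unless I'm misreading the current code, kvm_mmu_reset_context()
is essentially the pair below, so calling kvm_init_mmu() directly keeps the
role/metadata recomputation and only skips the root unload:

void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
{
	/* Full reset: drop the roots, then rebuild the MMU state. */
	kvm_mmu_unload(vcpu);
	kvm_init_mmu(vcpu);
}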

I'll re-test and send a v2 later today.

Thanks,
Mathias
