Date:   Sat, 19 Feb 2022 08:54:46 +0100
From:   Paolo Bonzini <pbonzini@...hat.com>
To:     Sean Christopherson <seanjc@...gle.com>
Cc:     linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH v2 16/18] KVM: x86: introduce KVM_REQ_MMU_UPDATE_ROOT

On 2/18/22 22:45, Sean Christopherson wrote:
> On Thu, Feb 17, 2022, Paolo Bonzini wrote:
>> Whenever KVM knows the page role flags have changed, it needs to drop
>> the current MMU root and possibly load one from the prev_roots cache.
>> Currently it is papering over some overly simplistic code by just
>> dropping _all_ roots, so that the root will be reloaded by
>> kvm_mmu_reload, but this has bad performance for the TDP MMU
>> (which drops the whole of the page tables when freeing a root,
>> without the performance safety net of a hash table).
>>
>> To do this, KVM needs only a kvm_mmu_update_root call from
>> kvm_mmu_reset_context.  Introduce a new request bit so that the call
>> can be delayed until after a possible KVM_REQ_MMU_RELOAD, which would
>> kill all hopes of finding a cached PGD.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@...hat.com>
>> ---
> 
> Please no.
> 
> I really, really do not want to add yet another deferred-load in the nested
> virtualization paths.

This is not a deferred load, is it?  It's only kvm_mmu_new_pgd that is 
deferred, but the PDPTR load is not.
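
To make the ordering concrete, this is roughly what I have in mind for
the request handling in vcpu_enter_guest() (a sketch, not the exact
diff; I'm assuming here that kvm_mmu_update_root() takes just the vcpu):

	/*
	 * Handle a full root drop before the deferred root switch, so
	 * that KVM_REQ_MMU_RELOAD cannot leave a stale cached PGD
	 * around for KVM_REQ_MMU_UPDATE_ROOT to pick up.
	 */
	if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
		kvm_mmu_unload(vcpu);

	/*
	 * Only the root switch is deferred; if no reload was pending,
	 * kvm_mmu_update_root() is free to grab a root from the
	 * prev_roots cache instead of rebuilding it.
	 */
	if (kvm_check_request(KVM_REQ_MMU_UPDATE_ROOT, vcpu))
		kvm_mmu_update_root(vcpu);

The PDPTRs, instead, are still read at the usual points, not from here.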

I think I should first merge patches 1-13, then revisit the root_role 
series (which only depends on the fast_pgd_switch and caching changes), 
and then finally get back to this final part.  The reason is that 
root_role is what enables the stale-root check that you wanted; and it's 
easier to think about loading the guest PGD post-kvm_init_mmu if I can 
show you the direction I'd like to have in general, and not leave things 
half-done.

(Patch 17 is also independent and perhaps fixes a case of premature 
optimization, so I'm inclined to merge it as well.)

> As Jim pointed out[1], KVM_REQ_GET_NESTED_STATE_PAGES should
> never have been merged. And on that point, I've no idea how this new request will
> interact with KVM_REQ_GET_NESTED_STATE_PAGES.  It may be a complete non-issue, but
> I'd honestly rather not have to spend the brain power.

Fair enough on the interaction, but I still think 
KVM_REQ_GET_NESTED_STATE_PAGES is a good idea.  I don't think KVM should 
access guest memory outside KVM_RUN, though there may be cases (possibly 
some PV MSRs, if I had to guess) where it does.
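
The pattern I'm defending is essentially the existing deferral in
vcpu_enter_guest(); paraphrasing from memory, not quoting exactly:

	/*
	 * Guest memory is only touched once we are back in KVM_RUN:
	 * KVM_SET_NESTED_STATE just sets the request, and the vendor
	 * callback maps the vmcs12/vmcb12 pages here.
	 */
	if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
		if (unlikely(!kvm_x86_ops.nested_ops->get_nested_state_pages(vcpu))) {
			r = 0;
			goto out;
		}
	}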

> And I still do not like the approach of converting kvm_mmu_reset_context() wholesale
> to not doing kvm_mmu_unload().  There are currently eight kvm_mmu_reset_context() calls:
> 
>    1.   nested_vmx_restore_host_state() - Only for a missed VM-Entry => VM-Fail
>         consistency check, not at all a performance concern.
> 
>    2.   kvm_mmu_after_set_cpuid() - Still needs to unload.  Not a perf concern.
> 
>    3.   kvm_vcpu_reset() - Relevant only to INIT.  Not a perf concern, but could be
>         converted manually to a different path without too much fuss.
> 
>    4+5. enter_smm() / kvm_smm_changed() - IMO, not a perf concern, but again could
>         be converted manually if anyone cares.
> 
>    6.   set_efer() - Silly corner case that basically requires host userspace abuse
>         of KVM APIs.  Not a perf concern.
> 
>    7+8. kvm_post_set_cr0/4() - These are the ones we really care about, and they
>         can be handled quite trivially, and can even share much of the logic with
>         kvm_set_cr3().
> 
> I strongly prefer that we take a more conservative approach and fix 7+8, and then
> tackle 1, 3, and 4+5 separately if someone cares enough about those flows to avoid
> dropping roots.

The thing is, I want to get rid of kvm_mmu_reset_context() altogether. 
I dislike the fact that it kills the roots but still keeps them in the 
hash table, thus relying on separate syncing to avoid future bugs.  It's 
very unintuitive what is "reset" and what isn't.

> Regarding KVM_REQ_MMU_RELOAD, that mess mostly goes away with my series to replace
> that with KVM_REQ_MMU_FREE_OBSOLETE_ROOTS.  Obsolete TDP MMU roots will never get
> a cache hit because the obsolete root will have an "invalid" role.  And if we care
> about optimizing this with respect to a memslot (highly unlikely), then we could
> add an MMU generation check in the cache lookup.  I was planning on posting that
> series as soon as this one is queued, but I'm more than happy to speculatively send
> a refreshed version that applies on top of this series.

Yes, please send a version on top of patches 1-13.  That can be reviewed 
and committed in parallel with the root_role changes.
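
For reference, the reason an obsolete root can never get a cache hit is
the role comparison in the lookup.  Roughly (the helper name is made up
here, the body approximates the real is_root_usable()):

	static bool root_matches(struct kvm_mmu_root_info *root, gpa_t pgd,
				 union kvm_mmu_page_role role)
	{
		/*
		 * A cached root is reused only if the full role word
		 * matches; an obsolete root carries an invalid role, so
		 * this comparison always fails for it.
		 */
		return root->pgd == pgd &&
		       VALID_PAGE(root->hpa) &&
		       role.word == to_shadow_page(root->hpa)->role.word;
	}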

Paolo

> [1] https://lore.kernel.org/all/CALMp9eT2cP7kdptoP3=acJX+5_Wg6MXNwoDh42pfb21-wdXvJg@mail.gmail.com
> [2] https://lore.kernel.org/all/20211209060552.2956723-1-seanjc@google.com
