linux-kernel - Re: [PATCH v2 3/3] KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <addaedc2-2050-06a1-e241-047c6e4c94c3@redhat.com>
Date:   Tue, 30 Mar 2021 19:18:32 +0200
From:   Paolo Bonzini <pbonzini@...hat.com>
To:     Sean Christopherson <seanjc@...gle.com>,
        Ben Gardon <bgardon@...gle.com>
Cc:     Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>, kvm <kvm@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 3/3] KVM: x86/mmu: Don't allow TDP MMU to yield when
 recovering NX pages

On 25/03/21 23:25, Sean Christopherson wrote:
> On Thu, Mar 25, 2021, Ben Gardon wrote:
>> On Thu, Mar 25, 2021 at 1:01 PM Sean Christopherson <seanjc@...gle.com> wrote:
>>> +static inline bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start,
>>> +                                            gfn_t end)
>>> +{
>>> +       return __kvm_tdp_mmu_zap_gfn_range(kvm, start, end, true);
>>> +}
>>> +static inline bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
>>
>> I'm a little leary of adding an interface which takes a non-root
>> struct kvm_mmu_page as an argument to the TDP MMU.
>> In the TDP MMU, the struct kvm_mmu_pages are protected rather subtly.
>> I agree this is safe because we hold the MMU lock in write mode here,
>> but if we ever wanted to convert to holding it in read mode things
>> could get complicated fast.
>> Maybe this is more of a concern if the function started to be used
>> elsewhere since NX recovery is already so dependent on the write lock.
> 
> Agreed.  Even writing the comment below felt a bit awkward when thinking about
> additional users holding mmu_lock for read.  Actually, I should remove that
> specific blurb since zapping currently requires holding mmu_lock for write.
> 
>> Ideally though, NX reclaim could use MMU read lock +
>> tdp_mmu_pages_lock to protect the list and do reclaim in parallel with
>> everything else.
> 
> Yar, processing all legacy MMU pages, and then all TDP MMU pages to avoid some
> of these dependencies crossed my mind.  But, it's hard to justify effectively
> walking the list twice.  And maintaining two lists might lead to balancing
> issues, e.g. the legacy MMU and thus nested VMs get zapped more often than the
> TDP MMU, or vice versa.
> 
>> The nice thing about drawing the TDP MMU interface in terms of GFNs
>> and address space IDs instead of SPs is that it doesn't put
>> constraints on the implementation of the TDP MMU because those GFNs
>> are always going to be valid / don't require any shared memory.
>> This is kind of innocuous because it's immediately converted into that
>> gfn interface, so I don't know how much it really matters.
>>
>> In any case this change looks correct and I don't want to hold up
>> progress with bikeshedding.
>> WDYT?
> 
> I think we're kind of hosed either way.  Either we add a helper in the TDP MMU
> that takes a SP, or we bleed a lot of information about the details of TDP MMU
> into the common MMU.  E.g. the function could be open-coded verbatim, but the
> whole comment below, and the motivation for not feeding in flush is very
> dependent on the internal details of TDP MMU.
> 
> I don't have a super strong preference.  One thought would be to assert that
> mmu_lock is held for write, and then it largely come future person's problem :-)

Queued all three, with lockdep_assert_held_write here.

Paolo