linux-kernel - Re: [PATCH v2] KVM: Move gfn_to_pfn_cache invalidation to invalidate_range

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <cf1ffd2a852a7a838894bd85f52649e98e88df51.camel@infradead.org>
Date: Wed, 21 Aug 2024 07:57:11 +0100
From: David Woodhouse <dwmw2@...radead.org>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, Mushahid Hussain
 <hmushi@...zon.co.uk>,  Vitaly Kuznetsov <vkuznets@...hat.com>, Wanpeng Li
 <wanpengli@...cent.com>, Jim Mattson <jmattson@...gle.com>,  Joerg Roedel
 <joro@...tes.org>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
 Mingwei Zhang <mizhang@...gle.com>, Maxim Levitsky <mlevitsk@...hat.com>
Subject: Re: [PATCH v2] KVM: Move gfn_to_pfn_cache invalidation to
 invalidate_range_end hook

On Tue, 2024-08-20 at 14:44 -0700, Sean Christopherson wrote:
> On Tue, Aug 20, 2024, David Woodhouse wrote:
> > From: David Woodhouse <dwmw@...zon.co.uk>
> > 
> > The existing retry loop in hva_to_pfn_retry() is extremely pessimistic.
> > If there are any concurrent invalidations running, it's effectively just
> > a complex busy wait loop because its local mmu_notifier_retry_cache()
> > function will always return true.
> > 
> > Since multiple invalidations can be running in parallel, this can result
> > in a situation where hva_to_pfn_retry() just backs off and keep retrying
> > for ever, not making any progress.
> > 
> > Solve this by being a bit more selective about when to retry.
> > 
> > Introduce a separate 'needs invalidation' flag to the GPC, which allows
> > it to be marked invalid even while hva_to_pfn_retry() has dropped the
> > lock to map the newly-found PFN. This allows the invalidation to moved
> > to the range_end hook, and the retry loop only occurs for a given GPC if
> > its particular uHVA is affected.
> > 
> > However, the contract for invalidate_range_{start,end} is not like a
> > simple TLB; the pages may have been freed by the time the end hook is
> > called. A "device" may not create new mappings after the _start_ hook is
> > called. To meet this requirement, hva_to_pfn_retry() now waits until no
> > invalidations are currently running which may affect its uHVA, before
> > finally setting the ->valid flag and returning.
> 
> Please split this into 3 patches:
> 
>  1. Add range-based GPC retry
>  2. Add the wait mechanism.
>  3. Add the needs_invalidation logic.
> 
> #1 and #2 make sense to me, but I'm struggling to understanding why #3 is needed.
> KVM absolutely must not touch the memory after .invalidate_range_start(), so I
> don't see what is gained by deferring invalidation to invalidate_range_end().

KVM absolutely must not touch the memory after .invalidate_range_start().

Let us define "touch the memory" as kmapping it, setting gpc->khva,
setting the gpc->valid flag and dropping the write lock on gpc->lock.

Until those things happen, no user of the GPC is going to dereference
gpc->khva and *actually* touch the memory.

So, how do we ensure that hva_to_pfn_retry() never sets the ->valid
flag on a GPC which might be in the process of being invalidated (i.e.
.invalidate_range_start() has been called already, but not yet
.invalidate_range_end())?

If hva_to_pfn_retry() is called between the start/end invalidation
callbacks, the range-based tracking lets it know that there is a
pending invalidation which *might* affect the GPC it's working on, as
gpc_invalidations_pending() returns true. So what happens then? ...

If there is a pending invalidation which *might* affect its GPC, that's
why hva_to_pfn_retry() now waits until any pending invalidations are
complete, then checks the ->needs_invalidation flag to see if that
pending invalidation actually *did* affect its GPC.

This is where #3 comes in. Without it, we'd have to be pessimistic and
assume that *any* time gpc_invalidations_pending() returned true, we
have to ditch the lookup and start again. We would potentially have a
*lot* more false positives in the hva_to_pfn_retry() revalidation loop.

I'm also not entirely sure before the first coffee of the morning has
taken effect, how the locking would work. Wouldn't we want to hold both
kvm->mn_invalidate_lock *and* the gpc->lock at the same time, to check
the range and set gpc->valid 'atomically'? I suppose that's possible
but I'd prefer to avoid it.

Without that atomicity, an .invalidate_range_start() call could occur
while hva_to_pfn_retry() has completed its loop and is setting the
gpc->valid flag.

Download attachment "smime.p7s" of type "application/pkcs7-signature" (5965 bytes)