Message-ID: <Z7d5HT7FpE-ZsHQ9@google.com>
Date: Thu, 20 Feb 2025 10:49:01 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Nikita Kalyazin <kalyazin@...zon.com>
Cc: pbonzini@...hat.com, corbet@....net, tglx@...utronix.de, mingo@...hat.com,
bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com, rostedt@...dmis.org,
mhiramat@...nel.org, mathieu.desnoyers@...icios.com, kvm@...r.kernel.org,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org, jthoughton@...gle.com, david@...hat.com,
peterx@...hat.com, oleg@...hat.com, vkuznets@...hat.com, gshan@...hat.com,
graf@...zon.de, jgowans@...zon.com, roypat@...zon.co.uk, derekmn@...zon.com,
nsaenz@...zon.es, xmarcalx@...zon.com
Subject: Re: [RFC PATCH 0/6] KVM: x86: async PF user
On Thu, Feb 20, 2025, Nikita Kalyazin wrote:
> On 19/02/2025 15:17, Sean Christopherson wrote:
> > On Wed, Feb 12, 2025, Nikita Kalyazin wrote:
> > The conundrum with userspace async #PF is that if userspace is given only a single
> > bit per gfn to force an exit, then KVM won't be able to differentiate between
> > "faults" that will be handled synchronously by the vCPU task, and faults that
userspace will hand off to an I/O task. If the fault is handled synchronously,
> > KVM will needlessly inject a not-present #PF and a present IRQ.
>
> Right, but from the guest's point of view, async PF means "it will probably
> take a while for the host to get the page, so I may consider doing something
> else in the meantime (ie schedule another process if available)".
Except in this case, the guest never gets a chance to run, i.e. it can't do
something else. From the guest point of view, if KVM doesn't inject what is
effectively a spurious async #PF, the VM-Exiting instruction simply took a (really)
long time to execute.
> If we are exiting to userspace, it isn't going to be quick anyway, so we can
> consider all such faults "long" and warranting the execution of the async PF
> protocol. So always injecting a not-present #PF and page ready IRQ doesn't
> look too wrong in that case.
There is no "wrong", it's simply wasteful. The fact that the userspace exit is
"long" is completely irrelevant. Decompressing zswap is also slow, but it is
done on the current CPU, i.e. is not background I/O, and so doesn't trigger async
#PFs.
In the guest, if host userspace resolves the fault before redoing KVM_RUN, the
vCPU will get two events back-to-back: an async #PF, and an IRQ signalling completion
of that #PF.
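Roughly, the sequence would be (a sketch of the flow, not exact KVM behaviour):

```
vCPU:   faults on GPA, exits to userspace (KVM userfault)
VMM:    resolves the fault synchronously, calls KVM_RUN again
KVM:    injects not-present async #PF           <- spurious event
guest:  handles the #PF, maybe schedules another task for nothing
KVM:    injects "page ready" IRQ                <- spurious event
guest:  handles the IRQ, resumes the original task
```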
> > > What advantage can you see in it over exiting to userspace (which already exists
> > > in James's series)?
> >
> > It doesn't exit to userspace :-)
> >
> > If userspace simply wakes a different task in response to the exit, then KVM
> > should be able to wake said task, e.g. by signalling an eventfd, and resume the
> > guest much faster than if the vCPU task needs to roundtrip to userspace. Whether
> > or not such an optimization is worth the complexity is an entirely different
> > question though.
>
> This reminds me of the discussion about VMA-less UFFD that was coming up
> several times, such as [1], but AFAIK hasn't materialised into something
> actionable. I may be wrong, but James was looking into that and couldn't
> figure out a way to scale it sufficiently for his use case and had to stick
> with the VM-exit-based approach. Can you see a world where VM-exit
> userfaults coexist with no-VM-exit way of handling async PFs?
The issue with UFFD is that it's difficult to provide a generic "point of contact",
whereas with KVM userfault, signalling can be tied to the vCPU, and KVM can provide
per-vCPU buffers/structures to aid communication.
That said, supporting "exitless" KVM userfault would most definitely be premature
optimization without strong evidence it would benefit a real world use case.