linux-kernel - RE: [ANNOUNCE] PUCK Agenda - 2024.08.07 - KVM userfault (guest


Message-ID: <DS0PR11MB63734864431AD2783C229C57DC852@DS0PR11MB6373.namprd11.prod.outlook.com>
Date: Mon, 12 Aug 2024 14:12:29 +0000
From: "Wang, Wei W" <wei.w.wang@...el.com>
To: Sean Christopherson <seanjc@...gle.com>
CC: James Houghton <jthoughton@...gle.com>, "kvm@...r.kernel.org"
	<kvm@...r.kernel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, Peter Xu <peterx@...hat.com>, Paolo Bonzini
	<pbonzini@...hat.com>, Oliver Upton <oliver.upton@...ux.dev>, Axel Rasmussen
	<axelrasmussen@...gle.com>, David Matlack <dmatlack@...gle.com>, "Anish
 Moorthy" <amoorthy@...gle.com>
Subject: RE: [ANNOUNCE] PUCK Agenda - 2024.08.07 - KVM userfault
 (guest_memfd/HugeTLB postcopy)

On Saturday, August 10, 2024 3:05 AM, Sean Christopherson wrote:
> On Fri, Aug 09, 2024, Wei W Wang wrote:
> > On Friday, August 9, 2024 3:05 AM, James Houghton wrote:
> > > On Thu, Aug 8, 2024 at 5:15 AM Wang, Wei W <wei.w.wang@...el.com> wrote:
> > There also seems to be a race condition between KVM userfault and
> > userfaultfd.
> > For example, a guest access to a guest-shared page triggers a KVM
> > userfault exit to userspace, while vhost (or KVM) could access the same
> > page during the window in which the KVM userfault is being handled; there
> > would then be two simultaneous faults on the same page.
> > How would this case be handled? (Leaving it to userspace
> > to detect and handle such cases would be complex.)
> 
> Userspace is going to have to handle racing "faults" no matter what, e.g. if
> multiple vCPUs hit the same fault and exit at the same time.  I don't think it'll
> be too complex to detect spurious/fixed faults and retry.

Yes, the case of multiple vCPUs hitting the same fault shouldn't be difficult
to handle, as they all fall into the same handling path (i.e., KVM userfault).
But if a vCPU and vhost hit the same fault, the two types of fault exit (i.e.,
KVM userfault and userfaultfd) will occur at the same time (IIUC, the vCPU
access triggers KVM userfault and the vhost access triggers userfaultfd).

So, the userspace VMM would be required to coordinate between the two types of
userfault. For example, once the page data has been fetched from the source, the
VMM first needs to determine whether the page should be installed via
UFFDIO_COPY (for the userfaultfd case) and/or via the new uAPI, say
KVM_USERFAULT_COPY (for the KVM userfault case).
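
To make the coordination concrete, here is a minimal sketch of the kind of
per-page bookkeeping the VMM might need. All names in it (fault_src,
note_fault, take_install_paths, NR_PAGES) are hypothetical and exist only for
illustration; it is single-threaded and only records which fault path is
waiting on a page, so a real VMM would need locking and would also have to
cover pages that arrive before anyone faults on them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names: nothing here is an existing KVM or userfaultfd API. */
enum fault_src {
    FAULT_SRC_KVM  = 1u << 0, /* a vCPU exited with a KVM userfault */
    FAULT_SRC_UFFD = 1u << 1, /* userfaultfd reported a fault (e.g. vhost) */
};

#define NR_PAGES 1024 /* arbitrary size for the sketch */

/* One byte per guest page: which fault paths are waiting on that page. */
static uint8_t pending[NR_PAGES];

/* Record that a fault on page index 'idx' arrived via 'src'. */
static void note_fault(size_t idx, enum fault_src src)
{
    assert(idx < NR_PAGES);
    pending[idx] |= (uint8_t)src;
}

/*
 * Called once the page data has arrived from the source. Returns the set
 * of install paths that still need to be invoked for this page and clears
 * the pending state, so a second call for the same page returns 0.
 */
static unsigned take_install_paths(size_t idx)
{
    assert(idx < NR_PAGES);
    unsigned paths = pending[idx];
    pending[idx] = 0;
    return paths;
}
```

If both bits come back set, the VMM would have to invoke both install paths
for that page, which is the case discussed next.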

In the example above, both UFFDIO_COPY and KVM_USERFAULT_COPY need to be
invoked, e.g.:
#1 invoke KVM_USERFAULT_COPY
#2 invoke UFFDIO_COPY

This requires that UFFDIO_COPY does not conflict with KVM_USERFAULT_COPY.
Currently, UFFDIO_COPY fails (and thus does not wake up the threads on the
waitq) when it cannot install the PTE into the page table (in the above example
the PTE has already been installed by KVM_USERFAULT_COPY at #1).
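
One possible userspace workaround with today's uAPI, sketched below, is to
treat that failure as "already installed" and wake the waiters explicitly with
UFFDIO_WAKE. This assumes the failure surfaces as EEXIST, which is what
UFFDIO_COPY returns when the destination is already mapped. The helper name
uffd_install_or_wake is made up for this example, and whether this would be
sufficient alongside the proposed KVM userfault uAPI is an open question, not
something settled in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: install 'len' bytes from 'src' at 'dst' through the
 * userfaultfd 'uffd'. If the page is already mapped (EEXIST), wake any
 * threads still blocked on the range instead of reporting an error.
 * Returns 0 on success and -1 on any other failure (errno is left set;
 * a caller may want to retry on EAGAIN).
 */
static int uffd_install_or_wake(int uffd, uint64_t dst, uint64_t src,
                                uint64_t len)
{
    assert(len != 0);

    struct uffdio_copy copy = {
        .dst = dst,
        .src = src,
        .len = len,
        .mode = 0, /* default mode: the kernel wakes waiters on success */
    };

    if (ioctl(uffd, UFFDIO_COPY, &copy) == 0)
        return 0;

    if (errno != EEXIST)
        return -1;

    /* Someone else installed the page first; no wakeup has happened. */
    struct uffdio_range range = {
        .start = dst,
        .len = len,
    };

    return ioctl(uffd, UFFDIO_WAKE, &range) == 0 ? 0 : -1;
}
```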