Message-ID: <20191112165717.GA18089@linux.intel.com>
Date: Tue, 12 Nov 2019 08:57:17 -0800
From: Sean Christopherson <sean.j.christopherson@...el.com>
To: Paolo Bonzini <pbonzini@...hat.com>
Cc: Dan Williams <dan.j.williams@...el.com>,
Radim Krčmář <rkrcmar@...hat.com>,
Vitaly Kuznetsov <vkuznets@...hat.com>,
Wanpeng Li <wanpengli@...cent.com>,
Jim Mattson <jmattson@...gle.com>,
Joerg Roedel <joro@...tes.org>, KVM list <kvm@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Adam Borowski <kilobyte@...band.pl>,
David Hildenbrand <david@...hat.com>
Subject: Re: [PATCH 1/2] KVM: MMU: Do not treat ZONE_DEVICE pages as being
reserved

On Tue, Nov 12, 2019 at 11:19:44AM +0100, Paolo Bonzini wrote:
> On 12/11/19 01:51, Dan Williams wrote:
> > An elevated page reference count for file mapped pages causes the
> > filesystem (for a dax mode file) to wait for that reference count to
> > drop to 1 before allowing the truncate to proceed. For a page cache
> > backed file mapping (non-dax) the reference count is not considered in
> > the truncate path. It does prevent the page from getting freed in the
> > page cache case, but the association to the file is lost for truncate.
>
> KVM support for file-backed guest memory is limited. It is not
> completely broken, in fact cases such as hugetlbfs are in use routinely,
> but corner cases such as truncate aren't covered well indeed.
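
(For context, the refcount wait Dan describes boils down to roughly the
below; this is a from-memory sketch modeled on xfs_break_dax_layouts(),
with the locking and the retry loop around it elided.)

static int break_dax_layouts(struct inode *inode, bool *retry)
{
	struct page *page;

	/* Returns a page with an elevated refcount, or NULL if idle. */
	page = dax_layout_busy_page(inode->i_mapping);
	if (!page)
		return 0;

	*retry = true;

	/* Truncate sleeps here until the last extra reference is put. */
	return ___wait_var_event(&page->_refcount,
				 atomic_read(&page->_refcount) == 1,
				 TASK_INTERRUPTIBLE, 0, 0,
				 xfs_wait_dax_page(inode));
}
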
KVM's actual MMU should be ok since it coordinates with the mmu_notifier.
kvm_vcpu_map() is where KVM could run afoul of page cache truncation.
This is the other main use of hva_to_pfn*(), where KVM directly accesses
guest memory (which could be file-backed) without coordinating with the
mmu_notifier. IIUC, an ill-timed page cache truncation could result in a
write from KVM effectively being dropped due to writeback racing with
KVM's write to the page. If that's true, then I think KVM would need to
move to the proposed pin_user_pages() to ensure its "DMA" isn't lost.
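
E.g. a sketch of what that could look like in the hva_to_pfn_slow() path.
hva_to_pfn_pinned() is a made-up helper, and pin_user_pages() is from the
in-flight series, so the exact name/signature/flags may well change before
anything lands:

/*
 * Hypothetical replacement for the get_user_pages_unlocked() call in
 * hva_to_pfn_slow().
 */
static long hva_to_pfn_pinned(unsigned long addr, bool write_fault,
			      struct page **page)
{
	unsigned int flags = FOLL_LONGTERM;

	if (write_fault)
		flags |= FOLL_WRITE;

	/*
	 * Unlike a bare get_user_pages() reference, a pinned page tells
	 * the filesystem that "DMA" to the page is in flight, i.e.
	 * writeback/truncate can't silently discard KVM's write.
	 */
	return pin_user_pages(addr, 1, flags, page, NULL);
}

The unmap side, e.g. kvm_vcpu_unmap(), would then presumably need to use
the series' unpin+dirty variant instead of kvm_release_pfn_dirty().
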
> > As long as any memory the guest expects to be persistent is backed by
> > mmu-notifier coordination we're all good, otherwise an elevated
> > reference count does not coordinate with truncate in a reliable way.

KVM itself is (mostly) blissfully unaware of any such expectations. The
userspace VMM, e.g. Qemu, is ultimately responsible for ensuring the guest
sees a valid model, e.g. that persistent memory (as presented to the guest)
is actually persistent (from the guest's perspective).

The big caveat is the truncation issue above.
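
For completeness, the mmu-notifier coordination in the "all good" case is
KVM's invalidate hook, which zaps sptes and forces the guest to re-fault,
so the primary MMU can move or truncate the backing page at will. Heavily
trimmed from virt/kvm/kvm_main.c (SRCU, TLB flushing, blockable handling,
etc. elided):

static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
			const struct mmu_notifier_range *range)
{
	struct kvm *kvm = mmu_notifier_to_kvm(mn);

	spin_lock(&kvm->mmu_lock);
	/* Block faults from reestablishing sptes mid-invalidation. */
	kvm->mmu_notifier_count++;
	/* Zap sptes so the guest re-faults on the new mapping. */
	kvm_unmap_hva_range(kvm, range->start, range->end);
	spin_unlock(&kvm->mmu_lock);

	return 0;
}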