linux-kernel - Re: [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into check for 2MiB GTT entry

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Y7SaklDQD0EoIs8l@google.com>
Date:   Tue, 3 Jan 2023 21:13:54 +0000
From:   Sean Christopherson <seanjc@...gle.com>
To:     Yan Zhao <yan.y.zhao@...el.com>
Cc:     Paolo Bonzini <pbonzini@...hat.com>,
        Zhenyu Wang <zhenyuw@...ux.intel.com>,
        Zhi Wang <zhi.a.wang@...el.com>, kvm@...r.kernel.org,
        intel-gvt-dev@...ts.freedesktop.org,
        intel-gfx@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
        Ben Gardon <bgardon@...gle.com>
Subject: Re: [PATCH 03/27] drm/i915/gvt: Incorporate KVM memslot info into
 check for 2MiB GTT entry

On Wed, Dec 28, 2022, Yan Zhao wrote:
> On Fri, Dec 23, 2022 at 12:57:15AM +0000, Sean Christopherson wrote:
> > Honor KVM's max allowed page size when determining whether or not a 2MiB
> > GTT shadow page can be created for the guest.  Querying KVM's max allowed
> > size is somewhat odd as there's no strict requirement that KVM's memslots
> > and VFIO's mappings are configured with the same gfn=>hva mapping, but
> Without vIOMMU, VFIO's mapping is configured with the same as KVM's
> memslots, i.e. with the same gfn==>HVA mapping

But that's controlled by userspace, correct?

> > the check will be accurate if userspace wants to have a functional guest,
> > and at the very least checking KVM's memslots guarantees that the entire
> > 2MiB range has been exposed to the guest.
> 
> I think just check the entrie 2MiB GFN range are all within KVM memslot is
> enough.

Strictly speaking, no.  E.g. if a 2MiB region is covered with multiple memslots
and the memslots have different properties.

> If for some reason, KVM maps a 2MiB range in 4K sizes, KVMGT can still map
> it in IOMMU size in 2MiB size as long as the PFNs are continous and the
> whole range is all exposed to guest.

I agree that practically speaking this will hold true, but if KVMGT wants to honor
KVM's memslots then checking that KVM allows a hugepage is correct.  Hrm, but on
the flip side, KVMGT ignores read-only memslot flags, so KVMGT is already ignoring
pieces of KVM's memslots.

I have no objection to KVMGT defining its ABI such that KVMGT is allowed to create
2MiB so long as (a) the GFN is contiguous according to VFIO, and (b) that the entire
2MiB range is exposed to the guest.

That said, being fully permissive also seems wasteful, e.g. KVM would need to
explicitly support straddling multiple memslots.

As a middle ground, what about tweaking kvm_page_track_is_valid_gfn() to take a
range, and then checking that the range is contained in a single memslot?

E.g. something like:

bool kvm_page_track_is_contiguous_gfn_range(struct kvm *kvm, gfn_t gfn,
					    unsigned long nr_pages)
{
	struct kvm_memory_slot *memslot;
	bool ret;
	int idx;

	idx = srcu_read_lock(&kvm->srcu);
	memslot = gfn_to_memslot(kvm, gfn);
	ret = kvm_is_visible_memslot(memslot) &&
	      gfn + nr_pages <= memslot->base_gfn + memslot->npages;
	srcu_read_unlock(&kvm->srcu, idx);

	return ret;
}

> Actually normal device passthrough with VFIO-PCI also maps GFNs in a
> similar way, i.e. maps a guest visible range in as large size as
> possible as long as the PFN is continous. 
> > 
> > Note, KVM may also restrict the mapping size for reasons that aren't
> > relevant to KVMGT, e.g. for KVM's iTLB multi-hit workaround or if the gfn
> Will iTLB multi-hit affect DMA?

I highly doubt it, I can't imagine an IOMMU would have a dedicated instruction
TLB :-)

> AFAIK, IOMMU mappings currently never sets exec bit (and I'm told this bit is
> under discussion to be removed).