Message-ID: <ZW6eVexQNIqtwDaZ@yzhao56-desk.sh.intel.com>
Date:   Tue, 5 Dec 2023 11:51:49 +0800
From:   Yan Zhao <yan.y.zhao@...el.com>
To:     Sean Christopherson <seanjc@...gle.com>
CC:     <iommu@...ts.linux.dev>, <kvm@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>, <alex.williamson@...hat.com>,
        <jgg@...dia.com>, <pbonzini@...hat.com>, <joro@...tes.org>,
        <will@...nel.org>, <robin.murphy@....com>, <kevin.tian@...el.com>,
        <baolu.lu@...ux.intel.com>, <dwmw2@...radead.org>,
        <yi.l.liu@...el.com>
Subject: Re: [RFC PATCH 00/42] Sharing KVM TDP to IOMMU

On Mon, Dec 04, 2023 at 09:00:55AM -0800, Sean Christopherson wrote:
> On Sat, Dec 02, 2023, Yan Zhao wrote:
> Please list out the pros and cons for each.  In the cons column for piggybacking
> KVM's page tables:
> 
>  - *Significantly* increases the complexity in KVM
The complexities for KVM (so far) are:
a. fault in non-vCPU context
b. keep exported root always "active"
c. disallow non-coherent DMAs
d. movement of SPTE_MMU_PRESENT

for a, I think it's accepted, and we can see that eager page split already
       allocates non-leaf pages in non-vCPU context.
for b, it requires the exported TDP root to stay "active" across KVM's "fast zap"
       (which invalidates all active TDP roots). Instead, the exported TDP's
       leaf entries are all zapped.
       Though this does not look "fast" enough, it avoids an unnecessary root
       page zap, and it's actually not frequent:
       - one for memslot removal (an IO page fault is unlikely to happen during
                                  VM boot-up)
       - one for MMIO gen wraparound (which is rare)
       - one for nx huge page mode change (which is rare too)
for c, maybe we can work out a way to remove the MTRR stuff.
for d, I added a config to turn this movement on/off. But right, the KVM side
       will have to sacrifice a bit otherwise available for software usage and
       take care of it when the config is on.
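
To illustrate point b, here is a minimal sketch of the intended behavior as a
toy Python model. It is not KVM code: `Root`, `exported`, `valid`, `leaves` and
`fast_zap` are hypothetical names, and dropping a non-exported root's leaves
immediately is a simplification made for the model.

```python
from dataclasses import dataclass, field


@dataclass
class Root:
    """Hypothetical stand-in for a TDP root; not a real KVM structure."""
    name: str
    exported: bool = False   # True if the root is shared with the IOMMU side
    valid: bool = True       # False once the root itself has been invalidated
    leaves: dict = field(default_factory=dict)  # toy gfn -> pfn leaf mappings


def fast_zap(roots):
    """Toy "fast zap": invalidate normal roots, keep exported roots active."""
    for root in roots:
        if root.exported:
            # Exported root stays valid; only its leaf entries are zapped.
            root.leaves.clear()
        else:
            # Normal root: the whole root is invalidated (simplified here by
            # also dropping its leaves right away).
            root.valid = False
            root.leaves.clear()
```

After `fast_zap()`, an exported root is still valid but empty, so the IO side
refaults its mappings instead of losing the root it references.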

>  - Puts constraints on what KVM can/can't do in the future (see the movement
>    of SPTE_MMU_PRESENT).
>  - Subjects IOMMUFD to all of KVM's historical baggage, e.g. the memslot deletion
>    mess, the truly nasty MTRR emulation (which I still hope to delete), the NX
>    hugepage mitigation, etc.
The NX hugepage mitigation only exists on certain CPUs. I don't see it on recent
Intel platforms, e.g. SPR and GNR...
We can disallow the sharing approach if the NX huge page mitigation is enabled.
But if neither pinning nor partial pinning is involved, the NX huge page
mitigation only causes unnecessary zaps that reduce performance; functionally
it still works well.

Besides, for the extra IO invalidation involved in a TDP zap, I think SVM has the
same issue, i.e. each zap in the primary MMU is also accompanied by an IO
invalidation.

> 
> Please also explain the intended/expected/targeted use cases.  E.g. if the main
> use case is for device passthrough to slice-of-hardware VMs that aren't memory
> oversubscribed, 
>
The main use case is device passthrough where all devices support full IOPF.
Opportunistically, we hope it can also be used in trusted IO, where the TDP is
shared with the IO side. Then only one page table audit is required, and the
out-of-sync window for mappings between the CPU and IO sides can also be
eliminated.

> > - Unified page table management
> >   The complexity of allocating guest pages per GPAs, registering to MMU
> >   notifier on host primary MMU, sub-page unmapping, atomic page merge/split
> 
> Please find different terminology than "sub-page".  With Sub-Page Protection, Intel
> has more or less established "sub-page" to mean "less than 4KiB granularity".  But
> that can't possibly be what you mean here because KVM doesn't support (un)mapping
> memory at <4KiB granularity.  Based on context above, I assume you mean "unmapping
> arbitrary pages within a given range".
>
OK, sorry for the confusion.
By "sub-page unmapping", I mean atomically splitting a huge page and unmapping a
smaller range within the former huge page.
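
As a rough sketch of what that means, the toy Python model below splits one
2MiB mapping into 512 4KiB entries and then unmaps a smaller range inside it.
`split_and_unmap` and its parameters are hypothetical names, and the model does
not attempt to show the atomicity of the split.

```python
PAGES_PER_HUGE = 512  # number of 4KiB pages covered by one 2MiB huge page


def split_and_unmap(huge_base_pfn, unmap_start, unmap_count):
    """Split a 2MiB mapping into 4KiB entries and unmap a sub-range.

    Returns a list of 512 entries: a pfn for each page that is still mapped,
    or None for each page in [unmap_start, unmap_start + unmap_count).
    """
    if unmap_start < 0 or unmap_count < 0:
        raise ValueError("range must be non-negative")
    if unmap_start + unmap_count > PAGES_PER_HUGE:
        raise ValueError("range must lie within the huge page")

    # Split: each 4KiB entry inherits its slice of the huge mapping.
    entries = [huge_base_pfn + i for i in range(PAGES_PER_HUGE)]

    # Unmap only the requested smaller range; the rest stays mapped at 4KiB.
    for i in range(unmap_start, unmap_start + unmap_count):
        entries[i] = None
    return entries
```

For example, `split_and_unmap(0x1000, 4, 2)` leaves pages 4 and 5 unmapped
while the other 510 pages remain mapped at 4KiB granularity.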
