linux-kernel - RE: [RFC] /dev/ioasid uAPI proposal

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <BN9PR11MB5433B2E25895D240BA3B182B8C859@BN9PR11MB5433.namprd11.prod.outlook.com>
Date:   Wed, 27 Oct 2021 06:18:55 +0000
From:   "Tian, Kevin" <kevin.tian@...el.com>
To:     Paolo Bonzini <pbonzini@...hat.com>,
        Jason Gunthorpe <jgg@...dia.com>,
        Alex Williamson <alex.williamson@...hat.com>
CC:     "Enrico Weigelt, metux IT consult" <lkml@...ux.net>,
        Jean-Philippe Brucker <jean-philippe@...aro.org>,
        "Jiang, Dave" <dave.jiang@...el.com>,
        "Raj, Ashok" <ashok.raj@...el.com>,
        "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
        Jonathan Corbet <corbet@....net>,
        Robin Murphy <robin.murphy@....com>,
        LKML <linux-kernel@...r.kernel.org>,
        "iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
        David Gibson <david@...son.dropbear.id.au>,
        Kirti Wankhede <kwankhede@...dia.com>,
        "David Woodhouse" <dwmw2@...radead.org>,
        Jason Wang <jasowang@...hat.com>
Subject: RE: [RFC] /dev/ioasid uAPI proposal

Hi, Paolo,

> From: Paolo Bonzini <pbonzini@...hat.com>
> Sent: Wednesday, June 9, 2021 11:21 PM
> 
> On 09/06/21 16:45, Jason Gunthorpe wrote:
> > On Wed, Jun 09, 2021 at 08:31:34AM -0600, Alex Williamson wrote:
> >
> >> If we go back to the wbinvd ioctl mechanism, if I call that ioctl with
> >> an ioasidfd that contains no devices, then I shouldn't be able to
> >> generate a wbinvd on the processor, right?  If I add a device,
> >> especially in a configuration that can generate non-coherent DMA, now
> >> that ioctl should work.  If I then remove all devices from that ioasid,
> >> what then is the difference from the initial state.  Should the ioctl
> >> now work because it worked once in the past?
> >
> > The ioctl is fine, but telling KVM to enable WBINVD is very similar to
> > open and then reconfiguring the ioasid_fd is very similar to
> > chmod. From a security perspective revoke is not strictly required,
> > IMHO.
> 
> I absolutely do *not* want an API that tells KVM to enable WBINVD.  This
> is not up for discussion.
> 
> But really, let's stop calling the file descriptor a security proof or a
> capability.  It's overkill; all that we are doing here is kernel
> acceleration of the WBINVD ioctl.
> 
> As a thought experiment, let's consider what would happen if wbinvd
> caused an unconditional exit from guest to userspace.  Userspace would
> react by invoking the ioctl on the ioasid.  The proposed functionality
> is just an acceleration of this same thing, avoiding the
> guest->KVM->userspace->IOASID->wbinvd trip.

While the concept here makes sense, in reality implementing a wbinvd
ioctl for userspace requiring iommufd (previous /dev/ioasid is renamed
to /dev/iommu now) to track dirty CPUs that a given process has been 
running since wbinvd only flushes local cache. KVM tracks dirty CPUs by 
registering preempt notifier on the current vCPU. iommufd may do the 
same thing for the thread which opens /dev/iommu, but per below
discussion one open is whether it's worthwhile adding such hassle for
something which no real user is interested in today except kvm:

https://lore.kernel.org/lkml/20211025233459.GM2744544@nvidia.com/

Is it ok to omit the actual wbinvd ioctl here and just leverage vfio-kvm
contract to manage whether guest wbinvd is emulated as no-op? It is 
still iommufd which decides whether wbinvd is allowed (based on IOAS 
and device attach information) but just sort of special casing that the 
operation can only be done via kvm path...

btw does kvm community set a strict criteria that any operation that
the guest can do must be first carried in host uAPI first? In concept
KVM deals with ISA-level to cover both guest kernel and guest user
while host uAPI is only for host user. Introducing new uAPIs to allow
host user doing whatever guest kernel can do sounds ideal, but not
exactly necessary imho.

> 
> This is why the API that I want, and that is already exists for VFIO
> group file descriptors, informs KVM of which "ioctls" the guest should
> be able to do via privileged instructions[1].  Then the kernel works out
> with KVM how to ensure a 1:1 correspondence between the operation of the
> ioctls and the privileged operations.
> 
> One way to do it would be to always trap WBINVD and invoke the same
> kernel function that implements the ioctl.  The function would do either
> a wbinvd or nothing, based on whether the ioasid has any device.  The
> next logical step is a notification mechanism that enables WBINVD (by
> disabling the WBINVD intercept) when there are devices in the ioasidfd,
> and disables WBINVD (by enabling a no-op intercept) when there are none.
> 
> And in fact once all VFIO devices are gone, wbinvd is for all purposes a
> no-op as far as the guest kernel can tell.  So there's no reason to
> treat it as anything but a no-op.
> 
> Thanks,
> 
> Paolo
> 
> [1] As an aside, I must admit I didn't entirely understand the design of
> the KVM-VFIO device back when Alex added it.  But with this model it was
> absolutely the right thing to do, and it remains the right thing to do
> even if VFIO groups are replaced with IOASID file descriptors.