Message-ID: <20211115211952.GA2534655@nvidia.com>
Date: Mon, 15 Nov 2021 17:19:52 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: Robin Murphy <robin.murphy@....com>
Cc: Christoph Hellwig <hch@...radead.org>,
Kevin Tian <kevin.tian@...el.com>,
Chaitanya Kulkarni <kch@...dia.com>,
Ashok Raj <ashok.raj@...el.com>, kvm@...r.kernel.org,
rafael@...nel.org, Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Cornelia Huck <cohuck@...hat.com>,
Will Deacon <will@...nel.org>, linux-kernel@...r.kernel.org,
iommu@...ts.linux-foundation.org,
Alex Williamson <alex.williamson@...hat.com>,
Jacob jun Pan <jacob.jun.pan@...el.com>,
linux-pci@...r.kernel.org, Bjorn Helgaas <bhelgaas@...gle.com>,
Diana Craciun <diana.craciun@....nxp.com>
Subject: Re: [PATCH 03/11] PCI: pci_stub: Suppress kernel DMA ownership
auto-claiming
On Mon, Nov 15, 2021 at 08:58:19PM +0000, Robin Murphy wrote:
> > The above scenarios are already blocked by the kernel with
> > LOCKDOWN_DEV_MEM - yes there are historical ways to violate kernel
> > integrity, and these days they almost all have mitigation. I would
> > consider any kernel integrity violation to be a bug today if
> > LOCKDOWN_INTEGRITY_MAX is enabled.
> >
> > I don't know why you bring this up?
>
> Because as soon as anyone brings up "security" I'm not going to blindly
> accept the implicit assumption that VFIO is the only possible way to get one
> device to mess with another. That was just a silly example in the most basic
> terms, and obviously I don't expect well-worn generic sysfs interfaces to be
> a genuine threat, but how confident are you that no other subsystem- or
> driver-level interfaces past present and future can ever be tricked into p2p
> DMA?
Given the definition of LOCKDOWN_INTEGRITY_MAX, I will consider any
past/present/future p2p attack a definitive kernel bug.
Generally, allowing a device to do arbitrary DMA to a userspace
controlled address is a pretty serious bug, and directly attacking
kernel memory is a much more interesting and serious threat vector.
> > What is the preference then? This is the only working API today,
> > right?
>
> I believe the intent was that everyone should move to
> iommu_group_get()/iommu_attach_group() - precisely *because*
> iommu_attach_device() can't work sensibly for multi-device groups.
And iommu_attach_group() can't work sensibly for anything except VFIO
today, so hum :)
> > > Indeed I wasn't imagining it changing any ownership, just preventing a group
> > > from being attached to a non-default domain if it contains devices bound to
> > > different incompatible drivers.
> >
> > So this could solve just the domain/DMA API problem, but it leaves the
> > MMIO peer-to-peer issue unsolved, and it gives no tools to solve it in
> > a layered way.
> >
> > This seems like half an idea, do you have a solution for the rest?
>
> Tell me how the p2p DMA issue can manifest if device A is prohibited from
> attaching to VFIO's unmanaged domain while device B still has a driver
> bound, and thus would fail to be assigned to userspace in the first place.
> And conversely if non-VFIO drivers are still prevented from binding to
> device B while device A remains attached to the VFIO domain.
You've assumed that a group is continuously attached to the same
domain during the entire period that userspace has MMIO.
Any domain detach creates a race where a kernel driver can jump in
and bind while user space continues to have MMIO control over a
device. That violates the security invariant.
Many new flows, like PASID support, are requiring dynamically changing
the domain bound to a group.
If you want to go in this direction then we also need to have some
kind of compare and swap operation for the domain currently bound to a
group.
From a security perspective I dislike this idea a lot. Instead of
having nice, clear barriers indicating the security domain, we have a
very subtle 'so long as a domain is attached' idea, which is going to
get broken.
> > The concept of DMA USER is important here, and it is more than just
> > which domain is attached.
>
> Tell me how a device would be assigned to userspace while its group is still
> attached to a kernel-managed default domain.
>
> As soon as anyone calls iommu_attach_group() - or indeed
> iommu_attach_device() if more devices may end up hotplugged into the same
> group later - *that's* when the door opens for potential subversion of any
> kind, without ever having to leave kernel space.
The approach in this series can solve both: attach_device could switch
the device to user mode, and that would block future hot-plugged
kernel drivers.
> > VFIO also has logic related to the file
>
> Yes, because unsurprisingly VFIO code is tailored for the specific case of
> VFIO usage rather than anything more general.
VFIO represents this class of users exposing the IOMMU to userspace;
I'd say it is general for that class of use.
> > It isn't muddying the water, it is providing common security code that
> > is easy to understand.
> >
> > *Which* userspace FD/process owns the iommu_group is important
> > security information because we can't have process A do DMA attacks on
> > some other process B.
>
> Tell me how a single group could be attached to two domains representing two
> different process address spaces at once.
Again you are focused on domains and ignoring MMIO.
Requiring the users of the API to continuously assert a non-default
domain is a non-trivial ask.
> In case this concept wasn't as clear as I thought, which seems to be so:
>
> | dev->iommu_group->domain | dev->driver
> DMA_OWNER_NONE | default | unbound
> DMA_OWNER_KERNEL | default | bound
> DMA_OWNER_USER | non-default | bound
Unfortunately this can't use dev->driver. Reading dev->driver of every
device in the group requires holding the device_lock. really_probe()
already holds the device_lock, so this becomes a user-triggerable ABBA
deadlock when scaled up to N devices.
This is why this series uses only the group mutex and tracks if
drivers are bound inside the group. I view that as unavoidable.
Jason