Message-ID: <20151011182809.GA8154@redhat.com>
Date: Sun, 11 Oct 2015 21:28:09 +0300
From: "Michael S. Tsirkin" <mst@...hat.com>
To: Alex Williamson <alex.williamson@...hat.com>
Cc: avi@...lladb.com, avi@...udius-systems.com, gleb@...lladb.com,
corbet@....net, bruce.richardson@...el.com,
linux-kernel@...r.kernel.org, alexander.duyck@...il.com,
gleb@...udius-systems.com, stephen@...workplumber.org,
vladz@...udius-systems.com, iommu@...ts.linux-foundation.org,
hjk@...sjkoch.de, gregkh@...uxfoundation.org
Subject: Re: [RFC PATCH 0/2] VFIO no-iommu
On Fri, Oct 09, 2015 at 12:40:56PM -0600, Alex Williamson wrote:
> Recent patches for UIO have been attempting to add MSI/X support,
> which unfortunately implies DMA support, which users have been
> enabling anyway, but was never intended for UIO. VFIO on the other
> hand expects an IOMMU to provide isolation of devices, but provides
> a much more complete device interface, which already supports full
> MSI/X support. There's really no way to support userspace drivers
> with DMA capable devices without an IOMMU to protect the host, but
> we can at least think about doing it in a way that properly taints
> the kernel and avoids creating new code duplicating existing code,
> that does have a supportable use case.
>
> The diffstat is only so large because I moved vfio.c to vfio_core.c
> so I could more easily keep the module named vfio.ko while keeping
> the bulk of the no-iommu support in a separate file that can be
> optionally compiled. We're really looking at a couple hundred lines
> of mostly stub code. The VFIO_NOIOMMU_IOMMU could certainly be
> expanded to do page pinning and virt_to_bus() translation, but I
> didn't want to complicate anything yet.
I think it's already useful like this, since all current users
seem happy enough to just use hugetlbfs to do pinning, and
ignore translation.
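For context, here is a minimal sketch of how UIO-style userspace drivers commonly obtain an address to program into a device for a pinned (e.g. hugetlbfs-backed) buffer. They read /proc/self/pagemap and assume that the physical address equals the bus address. That assumption only holds when no IOMMU translation is in effect. The helper names (pagemap_offset, pagemap_entry_to_phys, virt_to_phys_user) are hypothetical. The bit layout follows the kernel's pagemap documentation: bit 63 means the page is present, and bits 0-54 hold the PFN. On recent kernels the PFN reads as zero without CAP_SYS_ADMIN.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

#define PAGEMAP_PRESENT  (1ULL << 63)         /* bit 63: page present */
#define PAGEMAP_PFN_MASK ((1ULL << 55) - 1)   /* bits 0-54: PFN */

/* Byte offset of the 64-bit pagemap entry describing vaddr. */
static uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
	assert(page_size != 0);
	return (vaddr / page_size) * sizeof(uint64_t);
}

/* Decode one entry. Returns false if the page is not present or the PFN
 * is hidden (reads as 0 for unprivileged callers). */
static bool pagemap_entry_to_phys(uint64_t entry, uint64_t vaddr,
				  uint64_t page_size, uint64_t *phys)
{
	uint64_t pfn = entry & PAGEMAP_PFN_MASK;

	if (!(entry & PAGEMAP_PRESENT) || pfn == 0)
		return false;
	*phys = pfn * page_size + (vaddr % page_size);
	return true;
}

/* Hypothetical helper: physical address backing vaddr in this process.
 * pagemap is indexed by the base page size, even for hugetlbfs mappings. */
bool virt_to_phys_user(const void *vaddr, uint64_t *phys)
{
	uint64_t va = (uint64_t)(uintptr_t)vaddr;
	uint64_t page_size = (uint64_t)sysconf(_SC_PAGESIZE);
	uint64_t entry;
	bool ok = false;
	int fd = open("/proc/self/pagemap", O_RDONLY);

	if (fd < 0)
		return false;
	if (pread(fd, &entry, sizeof(entry),
		  (off_t)pagemap_offset(va, page_size)) == (ssize_t)sizeof(entry))
		ok = pagemap_entry_to_phys(entry, va, page_size, phys);
	close(fd);
	return ok;
}
```

The buffer must stay pinned, for example by hugetlbfs backing plus mlock, for the returned address to remain valid. Treating PFN 0 as "hidden" is a simplification, since user pages are not expected to map PFN 0.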
> I've only compiled this and tested loading the module with the new
> no-iommu mode enabled, I haven't actually tried to port a DPDK
> driver to it, though it ought to be a pretty obvious mix of the
> existing UIO and VFIO versions (set the IOMMU, but avoid using it
> for mapping, use however bus translations are done w/ UIO). The core
> vfio device file is still /dev/vfio/vfio, but all the groups become
> /dev/vfio-noiommu/$GROUP.
>
> It should be obvious, but I always feel obligated to state that this
> does not and will not ever enable device assignment to virtual
> machines on non-IOMMU capable platforms.
In theory, it's kind of possible using paravirtualization.
Within the guest, you'd make map_page retrieve the I/O address from the host
and return that as dma_addr_t. The only question would be APIs that
require more than one contiguous page in I/O space (e.g. I think alloc
coherent is like this?).
Not a problem if the host is using hugetlbfs, but if not, I guess we could
add a hypercall and some Linux API on the host to trigger aggressive
compaction on the host. MADV_CONTIGUOUS?
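The contiguity concern above can be made concrete with a small sketch. Suppose the host reports one I/O address per guest page. A multi-page DMA buffer is then usable only if those addresses are consecutive. Both helpers below are hypothetical and not part of the patches. The second helper also assumes that guest-physical huge-page boundaries line up with the host's hugetlbfs pages. Under that assumption, any range that stays inside one huge page is contiguous on the host, which is why hugetlbfs backing sidesteps the problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: io_addr[i] is the host-reported I/O address of the i-th
 * page of a guest buffer. The buffer works for multi-page DMA only if each
 * page immediately follows the previous one in I/O space. */
static bool io_range_contiguous(const uint64_t *io_addr, size_t npages,
				uint64_t page_size)
{
	for (size_t i = 1; i < npages; i++)
		if (io_addr[i] != io_addr[i - 1] + page_size)
			return false;
	return true;
}

/* Assumption: guest-physical huge pages map 1:1 onto host hugetlbfs
 * pages, so a range that does not cross a huge-page boundary is also
 * contiguous on the host. */
static bool within_one_hugepage(uint64_t gpa, uint64_t len, uint64_t huge_size)
{
	assert(huge_size != 0);
	if (len == 0)
		return true;
	return gpa / huge_size == (gpa + len - 1) / huge_size;
}
```

A range that crosses a huge-page boundary, or a host without hugetlbfs backing, falls back to the per-page check. That is the case where something like host-side compaction would be needed.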
> I'm curious what IOMMU folks think of this. This hack is really
> only possible because we don't use iommu_ops for regular DMA, so we
> can hijack it fairly safely. I believe that's intended to change
> though, so this may not be practical long term. Thanks,
>
> Alex
>
> ---
>
> Alex Williamson (2):
> vfio: Move vfio.c vfio_core.c
> vfio: Include no-iommu mode
>
>
> drivers/vfio/Kconfig | 15
> drivers/vfio/Makefile | 4
> drivers/vfio/vfio.c | 1640 ------------------------------------------
> drivers/vfio/vfio_core.c | 1680 +++++++++++++++++++++++++++++++++++++++++++
> drivers/vfio/vfio_noiommu.c | 185 +++++
> drivers/vfio/vfio_private.h | 31 +
> include/uapi/linux/vfio.h | 2
> 7 files changed, 1917 insertions(+), 1640 deletions(-)
> delete mode 100644 drivers/vfio/vfio.c
> create mode 100644 drivers/vfio/vfio_core.c
> create mode 100644 drivers/vfio/vfio_noiommu.c
> create mode 100644 drivers/vfio/vfio_private.h