[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5614CFF2.902@cloudius-systems.com>
Date: Wed, 7 Oct 2015 10:55:30 +0300
From: Vlad Zolotarov <vladz@...udius-systems.com>
To: Alex Williamson <alex.williamson@...hat.com>,
Avi Kivity <avi@...lladb.com>
Cc: "Michael S. Tsirkin" <mst@...hat.com>,
Greg KH <gregkh@...uxfoundation.org>,
linux-kernel@...r.kernel.org, hjk@...sjkoch.de, corbet@....net,
bruce.richardson@...el.com, avi@...udius-systems.com,
gleb@...udius-systems.com, stephen@...workplumber.org,
alexander.duyck@...il.com
Subject: Re: [PATCH v3 2/3] uio_pci_generic: add MSI/MSI-X support
On 10/06/15 21:51, Alex Williamson wrote:
> On Tue, 2015-10-06 at 18:23 +0300, Avi Kivity wrote:
>> On 10/06/2015 05:56 PM, Michael S. Tsirkin wrote:
>>> On Tue, Oct 06, 2015 at 05:43:50PM +0300, Vlad Zolotarov wrote:
>>>> The only "like VFIO" behavior we implement here is binding the MSI-X
>>>> interrupt notification to eventfd descriptor.
>>> There will be more if you add some basic memory protections.
>>>
>>> Besides, that's not true.
>>> Your patch queries MSI capability, sets # of vectors.
>>> You even hinted you want to add BAR mapping down the road.
>> BAR mapping is already available from sysfs; it is not mandatory.
>>
>>> VFIO does all of that.
>>>
>> Copying vfio maintainer Alex (hi!).
>>
>> vfio's charter is modern iommu-capable configurations. It is designed to
>> be secure enough to be usable by an unprivileged user.
>>
>> For performance and hardware reasons, many dpdk deployments use
>> uio_pci_generic. They are willing to trade off the security provided by
>> vfio for the performance and deployment flexibility of pci_uio_generic.
>> Forcing these features into vfio will compromise its security and
>> needlessly complicate its code (I guess it can be done with a "null"
>> iommu, but then vfio will have to decide whether it is secure or not).
> It's not just the iommu model vfio uses, it's that vfio is built around
> iommu groups. For instance to use a device in vfio, the user opens the
> vfio group file and asks for the device within that group. That's a
> fairly fundamental part of the mechanics to sidestep.
>
> However, is there an opportunity at a lower level? Systems without an
> iommu typically have dma ops handled via a software iotlb (ie. bounce
> buffers), but I think they simply don't have iommu ops registered.
> Could a no-iommu, iommu subsystem provide enough dummy iommu ops to fake
> out vfio? It would need to iterate the devices on the bus and come up
> with dummy iommu groups and dummy versions of iommu_map and unmap. The
> grouping is easy, one device per group, there's no isolation anyway.
> The vfio type1 iommu backend will do pinning, which seems like an
> improvement over the mlock that uio users probably try to do now. I
> guess the no-iommu map would error if the IOVA isn't simply the bus
> address of the page mapped.
>
> Of course this is entirely unsafe and this no-iommu driver should taint
> the kernel, but it at least standardizes on one userspace API and you're
> already doing completely unsafe things with uio. vfio should be
> enlightened at least to the point that it allows only privileged users
> access to devices under such a (lack of) iommu.
Thanks for clarification, Alex.
One of the important points in the above description is that vfio has
been build around IOMMU groups - and that's a good thing!
This means that this ensures the safety for vfio users and IMHO breaking
this by introducing the no-iommu mode won't bring any good.
What do we have on a negative side of this step:
1. Just a description of the work that has to be done implies a
non-trivial code that will have to be maintained later.
2. This new mode will be absolutely unsafe while some users may
mistakenly assume that using vfio is safe in all situations like it
is now.
3. The vfio user interface is "a bit" more complicated than the one of
UIO's and if there isn't any added value (see below) users will just
prefer continue using UIO.
Let's try to analyze the possible positive sides (as u've described them
above):
1. /This added feature may allow to standardize the user-space drivers
interface. Why is it good? - This could allow us to maintain only
one infrastructure (vfio)./ That's true but unfortunately UIO
interface is already widely used and thus it can't be just killed.
Therefore instead of maintaining one unsafe user-space driver
infrastructure we'll have to maintain two. Therefore the result will
be exactly the opposite from the expected.
2. I'm not very familiar with all vfio features but regarding the "vfio
type1 iommu backend going to do pinning instead of mlock" - another
alternative to pin the pages in the memory is to use hugetlbfs,
which UIO users like DPDK do. I may be wrong but I'm not sure that
in this case using vfio type1 iommu backend would be beneficial it
terms of performance and performance is usually the most important
factor for un-safe mode users (e.g. DPDK).
So, considering the above I think that instead of complicating the
already non-trivial vfio interface even more we'd rather have two types
of user-space interfaces:
* safe - VFIO
* not safe - UIO
The thing is that this is more or less the situation right now and
according to negatives.3 above it is likely to remain this way (at least
the UIO part ;)) for some (long) time so all this "adding unsafe mode to
VFIO" initiative looks completely useless.
thanks,
vlad
>
>>>> This doesn't justifies the
>>>> hassle of implementing IOMMU-less VFIO mode.
>>> This applies to both VFIO and UIO really. I'm not sure the hassle of
>>> maintaining this functionality in tree is justified. It remains to be
>>> seen whether there are any users that won't taint the kernel.
>>> Apparently not in the current form of the patch, but who knows.
>> It is not msix that taints the kernel, it's uio_pci_generic. Msix is a
>> tiny feature addition that doesn't change the security situation one bit.
>>
>> btw, currently you can map BARs and dd to /dev/mem to your heart's
>> content without tainting the kernel. I don't see how you can claim that
>> msix support makes the situation worse, when root can access every bit
>> of physical memory, either directly or via DMA.
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists