[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <A6E92E4B-488D-4D01-9D87-48DA81050AC2@infradead.org>
Date: Sun, 06 Apr 2025 19:47:41 +0100
From: David Woodhouse <dwmw2@...radead.org>
To: "Michael S. Tsirkin" <mst@...hat.com>
CC: Christoph Hellwig <hch@...radead.org>, virtio-comment@...ts.linux.dev,
Claire Chang <tientzu@...omium.org>,
linux-devicetree <devicetree@...r.kernel.org>,
Rob Herring <robh+dt@...nel.org>,
Jörg Roedel <joro@...tes.org>,
iommu@...ts.linux-foundation.org, linux-kernel@...r.kernel.org,
graf@...zon.de
Subject: Re: [RFC PATCH 1/3] content: Add VIRTIO_F_SWIOTLB to negotiate use of SWIOTLB bounce buffers
On 6 April 2025 19:28:08 BST, "Michael S. Tsirkin" <mst@...hat.com> wrote:
>On Fri, Apr 04, 2025 at 12:15:52PM +0100, David Woodhouse wrote:
>> On Fri, 2025-04-04 at 06:37 -0400, Michael S. Tsirkin wrote:
>> > On Fri, Apr 04, 2025 at 11:15:33AM +0100, David Woodhouse wrote:
>> > > On Fri, 2025-04-04 at 10:27 +0100, David Woodhouse wrote:
>> > > > On 4 April 2025 09:32:39 BST, "Michael S. Tsirkin" <mst@...hat.com>
>> > > > wrote:
>> > > > > On Fri, Apr 04, 2025 at 09:16:44AM +0100, David Woodhouse wrote:
>> > > > > > On Fri, 2025-04-04 at 04:09 -0400, Michael S. Tsirkin wrote:
>> > > > > > > On Fri, Apr 04, 2025 at 08:50:47AM +0100, David Woodhouse
>> > > > > > > wrote:
>> > > > > > > > What's annoying is that this should work out of the box
>> > > > > > > > *already* with
>> > > > > > > > virtio-mmio and a `restricted-dma-pool` — for systems which
>> > > > > > > > aren't
>> > > > > > > > afflicted by UEFI/ACPI/PCI as their discovery mechanisms.
>> > > > > > >
>> > > > > > >
>> > > > > > > That specifically would be just a driver bugfix then?
>> > > > > >
>> > > > > > I actually think it works out of the box and there isn't even a
>> > > > > > bug to
>> > > > > > fix. Haven't tested yet.
>> > > > > >
>> > > > > > The sad part is that the system does it all automatically *if* it
>> > > > > > has
>> > > > > > CONFIG_DMA_RESTRICTED_POOL (e.g. Linux) and the driver never even
>> > > > > > notices that the dma_ops it's using are the swiotlb ops using the
>> > > > > > provided buffer.
>> > > > > >
>> > > > > > Which is *kind* of nice... except that when on a guest OS which
>> > > > > > *isn't*
>> > > > > > Linux with CONFIG_DMA_RESTRICTED_POOL, the guest will just ignore
>> > > > > > the
>> > > > > > `restricted-dma-pool` node and try DMA to system memory anyway,
>> > > > > > which
>> > > > > > will fail.
>> > > > >
>> > > > > I mean, it's easy to misconfigure Linux, this is why we love it ;)
>> > > > > Why
>> > > > > is this such a concern?
>> > > >
>> > > > Because it's incompatible. In the DT world, perhaps this new *non-
>> > > > optional* feature/restriction should have come with a new
>> > > > "compatible" string such as "virtio-mmio-restricted-dma".
>> > > >
>> > > > Adding it without backwards compatibility wasn't ideal.
>> > > >
>> > > > > > That's why my proposal adds the negotiated VIRTIO_F_SWIOTLB
>> > > > > > feature, so
>> > > > > > that the device side can refuse, if the guest *isn't* agreeing to
>> > > > > > use
>> > > > > > the bounce buffer in the situations where it must do so.
>> > > > >
>> > > > >
>> > > > > OTOH then setting this feature and if you make the device force it,
>> > > > > you are breaking guests restricted-dma-pool which worked
>> > > > > previously, no?
>> > > >
>> > > > Yes. So a platform offering virtio-mmio with restricted DMA, if the
>> > > > driver doesn't accept the offered VIRTIO_F_SWIOTLB, may want to
>> > > > accept that negotiation anyway, and *hope* that the driver/OS are
>> > > > going to use the buffer anyway.
>> > > >
>> > > > I just didn't want to make that same mistake again when formalising
>> > > > and documenting this, and especially when attempting to extend it to
>> > > > PCI.
>> > >
>> > > Of course, the beauty of the restricted-dma-pool as supported by DT is
>> > > that it's a *system* memory buffer, which is actually OK as long as
>> > > it's reserved address space and not just part of normal system memory
>> > > that an unsuspecting guest might use for general purposes. So the
>> > > trusted part of the hypervisor (e.g. pKVM) can *allow* the VMM access
>> > > to that space.
>> > >
>> > > It doesn't *have* to be on-device. That just seemed like the more
>> > > natural way to do it for PCI.
>> > >
>> > > I suppose we *could* allow for the virtio-pci transport to do it the
>> > > same way as virtio-mmio though. The VIRTIO_PCI_CAP_SWIOTLB capability¹
>> > > could reference a range of system memory space, just like the
>> > > `restricted-dma-pool` property does.
>> > >
>> > > It's a weird abstraction especially for a physical PCI device to do
>> > > that because the system memory space is outside its ownership. But in a
>> > > physical device it could be writable, and you could consider it the
>> > > responsibility of the system firmware to configure it appropriately, in
>> > > accordance with the IOMMU and other DMA restrictions of the platform.
>> > >
>> > > That does solve it for the CoCo case without addressing the P2P staging
>> > > case that Christoph mentions, though.
>> > >
>> > >
>> > > ¹ I will rename it, Christoph, if it survives at all. Probably
>> > > VIRTIO_F_RESTRICTED_DMA and VIRTIO_PCI_CAP_RESTRICTED_DMA but of course
>> > > it depends on the semantics we conclude it should have.
>> >
>> > OK. So basically, all this does, is a promise by driver to only
>> > DMA into a range of memory?
>>
>> Basically, yes.
>>
>> > This part, I get. I wouldn't put it in a capability, just in config
>> > space then.
>>
>> Sure... but how? There are some things which are defined to be at fixed
>> locations in config space, like the vendor/device IDs, COMMAND, STATUS,
>> BARs, etc..
>>
>> And for the rest of the optional things which might be in config space
>> of a given device... isn't that exactly what capabilities are for?
>
>
>Sorry I am unclear. Not the pci config space. The virtio config space.
>After admin_queue_num ?
>
>
Makes sense. Let me knock up a second RFC based on that, and test using PRP0001 as a vehicle for `restricted-dma-pool` in an ACPI-afflicted guest.
Powered by blists - more mailing lists