Message-ID: <f127d8bf-4767-4203-8db3-c8ea80541917@amd.com>
Date: Sun, 6 Apr 2025 14:23:19 +0800
From: Zhu Lingshan <lingshan.zhu@....com>
To: David Woodhouse <dwmw2@...radead.org>, "Michael S. Tsirkin"
<mst@...hat.com>
Cc: virtio-comment@...ts.linux.dev, hch@...radead.org,
Claire Chang <tientzu@...omium.org>,
linux-devicetree <devicetree@...r.kernel.org>,
Rob Herring <robh+dt@...nel.org>, Jörg Roedel
<joro@...tes.org>, iommu@...ts.linux-foundation.org,
linux-kernel@...r.kernel.org, graf@...zon.de
Subject: Re: [RFC PATCH 1/3] content: Add VIRTIO_F_SWIOTLB to negotiate use of
SWIOTLB bounce buffers
On 4/3/2025 4:57 PM, David Woodhouse wrote:
> On Thu, 2025-04-03 at 16:34 +0800, Zhu Lingshan wrote:
>> On 4/3/2025 4:22 PM, David Woodhouse wrote:
>>> On Thu, 2025-04-03 at 04:13 -0400, Michael S. Tsirkin wrote:
>>>> On Thu, Apr 03, 2025 at 08:54:45AM +0100, David Woodhouse wrote:
>>>>> On Thu, 2025-04-03 at 03:34 -0400, Michael S. Tsirkin wrote:
>>>>>> Indeed I personally do not exactly get why implement a virtual system
>>>>>> without an IOMMU when virtio-iommu is available.
>>>>>>
>>>>>> I have a feeling it's about lack of windows drivers for virtio-iommu
>>>>>> at this point.
>>>>> And a pKVM (etc.) implementation of virtio-iommu which would allow the
>>>>> *trusted* part of the hypervisor to know which guest memory should be
>>>>> shared with the VMM implementing the virtio device models?
>>>> Is there a blocker here?
>>> Only the amount of complexity in what should be a minimal Trusted
>>> Compute Base. (And ideally subject to formal methods of proving its
>>> correctness too.)
>>>
>>> And frankly, if we were going to accept a virtio-iommu in the TCB why
>>> not just implement enough virtqueue knowledge to build something where
>>> the trusted part just snoops on the *actual* e.g. virtio-net device to
>>> know which buffers the VMM was *invited* to access, and facilitate
>>> that?
>> You trust the CPU and its IOMMU, and the virtio-iommu is provided by the hypervisor,
>> emulated by the CPU. If you don't trust the virtio-iommu, then you should not trust
>> the bounce buffer either, because it is unencrypted, which makes it more like a security leak.
>>
>> Actually everything is suspect, even the CPU, but you have to trust a TCB and
>> try to minimize that TCB. I remember there is an attestation mechanism to help
>> examine the infrastructure. We need a balance and a tradeoff.
> In the pKVM model, we have a minimal trusted part of the hypervisor,
> which some are calling a "lowvisor", which enforces the property that
> even the rest of Linux/KVM and the VMM are not able to access arbitrary
> guest memory.
>
> For true PCI passthrough devices, hardware has a two-stage IOMMU which
> allows the guest to control which parts of its memory are accessible by
> the devices.
>
> The problem is those device models which are emulated in the VMM,
> because the VMM no longer has blanket access to the guest's memory.
>
> The simplest answer is just for the device models presented by the VMM
> to *not* do DMA access to guest system memory. Stick a bounce-buffer on
> the device itself, and do I/O through that.
>
> Yes, as you say, we have to trust the CPU and its IOMMU. And the
> microcode and low-level firmware, etc.
>
> But we do *not* trust most of Linux/KVM and the VMM. We only trust up
> to the pKVM "lowvisor" level. So if there is going to be a virtio-
> iommu, that's where it would have to be implemented. Otherwise, the VMM
> is just saying to the lowvisor, "please grant me access to <this> guest
> page because...erm... I said so". Which is not quite how the trust
> model is supposed to work.
>
> As noted in the original cover letter to this series, there are other
> options too. We could implement enlightenments in the guest to
> share/unshare pages, and corresponding dma_ops which simply invoke
> those hypercalls. That's a bit less complexity in the TCB, but does end
> up with a set of guest enlightenments which need to be supported in the
> core of every operating system.
>
> Compared with the option of a simple device driver for a device which
> conceptually doesn't do DMA at all.
Hello David,
I see you want to limit the device DMA range for security reasons, and one
possible solution is not to do DMA at all, but to provide a bounce buffer on
the device side. However, such a bounce buffer residing on the device is
unencrypted, which means it is itself insecure.
I have a rough proposal. First, here are some assumptions:
1) We trust the guest because it belongs to the tenant, and we cannot change any guest drivers; only the tenant can upgrade the guest kernel.
2) We only trust a minimal TCB and do not trust most of the hypervisor code.
3) An emulated device is part of the hypervisor.
4) Even when the host / hypervisor is compromised, we still want to secure the guest data.
5) Virtio devices and virtio drivers exchange data through the virtqueues, which means virtio devices initiate DMA against queue buffers.
So here is a draft design (a rough driver-side sketch follows the list):
1) Queue buffer encryption, per queue or per device.
2) The feature is controlled by a feature bit.
3) The driver initializes an encryption algorithm and keys before DRIVER_OK by writing to device-side registers / fields; the algorithm can be null, meaning unencrypted.
The device-side registers are WRITE ONLY, which means they can be overwritten in the worst case, but the key material cannot be read back.
4) The write side (the producer, e.g. the driver filling data into available queue buffers) encrypts, and the read side (the consumer) decrypts.
5) Provide multiple encryption algorithms for different types of workloads and resource budgets.
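
To make the flow concrete, here is a minimal driver-side sketch. Everything
specific in it is hypothetical (the VIRTIO_F_QUEUE_ENC bit number, the config
layout, the config offset and the vq_enc_* helpers are made up for
illustration); only the generic Linux virtio and scatterlist calls are
existing APIs. It only shows the intended order of operations: negotiate the
feature, program the algorithm and key before DRIVER_OK, and encrypt buffers
before they are added to the available ring.

/*
 * Sketch only: feature bit number, config layout, offset and vq_enc_*
 * helpers are hypothetical; the virtio/scatterlist calls are real APIs.
 */
#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/scatterlist.h>
#include <linux/string.h>
#include <linux/types.h>
#include <linux/virtio.h>
#include <linux/virtio_config.h>

#define VIRTIO_F_QUEUE_ENC	50	/* hypothetical feature bit number */

struct virtio_queue_enc_cfg {		/* hypothetical, write-only config fields */
	u8 algo;			/* 0 = none (unencrypted), 1 = AES-256-GCM, ... */
	u8 key[32];
};

/* Placeholder for the real cipher, e.g. AES-GCM via the kernel crypto API. */
static void vq_enc_cipher(void *buf, unsigned int len, bool encrypt)
{
	/* crypto_aead_encrypt() / crypto_aead_decrypt() would go here */
}

/*
 * Design step 3: program algorithm and key before DRIVER_OK.  The registers
 * are write-only, so a compromised host can at worst clobber the key (a
 * denial of service) but cannot read it back.
 */
static int vq_enc_setup(struct virtio_device *vdev, const u8 *key)
{
	struct virtio_queue_enc_cfg cfg = { .algo = 1 };

	if (!virtio_has_feature(vdev, VIRTIO_F_QUEUE_ENC))
		return -EOPNOTSUPP;

	memcpy(cfg.key, key, sizeof(cfg.key));
	virtio_cwrite_bytes(vdev, 0 /* hypothetical offset */, &cfg, sizeof(cfg));
	return 0;
}

/*
 * Design step 4: the producer encrypts in place before exposing the buffer
 * in the available ring; "data" must be a DMA-able buffer.
 */
static int vq_enc_add_outbuf(struct virtqueue *vq, void *data, unsigned int len)
{
	struct scatterlist sg;

	vq_enc_cipher(data, len, true);
	sg_init_one(&sg, data, len);
	return virtqueue_add_outbuf(vq, &sg, 1, data, GFP_ATOMIC);
}

The consumer side would run the same cipher in the other direction after it
dequeues a buffer.
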
The pros:
1) A natural design; data-in-transit encryption has been proven to work for many years.
2) Even when the host / hypervisor is compromised, the DMA buffers are still encrypted and secure.
3) No need to change any virtio-iommu implementation.
4) Compatible with, and actually orthogonal to, CoCo; no conflicts.
5) The solution works for both emulated and real pass-through devices.
6) Unlike TDX-IO, this is implemented in the virtio device itself, so it does not conflict with any existing use cases and it supports live migration.
The cons:
1) The driver needs changes, but new features always require new driver code.
2) It consumes more CPU cycles, but bounce buffers also cost extra CPU, and CoCo attestation costs CPU cycles as well.
So, does this low-hanging fruit sound good to you? This is just brainstorming and a rough proposal.
Please let me know your thoughts.
Thanks
Zhu Lingshan