linux-kernel - Re: [RFC PATCH 00/12] Private MMIO support for private assigned dev

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <c3fbf6ac-92b9-4a14-9505-ab9e8f30b06b@linux.intel.com>
Date: Tue, 10 Jun 2025 13:19:29 +0800
From: Baolu Lu <baolu.lu@...ux.intel.com>
To: Alexey Kardashevskiy <aik@....com>, Xu Yilun <yilun.xu@...ux.intel.com>
Cc: Jason Gunthorpe <jgg@...dia.com>, kvm@...r.kernel.org,
 dri-devel@...ts.freedesktop.org, linux-media@...r.kernel.org,
 linaro-mm-sig@...ts.linaro.org, sumit.semwal@...aro.org,
 christian.koenig@....com, pbonzini@...hat.com, seanjc@...gle.com,
 alex.williamson@...hat.com, vivek.kasireddy@...el.com,
 dan.j.williams@...el.com, yilun.xu@...el.com, linux-coco@...ts.linux.dev,
 linux-kernel@...r.kernel.org, lukas@...ner.de, yan.y.zhao@...el.com,
 daniel.vetter@...ll.ch, leon@...nel.org, zhenzhong.duan@...el.com,
 tao1.su@...el.com
Subject: Re: [RFC PATCH 00/12] Private MMIO support for private assigned dev

On 6/10/25 12:20, Alexey Kardashevskiy wrote:
> 
> 
> On 31/5/25 02:23, Xu Yilun wrote:
>> On Fri, May 30, 2025 at 12:29:30PM +1000, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 30/5/25 00:41, Xu Yilun wrote:
>>>>>>>>
>>>>>>>> FLR to a bound device is absolutely fine, just break the CC state.
>>>>>>>> Sometimes it is exactly what host need to stop CC immediately.
>>>>>>>> The problem is in VFIO's pre-FLR handling so we need to patch 
>>>>>>>> VFIO, not
>>>>>>>> PCI core.
>>>>>>>
>>>>>>> What is a problem here exactly?
>>>>>>> FLR by the host which equals to any other PCI error? The guest 
>>>>>>> may or may not be able to handle it, afaik it does not handle any 
>>>>>>> errors now, QEMU just stops the guest.
>>>>>>
>>>>>> It is about TDX Connect.
>>>>>>
>>>>>> According to the dmabuf patchset, the dmabuf needs to be revoked 
>>>>>> before
>>>>>> FLR. That means KVM unmaps MMIOs when the device is in LOCKED/RUN 
>>>>>> state.
>>>>>> That is forbidden by TDX Module and will crash KVM.
>>>>>
>>>>>
>>>>> FLR is something you tell the device to do, how/why would TDX know 
>>>>> about it?
>>>>
>>>> I'm talking about FLR in VFIO driver. The VFIO driver would zap bar
>>>> before FLR. The zapping would trigger KVM unmap MMIOs. See
>>>> vfio_pci_zap_bars() for legacy case, and see [1] for dmabuf case.
>>>
>>> oh I did not know that we do this zapping, thanks for the pointer.
>>>> [1] https://lore.kernel.org/kvm/20250307052248.405803-4- 
>>>> vivek.kasireddy@...el.com/
>>>>
>>>> A pure FLR without zapping bar is absolutely OK.
>>>>
>>>>> Or it check the TDI state on every map/unmap (unlikely)?
>>>>
>>>> Yeah, TDX Module would check TDI state on every unmapping.
>>>
>>> _every_? Reading the state from DOE mailbox is not cheap enough 
>>> (imho) to do on every unmap.
>>
>> Sorry for confusing. TDX firmware just checks if STOP TDI firmware call
>> is executed, will not check the real device state via DOE. That means
>> even if device has physically exited to UNLOCKED, TDX host should still
>> call STOP TDI fwcall first, then MMIO unmap.
>>
>>>
>>>>>
>>>>>> So the safer way is
>>>>>> to unbind the TDI first, then revoke MMIOs, then do FLR.
>>>>>>
>>>>>> I'm not sure when p2p dma is involved AMD will have the same issue.
>>>>>
>>>>> On AMD, the host can "revoke" at any time, at worst it'll see RMP 
>>>>> events from IOMMU. Thanks,
>>>>
>>>> Is the RMP event firstly detected by host or guest? If by host,
>>>
>>> Host.
>>>
>>>> host could fool guest by just suppress the event. Guest thought the
>>>> DMA writting is successful but it is not and may cause security issue.
>>>
>>> An RMP event on the host is an indication that RMP check has failed 
>>> and DMA to the guest did not complete so the guest won't see new 
>>> data. Same as other PCI errors really. RMP acts like a firewall, 
>>> things behind it do not need to know if something was dropped. Thanks,
>>
>> Not really, guest thought the data is changed but it actually doesn't.
>> I.e. data integrity is broken.
> 
> I am not following, sorry. Integrity is broken when something untrusted 
> (== other than the SNP guest and the trusted device) manages to write to 
> the guest encrypted memory successfully. If nothing is written - the 
> guest can easily see this and do... nothing? Devices have bugs or 
> spurious interrupts happen, the guest driver should be able to cope with 
> that.

Data integrity might not be the most accurate way to describe the
situation here. If I understand correctly, the MMIO mapping was
destroyed before the device was unbound (meaning the guest still sees
the device). When the guest issues a P2P write to the device's MMIO, it
will definitely fail, but the guest won't be aware of this failure.

Imagine this on a bare-metal system: if a P2P access targets a device's
MMIO but the device or platform considers it an illegal access, there
should be a bus error or machine check exception. Alternatively, if the
device supports out-of-band AER, the AER driver should then catch and
process these errors.

Therefore, unbinding the device before MMIO invalidation could generally
avoid this.

Thanks,
baolu