Message-ID: <688c2155849a2_cff99100dd@dwillia2-xfh.jf.intel.com.notmuch>
Date: Thu, 31 Jul 2025 19:07:17 -0700
From: <dan.j.williams@...el.com>
To: "Aneesh Kumar K.V (Arm)" <aneesh.kumar@...nel.org>,
<linux-coco@...ts.linux.dev>, <kvmarm@...ts.linux.dev>
CC: <linux-pci@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<aik@....com>, <lukas@...ner.de>, Samuel Ortiz <sameo@...osinc.com>, Xu Yilun
<yilun.xu@...ux.intel.com>, Jason Gunthorpe <jgg@...pe.ca>, Suzuki K Poulose
<Suzuki.Poulose@....com>, Steven Price <steven.price@....com>, "Catalin
Marinas" <catalin.marinas@....com>, Marc Zyngier <maz@...nel.org>, Will
Deacon <will@...nel.org>, Oliver Upton <oliver.upton@...ux.dev>, "Aneesh
Kumar K.V (Arm)" <aneesh.kumar@...nel.org>
Subject: Re: [RFC PATCH v1 00/38] ARM CCA Device Assignment support
Aneesh Kumar K.V (Arm) wrote:
> This patch series implements support for Device Assignment in the ARM CCA
> architecture. The code changes are based on Alp12 specification published here
> [1].
>
> The code builds on the TSM framework patches posted at [2]. We add extension to
> that framework so that TSM is now used in both the host and the guest.
>
> A DA workflow can be summarized as below:
>
> Host:
> step 1.
> echo ${DEVICE} > /sys/bus/pci/devices/${DEVICE}/driver/unbind
> echo vfio-pci > /sys/bus/pci/devices/${DEVICE}/driver_override
> echo ${DEVICE} > /sys/bus/pci/drivers_probe
>
> step 2.
> echo 1 > /sys/bus/pci/devices/$DEVICE/tsm/connect
Just for my own understanding... presumably there is no ordering
constraint for ARM CCA between step 1 and step 2, right? I.e., the
connect state is independent of the bind state.
In the v4 PCI/TSM scheme the connect command is now:
echo $tsm_dev > /sys/bus/pci/devices/$DEVICE/tsm/connect
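For reference, the full v4 host flow might then look like the below (a
sketch only; "tsm0" as the TSM device name is an assumption, use
whatever enumerates on the platform):

    # no ordering constraint assumed between bind and connect
    echo ${DEVICE} > /sys/bus/pci/devices/${DEVICE}/driver/unbind
    echo vfio-pci > /sys/bus/pci/devices/${DEVICE}/driver_override
    echo ${DEVICE} > /sys/bus/pci/drivers_probe
    tsm_dev=tsm0  # assumed name of the platform TSM device
    echo ${tsm_dev} > /sys/bus/pci/devices/${DEVICE}/tsm/connect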
> Now in the guest we follow the below steps
I assume a significant amount of kvmtool magic happens here to get the
TDI into a "bind capable" state, can you share that command?
I had been assuming that everyone was prototyping with QEMU. Not a
problem per se, but the memory management for shared device assignment /
bounce buffering has had quite a bit of work on the QEMU side, so I am
just curious about the difference in approach here. Like, does kvmtool
support operating the device in shared mode with bounce buffering and
page conversion (shared <=> private) support? In any event, happy to see
multiple simultaneous consumers of this new kernel infrastructure.
> step 1:
> echo ${DEVICE} > /sys/bus/pci/devices/${DEVICE}/driver/unbind
>
> step 2: Move the device to TDISP LOCK state
> echo 1 > /sys/bus/pci/devices/${DEVICE}/tsm/lock
Ok, so my stance has recently picked up some nuance. As Jason mentions
here:
http://lore.kernel.org/20250410235008.GC63245@ziepe.ca
"However it works, it should be done before the driver is probed and
remain stable for the duration of the driver attachment. From the
iommu side the correct iommu domain, on the correct IOMMU instance to
handle the expected traffic should be setup as the DMA API's iommu
domain."
I agree with that up until the point where the implication is userspace
control of the UNLOCKED->LOCKED transition. That transition requires
enabling bus-mastering (BME), configuring the device into an expected
state, and *then* locking the device. That means userspace is blindly
hoping that the device is in a state where it will remain quiet on the
bus between BME and LOCKED, and that the previous unbind left the device
in a state where it is prepared to be locked again.
The BME concern may be overblown given major PCI drivers blindly set BME
without validating the device is in a quiesced state, but the "device is
prepped for locking" problem seems harder.
2 potential ways to solve this, but open to other ideas:
- Userspace only picks the iommu domain context for the device not the
lock state. Something like:
  echo private > /sys/bus/pci/devices/${DEVICE}/tsm/domain
...where the default is "shared" and from that point the device can
not issue DMA until a driver attaches. Driver controls
UNLOCKED->LOCKED->RUN.
- Userspace is not involved in this transition and the dma mapping API
is updated to allow a driver to switch the iommu domain at runtime,
but only if the device has no outstanding mappings and the transition
can only happen from ->probe() context. Driver controls joining
secure-world-DMA and UNLOCKED->LOCKED->RUN.
Clearly the first option is less work in the kernel, but in both options
the driver is in control of when BME is set relative to being ready for
the LOCKED transition.
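For illustration, the first option from userspace might look like the
below (a sketch only; the "domain" attribute is the hypothetical one
proposed above):

    echo ${DEVICE} > /sys/bus/pci/devices/${DEVICE}/driver/unbind
    echo private > /sys/bus/pci/devices/${DEVICE}/tsm/domain  # default "shared"
    echo ${DEVICE} > /sys/bus/pci/drivers/${DRIVER}/bind
    # ->probe() is then responsible for setting BME only once the
    # device is prepped, and for driving UNLOCKED->LOCKED->RUN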
> step 3: Moves the device to TDISP RUN state
> echo 1 > /sys/bus/pci/devices/${DEVICE}/tsm/accept
This has the same concern from me about userspace being in control of
BME. It feels like a departure from typical expectations. At least in
the case of a driver setting BME the driver's probe routine is going to
get the device in order shortly and otherwise have error handlers at the
ready to effect any needed recovery.
Userspace just leaves the device enabled indefinitely and hopes.
Now, the nice thing about the scheme as proposed in this set is that
userspace has all the time in the world between "lock" and "accept" to
talk to a verifier.
With the driver in control there would need to be something like a
usermodehelper to notify userspace that the device is in the locked
state and to go ahead and run the attestation while the driver waits*.
* or driver could decide to not wait, especially useful for debug and
development
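i.e. something like the below as the helper invoked at LOCKED (entirely
hypothetical, including the "attestation-client" tool, the argument
convention, and the signal-back interface, just to sketch the handoff):

    #!/bin/sh
    # usermodehelper run by the driver once the TDI reaches LOCKED
    DEVICE=$1
    # retrieve the interface report / evidence and consult a verifier
    attestation-client verify ${DEVICE} || exit 1
    # on success, signal the waiting driver to proceed to RUN
    echo 1 > /sys/bus/pci/devices/${DEVICE}/tsm/accept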
> step 4: Load the driver again.
> echo ${DEVICE} > /sys/bus/pci/drivers_probe
TIL drivers_probe
Maybe want to recommend:
echo ${DEVICE} > /sys/bus/pci/drivers/${DRIVER}/bind
...to users just in case there are multiple drivers loaded for the
device, i.e. the "shared" vs "private" driver case?
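i.e. pair it with driver_override so the match is unambiguous:

    # bind explicitly to the intended driver; driver_override ensures
    # the match regardless of the driver's id table
    echo ${DRIVER} > /sys/bus/pci/devices/${DEVICE}/driver_override
    echo ${DEVICE} > /sys/bus/pci/drivers/${DRIVER}/bind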
> I'm currently working against TSM v3, as TSM v4 lacks the necessary
> callbacks—bind, unbind, and guest_req—required for guest interactions.
For staging purposes I wanted to put the "connect" flow to bed before
moving on to the guest side.
> The implementation also makes use of RHI interfaces that fall outside the
> current RHI specification [5]. Once the spec is finalized, the code will be aligned
> accordingly.
>
> For now, I’ve retained validate_mmio and vdev_req exit handling within KVM. This
> will transition to a guest_req-based mechanism once the specification is
> updated.
>
> At that point, all device assignment (DA)-specific VM exits will exit directly
> to the VMM, and will use the guest_req ioctl to handle exit reasons. As part of
> this change, the handlers realm_exit_vdev_req_handler,
> realm_exit_vdev_comm_handler, and realm_exit_dev_mem_map_handler will be
> removed.
>
> Full patchset for the kernel and kvmtool can be found at [3] and [4]
>
> [1] https://developer.arm.com/-/cdn-downloads/permalink/Architectures/Armv9/DEN0137_1.1-alp12.zip
>
> [2] https://lore.kernel.org/all/20250516054732.2055093-1-dan.j.williams@intel.com
>
> [3] https://git.gitlab.arm.com/linux-arm/linux-cca.git cca/tdisp-upstream-post-v1
> [4] https://git.gitlab.arm.com/linux-arm/kvmtool-cca.git cca/tdisp-upstream-post-v1
> [5] https://developer.arm.com/documentation/den0148/latest/
Thanks for this and the help reviewing PCI/TSM so far! I want to get
this into tsm.git#staging so we can start to make hard claims ("look at
the shared tree!") of hardware vendor consensus.