linux-kernel - Re: [RFC PATCH v1 00/38] ARM CCA Device Assignment support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250802141750.GL26511@ziepe.ca>
Date: Sat, 2 Aug 2025 11:17:50 -0300
From: Jason Gunthorpe <jgg@...pe.ca>
To: dan.j.williams@...el.com
Cc: "Aneesh Kumar K.V (Arm)" <aneesh.kumar@...nel.org>,
	linux-coco@...ts.linux.dev, kvmarm@...ts.linux.dev,
	linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org,
	aik@....com, lukas@...ner.de, Samuel Ortiz <sameo@...osinc.com>,
	Xu Yilun <yilun.xu@...ux.intel.com>,
	Suzuki K Poulose <Suzuki.Poulose@....com>,
	Steven Price <steven.price@....com>,
	Catalin Marinas <catalin.marinas@....com>,
	Marc Zyngier <maz@...nel.org>, Will Deacon <will@...nel.org>,
	Oliver Upton <oliver.upton@...ux.dev>
Subject: Re: [RFC PATCH v1 00/38] ARM CCA Device Assignment support

On Fri, Aug 01, 2025 at 02:19:54PM -0700, dan.j.williams@...el.com wrote:

> On the host this establishes an SPDM session and sets up link encryption
> (IDE) with the physical device. Leave VMs out of the picture, this
> capability in isolation is a useful property. It addresses the similar
> threat model that Intel Total Memory Encryption (TME) or AMD Secure
> Memory Encryption (SME) go after, i.e. interposer on a physical link
> capturing data in flight. 

Okay, maybe connect is not an intuitive name for opening IDE
sessions..

> I started this project with "all existing T=0 drivers 'just work'" as a
> goal and a virtue. I have been begrudgingly pulled away from it from the
> slow drip of complexity it appears to push into the PCI core.

Do you have some examples? I don't really see what complexity there is
if the solution it simply not auto bind any drivers to TDISP capable
devices and userspace is responsible to manually bind a driver once it
has reached T=1.

This seems like the minimum possible simplicitly for the kernel as
simply everything is managed by userspace, and there is really no
special kernel behavior beyond switching the DMA API of an unbound
driver on the T=0/1 change.

> The concern is neither userspace nor the PCI core have everything it
> needs to get the device to T=1. 

Disagree, I think userspace can have everything. It may need some
per-device userspace support in difficult cases, but userspace can
deal with it..

> PCI core knows that the device is T=1 capable, but does not know how
> to preconfigure the device-specific lock state,

Userspace can do this. Can we define exactly what is needed to do this
"pre-configure the device specific lock state"? At the very worst, for
the most poorly designed device, userspace would have to bind a T=0
driver and then unbind it.

Again, I am trying to make something simple for the kernel that gets
us to a working solution before we jump ahead to far more complex in
the kernel models, like aware drivers that can toggle themselves
between T=0/1.

> Userspace might be able to bind a new driver that leaves the device in a
> lockable state on unbind, but that is not "just works" that is,

I wouldn't have the kernel leave the device in the locked state. That
should always be userspace. The special driver may do whatever special
setup is needed, then unbind and leave a normal unlocked device
"prepped" for userspace locking without doing a FLR or
something. Realistically I expect this to be a very rare requirement,
I think this coming up just reflects the HW immaturity of some early
TDISP devices.

Sensible mature devices should have no need of a pre-locking step. I
think we should design toward that goal as the stable future and only
try to enable a hacky work around for the problematic early devices. I
certainly am not keen on seeing significant permanent kernel
complexity to support this device design defect.

> driver that expects the device arrives already running. Also, that main
> driver needs to be careful not to trigger typically benign actions like
> touch the command register to trip the device into ERROR state, or any
> device-specific actions that trip ERROR state but would otherwise be
> benign outside of TDISP."

As I said below, I disagree with this. You can't touch the *physical*
command register but the cVM can certainly touch the *virtualized*
command register. It up to the VMM To ensure this doesn't cause the
device to fall out of RUN as part of virtualization.

I'd also say that the VMM should be responsible to set pBME=1 even if
vBME=0? Shouldn't it? That simplifies even more things for the guest.

> > From that principal the kernel should NOT auto probe drivers to T=0
> > devices that can be made T=1. Userspace should handle attaching HW to
> > such devices, and userspace can sequence whatever is required,
> > including the attestation and verifying.
> 
> Agree, for PCI it would be simple to set a no-auto-probe policy for T=1
> capable devices.

So then it is just a question of what does a userspace component need
to do.

> I do not want to burden the PCI core with TDISP compatibility hacks and
> workarounds if it turns out only a small handful of devices ever deploy
> a first generation TDISP Device Security Manager (DSM). L1 aiding L2, or
> TDISP simplicity improvements to allow the PCI core to handle this in a
> non-broken way, are what I expect if secure device assignment takes off.

Same feeling about pre-configuration :)

> > The starting point must have the core code do this sequence
> > for every driver. Once that is working we can talk about if other
> > flows are needed.
> 
> Do you agree that "device-specific-prep+lock" is the problem to solve?

Not "the" problem, but an design issue we need to accommodate but not
endorse.

> > But I think we can start with the idea that such RAS failures have to
> > reload the driver too and work on improvements. Realistically few
> > drivers have the sort of RAS features to consume this anyhow and maybe
> > we introduce some "enhanced" driver mode to opt-into down the road.
> 
> Hmm, having trouble not reading that back supporting my argument above:
> 
> Realistically few devices support TDISP lets require enhanced drivers to
> opt-into TDISP for the time being.

I would be comfortable if hitless RAS recovery for TDISP devices
requires some kernel opt-in. But also I'm not sure how this should
work from a security perspective. Should userspace also have to
re-attest before allowing back to RUN? Clearly this is complicated.

Also, I would be comfortable to support this only for devices that do
not require pre-configuration.

Jason