linux-kernel - Re: [RFC PATCH v1 00/38] ARM CCA Device Assignment support

Message-ID: <688ea45a14015_17ee100cf@dwillia2-mobl4.notmuch>
Date: Sat, 2 Aug 2025 16:50:50 -0700
From: <dan.j.williams@...el.com>
To: Jason Gunthorpe <jgg@...pe.ca>, <dan.j.williams@...el.com>
CC: "Aneesh Kumar K.V (Arm)" <aneesh.kumar@...nel.org>,
	<linux-coco@...ts.linux.dev>, <kvmarm@...ts.linux.dev>,
	<linux-pci@...r.kernel.org>, <linux-kernel@...r.kernel.org>, <aik@....com>,
	<lukas@...ner.de>, Samuel Ortiz <sameo@...osinc.com>, Xu Yilun
	<yilun.xu@...ux.intel.com>, Suzuki K Poulose <Suzuki.Poulose@....com>,
	"Steven Price" <steven.price@....com>, Catalin Marinas
	<catalin.marinas@....com>, "Marc Zyngier" <maz@...nel.org>, Will Deacon
	<will@...nel.org>, Oliver Upton <oliver.upton@...ux.dev>
Subject: Re: [RFC PATCH v1 00/38] ARM CCA Device Assignment support

Jason Gunthorpe wrote:
> On Fri, Aug 01, 2025 at 02:19:54PM -0700, dan.j.williams@...el.com wrote:
> 
> > On the host this establishes an SPDM session and sets up link encryption
> > (IDE) with the physical device. Leave VMs out of the picture, this
> > capability in isolation is a useful property. It addresses the similar
> > threat model that Intel Total Memory Encryption (TME) or AMD Secure
> > Memory Encryption (SME) go after, i.e. interposer on a physical link
> > capturing data in flight. 
> 
> Okay, maybe connect is not an intuitive name for opening IDE
> sessions..

Part of the rationale for a generic name is the TSM is free to assert
that the link is secure without IDE. Think integrated devices where
there is no expectation the link can be observed.

The host and guest side TSM operations are split into link/transport
security and device/state security (private MMIO/DMA) concerns
respectively. So maybe "secure_link" would be a better name for this
host-side-only operation.

> > I started this project with "all existing T=0 drivers 'just work'" as a
> > goal and a virtue. I have been begrudgingly pulled away from it by the
> > slow drip of complexity it appears to push into the PCI core.
> 
> Do you have some examples? I don't really see what complexity there is
> if the solution is simply to not auto bind any drivers to TDISP capable
> devices and userspace is responsible to manually bind a driver once it
> has reached T=1.

The example I have front of mind (confirmed by 2 vendors) is deferring
the loading of guest-side device/state security capable firmware to the
guest driver when the full device is assigned. In that scenario default
device power-on firmware is capable of link/transport security, enough
to get the device assigned. Guest needs to get the device/state security
firmware loaded before TDISP state transitions are possible.

I do think RAS recovery needs it too, but like you say below that should
come with conditions.

> This seems like the minimum possible simplicity for the kernel as
> simply everything is managed by userspace, and there is really no
> special kernel behavior beyond switching the DMA API of an unbound
> driver on the T=0/1 change.
> 
> > The concern is neither userspace nor the PCI core has everything it
> > needs to get the device to T=1.
> 
> Disagree, I think userspace can have everything. It may need some
> per-device userspace support in difficult cases, but userspace can
> deal with it..

I do think userspace can / must deal with it. Let me come back with
actual patches and a sample test case. I see a potential path to support
the above "prep" scenario without the mess of TDISP setup drivers, or
the ugly complexity of driver toggles or a usermodehelper.

> > PCI core knows that the device is T=1 capable, but does not know how
> > to preconfigure the device-specific lock state,
> 
> Userspace can do this. Can we define exactly what is needed to do this
> "pre-configure the device specific lock state"? At the very worst, for
> the most poorly designed device, userspace would have to bind a T=0
> driver and then unbind it.
> 
> Again, I am trying to make something simple for the kernel that gets
> us to a working solution before we jump ahead to far more complex in
> the kernel models, like aware drivers that can toggle themselves
> between T=0/1.

Agree. When I talked about wishing for the simple TDISP case, I meant
that userspace can always "just lock" and "driver bind" without needing
to worry about "prep", i.e. any "prep" is always implied by "lock". That
should be the baseline.
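
To make that baseline concrete, here is a toy model of the flow, not
kernel code. The four state names come from the TDISP specification;
everything else (ToyTdi, bring_up(), the needs_prep flag, the verifier
callback) is hypothetical and only illustrates the ordering "lock,
attest, start, then bind", with a device that needs separate "prep"
failing at the lock step.

```python
from enum import Enum, auto


class TdiState(Enum):
    # The four TDI states named by the TDISP specification.
    CONFIG_UNLOCKED = auto()
    CONFIG_LOCKED = auto()
    RUN = auto()
    ERROR = auto()


class ToyTdi:
    """Hypothetical model of one TDISP capable device interface."""

    def __init__(self, needs_prep=False):
        self.state = TdiState.CONFIG_UNLOCKED
        self.needs_prep = needs_prep  # assumption: device-specific prep flag
        self.prepped = False
        self.attested = False
        self.driver = None

    def lock(self):
        # Baseline case: any "prep" is implied by "lock". A device that
        # needs a separate prep step cannot be locked before it is done.
        if self.state is not TdiState.CONFIG_UNLOCKED:
            raise RuntimeError("lock is only valid from CONFIG_UNLOCKED")
        if self.needs_prep and not self.prepped:
            raise RuntimeError("device requires device-specific prep")
        self.state = TdiState.CONFIG_LOCKED

    def attest(self, verifier):
        # Userspace policy decides whether the locked device is trusted.
        if self.state is not TdiState.CONFIG_LOCKED:
            raise RuntimeError("attest is only valid after lock")
        self.attested = bool(verifier(self))

    def start(self):
        if self.state is not TdiState.CONFIG_LOCKED or not self.attested:
            raise RuntimeError("need a locked and attested device")
        self.state = TdiState.RUN

    def bind(self, driver):
        # No auto-probe: a driver is bound only once the device is T=1.
        if self.state is not TdiState.RUN:
            raise RuntimeError("bind is only allowed in RUN (T=1)")
        self.driver = driver


def bring_up(tdi, verifier, driver):
    """Straight-line userspace sequence: lock, attest, start, bind."""
    tdi.lock()
    tdi.attest(verifier)
    tdi.start()
    tdi.bind(driver)
    return tdi
```

With this model, bring_up(ToyTdi(), lambda d: True, "example_drv")
ends in RUN with the driver bound, while a ToyTdi(needs_prep=True)
raises at lock() until something outside the straight-line path has
prepped it.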

> > Userspace might be able to bind a new driver that leaves the device in a
> > lockable state on unbind, but that is not "just works" that is,
> 
> I wouldn't have the kernel leave the device in the locked state. That
> should always be userspace. The special driver may do whatever special
> setup is needed, then unbind and leave a normal unlocked device
> "prepped" for userspace locking without doing a FLR or
> something. Realistically I expect this to be a very rare requirement,
> I think this coming up just reflects the HW immaturity of some early
> TDISP devices.
> 
> Sensible mature devices should have no need of a pre-locking step. I
> think we should design toward that goal as the stable future and only
> try to enable a hacky work around for the problematic early devices. I
> certainly am not keen on seeing significant permanent kernel
> complexity to support this device design defect.

Yeah, that is the nightmare I had last night. I completed the thought
exercise about driver toggle and said, "whoops, nope, Jason is right, we
can't design for that without leaving a permanent mess to clean up".
The end goal needs to look like a straight-line, typical driver probe
path for TDISP capable devices.

> > driver that expects the device arrives already running. Also, that main
> > driver needs to be careful not to trigger typically benign actions like
> > touch the command register to trip the device into ERROR state, or any
> > device-specific actions that trip ERROR state but would otherwise be
> > benign outside of TDISP."
> 
> As I said below, I disagree with this. You can't touch the *physical*
> command register but the cVM can certainly touch the *virtualized*
> command register. It is up to the VMM to ensure this doesn't cause the
> device to fall out of RUN as part of virtualization.
> 
> I'd also say that the VMM should be responsible to set pBME=1 even if
> vBME=0? Shouldn't it? That simplifies even more things for the guest.

True. Although, now I am going back on my PCI core burden concern to
wonder if *it* should handle a vBME on behalf of the driver if only
because it may want to force the device out of the RUN state on driver
unbind to meet typical pci_disable_device() expectations.

Alexey had this, I thought it was burdensome, now coming around.
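
A toy sketch of the vBME/pBME split discussed above, assuming a VMM
that keeps a shadow copy of the command register. The class and its
fields are hypothetical; only PCI_COMMAND_MASTER (bit 2 of the PCI
command register, Bus Master Enable) is a real definition. It shows
guest writes landing in the shadow while the physical bit stays set, so
the device is not knocked out of RUN.

```python
PCI_COMMAND_MASTER = 0x4  # Bus Master Enable bit in the PCI command register


class ToyVmmCommandReg:
    """Hypothetical VMM-side shadow of a TDI's command register."""

    def __init__(self):
        self.physical = PCI_COMMAND_MASTER  # pBME=1, owned by the VMM
        self.virtual = 0                    # vBME=0 until the guest sets it
        self.tdi_in_run = True              # device already at T=1 (RUN)

    def guest_write(self, value):
        # Guest writes only update the virtualized register. The physical
        # register is never touched, so the TDI stays in RUN.
        self.virtual = value & 0xFFFF

    def guest_read(self):
        # The guest sees its own view, including vBME.
        return self.virtual


reg = ToyVmmCommandReg()
reg.guest_write(PCI_COMMAND_MASTER)  # guest driver enables bus mastering
reg.guest_write(0)                   # e.g. guest clears it on driver unbind
```

Whether the guest PCI core should additionally force the device out of
RUN when vBME is cleared on unbind is the open question above and is
deliberately not modeled here.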

> > > From that principle the kernel should NOT auto probe drivers to T=0
> > > devices that can be made T=1. Userspace should handle attaching HW to
> > > such devices, and userspace can sequence whatever is required,
> > > including the attestation and verifying.
> > 
> > Agree, for PCI it would be simple to set a no-auto-probe policy for T=1
> > capable devices.
> 
> So then it is just a question of what does a userspace component need
> to do.
> 
> > I do not want to burden the PCI core with TDISP compatibility hacks and
> > workarounds if it turns out only a small handful of devices ever deploy
> > a first generation TDISP Device Security Manager (DSM). L1 aiding L2, or
> > TDISP simplicity improvements to allow the PCI core to handle this in a
> > non-broken way, are what I expect if secure device assignment takes off.
> 
> Same feeling about pre-configuration :)
> 
> > > The starting point must have the core code do this sequence
> > > for every driver. Once that is working we can talk about if other
> > > flows are needed.
> > 
> > Do you agree that "device-specific-prep+lock" is the problem to solve?
> 
> Not "the" problem, but a design issue we need to accommodate but not
> endorse.

I hear you, let me walk back from the cliff with patches.

> 
> > > But I think we can start with the idea that such RAS failures have to
> > > reload the driver too and work on improvements. Realistically few
> > > drivers have the sort of RAS features to consume this anyhow and maybe
> > > we introduce some "enhanced" driver mode to opt-into down the road.
> > 
> > Hmm, having trouble not reading that back as supporting my argument above:
> > 
> > Realistically few devices support TDISP, let's require enhanced drivers to
> > opt-into TDISP for the time being.
> 
> I would be comfortable if hitless RAS recovery for TDISP devices
> requires some kernel opt-in. But also I'm not sure how this should
> work from a security perspective. Should userspace also have to
> re-attest before allowing back to RUN? Clearly this is complicated.
> 
> Also, I would be comfortable to support this only for devices that do
> not require pre-configuration.

That seems reasonable. You want hitless RAS? Give us hitless init.
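
As a minimal sketch of that rule (the function name and return strings
are hypothetical, and the re-attestation question raised above is left
open):

```python
def ras_recovery_action(driver_opted_in: bool, needs_preconfig: bool) -> str:
    """Toy policy: hitless recovery needs opt-in and hitless init."""
    if driver_opted_in and not needs_preconfig:
        return "hitless"
    # Default discussed in the thread: RAS failures reload the driver.
    return "reload-driver"
```

Only the combination of an opted-in driver and a device with no
pre-configuration step returns "hitless"; every other combination falls
back to reloading the driver.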