linux-kernel - Re: [RFC v2 09/20] PCI/CMA: Expose in sysfs whether devices are authenticated

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKmqyKMHMo7_EeZ1fvSKKE0i1nm1QS5x4PYJBe1Yx8r4nqwpgA@mail.gmail.com>
Date: Fri, 28 Feb 2025 12:55:30 +1000
From: Alistair Francis <alistair23@...il.com>
To: Greg KH <gregkh@...uxfoundation.org>
Cc: Lukas Wunner <lukas@...ner.de>, Alistair Francis <alistair@...stair23.me>, linux-cxl@...r.kernel.org, 
	linux-kernel@...r.kernel.org, linux-pci@...r.kernel.org, bhelgaas@...gle.com, 
	Jonathan.Cameron@...wei.com, rust-for-linux@...r.kernel.org, 
	akpm@...ux-foundation.org, boqun.feng@...il.com, bjorn3_gh@...tonmail.com, 
	wilfred.mallawa@....com, aliceryhl@...gle.com, ojeda@...nel.org, 
	a.hindborg@...nel.org, tmgross@...ch.edu, gary@...yguo.net, 
	alex.gaynor@...il.com, benno.lossin@...ton.me, 
	Alistair Francis <alistair.francis@....com>
Subject: Re: [RFC v2 09/20] PCI/CMA: Expose in sysfs whether devices are authenticated

On Fri, Feb 28, 2025 at 11:41 AM Greg KH <gregkh@...uxfoundation.org> wrote:
>
> On Thu, Feb 27, 2025 at 11:42:46PM +0100, Lukas Wunner wrote:
> > On Thu, Feb 27, 2025 at 03:16:40AM -0800, Greg KH wrote:
> > > On Thu, Feb 27, 2025 at 01:09:41PM +1000, Alistair Francis wrote:
> > > > The PCI core has just been amended to authenticate CMA-capable devices
> > > > on enumeration and store the result in an "authenticated" bit in struct
> > > > pci_dev->spdm_state.
> > > >
> > > > Expose the bit to user space through an eponymous sysfs attribute.
> > > >
> > > > Allow user space to trigger reauthentication (e.g. after it has updated
> > > > the CMA keyring) by writing to the sysfs attribute.
> > > >
> > > > Implement the attribute in the SPDM library so that other bus types
> > > > besides PCI may take advantage of it.  They just need to add
> > > > spdm_attr_group to the attribute groups of their devices and amend the
> > > > dev_to_spdm_state() helper which retrieves the spdm_state for a given
> > > > device.
> > > >
> > > > The helper may return an ERR_PTR if it couldn't be determined whether
> > > > SPDM is supported by the device.  The sysfs attribute is visible in that
> > > > case but returns an error on access.  This prevents downgrade attacks
> > > > where an attacker disturbs memory allocation or DOE communication
> > > > in order to create the appearance that SPDM is unsupported.
> > >
> > > I don't like this "if it's present we still don't know if the device
> > > supports this", as that is not normally the "sysfs way" here.  Why must
> > > it be present in those situations?

I do think there are 4 situations

 1. Device supports authentication and was authenticated
 2.
  a) Device supports authentication, but the certificate chain can't
be authenticated
  b) Device supports authentication, but the communication was interrupted
 3. Device doesn't support authentication

Case 1 and 3 are easy

Case 2 (a) means that the kernel doesn't trust the certificate. This
could be for a range of reasons, including the user just hasn't
installed the cert yet. So it makes sense to pass that information to
the user as they can act on it. I think it's a different case to 3
(where there is no action for the user to take).

> >
> > That's explained above.
>
> Not really, you just say "downgrade attacks", which is not something
> that we need to worry about, right?  If so, I think this bit is the
> least of our worries.
>
> > Unfortunately there is no (signed) bit in Config Space which tells us
> > whether authentication is supported by a PCI device.  Rather, it is
> > necessary to exchange several messages with the device through a
> > DOE mailbox in config space to determine that.  I'm worried that an
> > attacker deliberately "glitches" those DOE exchanges and thus creates
> > the appearance that the device does not support authentication.
>
> That's a hardware glitch, and if that happens, then it will show a 0 and
> that's the same as not being present at all, right?  Otherwise you just
> pound on the file to try to see if the glitch was not real?  That's not
> going to go over well.
>
> > Let's say the user's policy is to trust legacy devices which do not
> > support authentication, but require authentication for newer NVMe drives
> > from a certain vendor.  An attacker may manipulate an authentication-capable
> > NVMe drive from that vendor, whereupon it will fail authentication.
> > But the attacker can trick the user into trusting the device by glitching
> > the DOE exchanges.
>
> Again, are we now claiming that Linux needs to support "hardware
> glitching"?  Is that required somewhere?  I think if the DOE exchanges
> fail, we just trust the device as we have to trust something, right?

This is case 2 (b) above. If an attacker could flip a bit or drop a
byte in the communication path they could cause the authentication to
stop.

The process to authenticate a device involves sending a lot of data,
so it's not impossible that an attacker could interrupt some of the
traffic.

If the DOE exchange fails I don't think we want to trust the device,
as that's the point of SPDM. So being able to communicate that to
userspace does seem really useful.

>
> > Of course, this is an abnormal situation that users won't encounter
> > unless they're being attacked.  Normally the attribute is only present
> > if authentication is supported.
> >
> > I disagree with your assessment that we have bigger problems.
> > For security protocols like this we have to be very careful
> > to prevent trivial circumvention.  We cannot just shrug this off
> > as unimportant.
>
> hardware glitching is not trivial.  Let's only worry about that if the
> hardware people somehow require it, and if so, we can push back and say
> "stop making us fix your broken designs" :)

Hardware glitching is hard to do, but SPDM is designed to protect
against a range of hard attacks. The types of attacks that nation
states would do. So I think we should do our best to protect against
them.Obviously there is only so much that software can do to protect
against a hardware glitch, but this seems like a good in between.

I don't think the design is overall broken though. At some point when
all devices support SPDM it becomes a non issue as you just fail if
authentication fails, but in the mean time we need to handle devices
with and without authentication.

>
> > The "authenticated" attribute tells user space whether the device
> > is authenticated.  User space needs to handle errors anyway when
> > reading the attribute.  Users will get an error if authentication
> > support could not be determined.  Simple.
>
> No, if it's not determined, it shouldn't be present.

That doesn't allow us to differentiate the cases I mentioned above though.

Another option is we could create multiple attributes "spdm",
"authenticated" and "spdm_err" for example to convey the information
with just yes/no attributes?

>
> > > What is going to happen to suddenly
> > > allow it to come back to be "alive" and working while the device is
> > > still present in the system?
> >
> > The device needs to be re-enumerated by the PCI core to retry
> > determining its authentication capability.  That's why the
> > sysfs documentation says the user may exercise the "remove"
> > and "rescan" attributes to retry authentication.
>
> But how does it know that?  remove and recan is a huge sledgehammer, and
> an amazing one if it even works on most hardware.  Don't make it part of
> any normal process please.
>
> > > I'd prefer it to be "if the file is there, this is supported by the
> > > device.  If the file is not there, it is not supported", as that makes
> > > things so much simpler for userspace (i.e. you don't want userspace to
> > > have to both test if the file is there AND read all entries just to see
> > > if the kernel knows what is going on or not.)
> >
> > Huh?  Read all entries?  The attribute contains only 0 or 1!
> >
> > Or you'll get an error reading it.
>
> It's the error, don't do that.  If an error is going to happen, then
> don't have the file there.  That's the way sysfs works, it's not a
> "let's add all possible files and then make userspace open them all and
> see if an error happens to determine what really is present for this
> device" model.  It's a "if a file is there, that attribute is there and
> we can read it".
>
> > > > Alternatively, authentication success might be signaled to user space
> > > > through a uevent, whereupon it may bind a (blacklisted) driver.
> > >
> > > How will that happen?
> >
> > The SPDM library can be amended to signal a uevent when authentication
> > succeeds or fails and user space can then act on it.  I imagine systemd
> > or some other daemon might listen to such events and do interesting things,
> > such as binding a driver once authentication succeeds.
>
> That's a new user/kernel api and should be designed ONLY if you actually
> need it and have a user.  Otherwise let's just wait for later for that.
>
> > Maybe you have better ideas.  Be constructive!  Make suggestions!
>
> Again, have the file there only if this is something that the hardware
> supports.  Don't fail a read just because the hardware does not seem to
> support it, but it might sometime in the future if you just happen to
> unplug/plug it back in.
>
> > > > +         This prevents downgrade attacks where an attacker consumes
> > > > +         memory or disturbs communication in order to create the
> > > > +         appearance that a device does not support authentication.
> > >
> > > If an attacker can consume kernel memory to cause this to happen you
> > > have bigger problems.  That's not the kernel's issue here at all.
> > >
> > > And "disable communication" means "we just don't support it as the
> > > device doesn't say it does", so again, why does that matter?
> >
> > Reacting to potential attacks sure is the kernel's business.
>
> Reacting to real, software attacks is the kernel's business.  Reacting
> to possible hardware issues that are just theoretical is not.

Generally I would agree, but the idea of SPDM is to react to hardware
attacks, so it's something that the kernel should be conscious of

>
> > > > +         The reason why authentication support could not be determined
> > > > +         is apparent from "dmesg".  To re-probe authentication support
> > > > +         of PCI devices, exercise the "remove" and "rescan" attributes.
> > >
> > > Don't make userspace parse kernel logs for this type of thing.  And
> > > asking userspace to rely on remove and recan is a mess, either show it
> > > works or not.
> >
> > I'd say looking in dmesg to determine whether the user is being attacked
> > is perfectly fine, as is requiring the user to explicitly act on a
> > potential attack.
> >
> >
> > > > --- a/drivers/pci/cma.c
> > > > +++ b/drivers/pci/cma.c
> > > > @@ -171,8 +171,10 @@ void pci_cma_init(struct pci_dev *pdev)
> > > >  {
> > > >   struct pci_doe_mb *doe;
> > > >
> > > > - if (IS_ERR(pci_cma_keyring))
> > > > + if (IS_ERR(pci_cma_keyring)) {
> > > > +         pdev->spdm_state = ERR_PTR(-ENOTTY);
> > >
> > > Why ENOTTY?
> >
> > We use -ENOTTY as return value for unsupported reset methods in the
> > PCI core, see e.g. pcie_reset_flr(), pcie_af_flr(), pci_pm_reset(),
> > pci_parent_bus_reset(), pci_reset_hotplug_slot(), ...
> >
> > We also use -ENOTTY in pci_bridge_wait_for_secondary_bus() and
> > pci_dev_wait().
> >
> > It was used here to be consistent with those existing occurrences
> > in the PCI core.
> >
> > If you'd prefer something else, please make a suggestion.
>
> Ah, didn't realize that was a pci thing, ok, nevermind.
>
> > > > +static ssize_t authenticated_store(struct device *dev,
> > > > +                            struct device_attribute *attr,
> > > > +                            const char *buf, size_t count)
> > > > +{
> > > > + void *spdm_state = dev_to_spdm_state(dev);
> > > > + int rc;
> > > > +
> > > > + if (IS_ERR_OR_NULL(spdm_state))
> > > > +         return PTR_ERR(spdm_state);
> > > > +
> > > > + if (sysfs_streq(buf, "re")) {
> > >
> > > I don't like sysfs files that when reading show a binary, but require a
> > > "magic string" to be written to them to have them do something else.
> > > that way lies madness.  What would you do if each sysfs file had a
> > > custom language that you had to look up for each one?  Be consistant
> > > here.  But again, I don't think you need a store function at all, either
> > > the device supports this, or it doesn't.
> >
> > I'm not sure if you've even read the ABI documentation in full.
> >
> > The store method is needed to reauthenticate the device,
> > e.g. after a new trusted root certificate was added to the
> > kernel's .cma keyring.
>
> Why not have a different file called "reauthentication" that only allows
> a write to it of a 1/Y/y to do the reauthentication.  sysfs is a "one
> file per thing" interface, not a "parse a command and do something but
> when read from return a different value" interface.

That works for me as well

Alistair

>
> Let's keep it dirt simple please.
>
> thanks,
>
> greg k-h