linux-kernel - Re: [PATCH v14 18/19] x86: Secure Launch late initcall platform module

Message-ID: <CALCETrVfrP=RL0W1cOY1PXGAsVLgbgSVLCy+ZsDg=-rxMQ=u9w@mail.gmail.com>
Date: Wed, 30 Apr 2025 11:51:54 -0700
From: Andy Lutomirski <luto@...nel.org>
To: "Daniel P. Smith" <dpsmith@...rtussolutions.com>
Cc: Ross Philipson <ross.philipson@...cle.com>, linux-kernel@...r.kernel.org, x86@...nel.org, 
	linux-integrity@...r.kernel.org, linux-doc@...r.kernel.org, 
	linux-crypto@...r.kernel.org, kexec@...ts.infradead.org, 
	linux-efi@...r.kernel.org, iommu@...ts.linux.dev, tglx@...utronix.de, 
	mingo@...hat.com, bp@...en8.de, hpa@...or.com, dave.hansen@...ux.intel.com, 
	ardb@...nel.org, mjg59@...f.ucam.org, James.Bottomley@...senpartnership.com, 
	peterhuewe@....de, jarkko@...nel.org, jgg@...pe.ca, nivedita@...m.mit.edu, 
	herbert@...dor.apana.org.au, davem@...emloft.net, corbet@....net, 
	ebiederm@...ssion.com, dwmw2@...radead.org, baolu.lu@...ux.intel.com, 
	kanth.ghatraju@...cle.com, andrew.cooper3@...rix.com, 
	trenchboot-devel@...glegroups.com
Subject: Re: [PATCH v14 18/19] x86: Secure Launch late initcall platform module

On Tue, Apr 29, 2025 at 6:41 PM Daniel P. Smith
<dpsmith@...rtussolutions.com> wrote:
>
> On 4/28/25 13:38, Andy Lutomirski wrote:
> >> On Apr 21, 2025, at 9:36 AM, Ross Philipson <ross.philipson@...cle.com> wrote:
> >>
> >> From: "Daniel P. Smith" <dpsmith@...rtussolutions.com>
> >>
> >> The Secure Launch platform module is a late init module. During the
> >> init call, the TPM event log is read and measurements taken in the
> >> early boot stub code are located. These measurements are extended
> >> into the TPM PCRs using the mainline TPM kernel driver.
> >
> > I read through some of the TPM and TXT docs, and I haven’t found a
> > clear explanation of exactly what gets hashed into which PCR.  (Mostly
> > because the docs are full of TXT-specific terms.)
>
>
> For Intel TXT, the general approach is detailed in section 1.10.2 of the
> TXT Software Development Guide[1]. I point you at the Detail and
> Authorities Usage section because the ability to do Legacy Usage has
> been unavailable for some time.
>
> In section 1.10.2.1, the text explains what the initial measurement
> into PCR17 is and how it is made. After that is Table 1, which provides a
> listing of all the measurements the ACM could make before starting the
> MLE. Just as an FYI, on Gen 9 and later, the STM measurement will be
> present and will be the hash of the PPAM module.[2]
>
> Section 1.10.2.2 gives a similar treatment to PCR18.
>
> [1]
> https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf
> [2]
> https://www.intel.com/content/dam/www/central-libraries/us/en/documents/drtm-based-computing-whitepaper.pdf
>
>
> > But I’m really struggling to understand how the security model ends up
> > being consistent with this late_initcall thing. We measure some state
> > into the event log, and then we do a whole bunch of things (everything
> > from the very beginning of loading the kernel proper to whenever
> > in the late_initcall stage this code runs), and then we actually
> > extend the PCRs.  It seems to me that this may involve a whole lot of
> > crossing fingers that an attacker can’t find a way to get the kernel
> > to execute code that changes the event log in memory prior to
> > extending PCRs such that attacker-controlled values get written.  Even
> > if the design is, in principle, sound, the attack surface seems much,
> > much larger than it deserves to be.

I hate to be obnoxious, but your email kind of exemplifies why I, and
I think many developers, REALLY dislike the TXT and related specs.
It's full of magic words that mean nothing to anyone not immersed in
this particular ecosystem.

>
>
> There is a more fundamental flaw to your scenario, but before covering
> that, consider what measurements could be tampered with that are made by
> the setup kernel:

What is the "setup kernel"?  Do you mean the early code in the kernel?

>
>   - Kernel Setup Data
>   - TrenchBoot's SLRT
>   - Boot Params
>   - Command line
>   - EFI Memory Map
>   - EFI configuration items, populated by efi-stub (currently unused)
>   - External Ramdisk

Are you saying that all of these items are measured by the early
loader (and *not* measured by the ACM or otherwise by anything that is
trustworthy and runs before the early code)?

>
> Outside of the case of an external ramdisk, the attacker can only
> pretend valid configuration data was passed to it.
>
> Correct me if I am wrong, but I don't think that is what is bothering
> you. You are concerned with one of two cases here. Either you are
> concerned that the attacker may be able to hide the loading of a corrupt
> kernel or that the attacker can corrupt the kernel after loading.

Here is my concern:

Suppose there is a set of measurements that an attacker wants to
replicate.  Some of these measurements are done prior to transferring
control to the early code that's in this patchset (call these
before-Linux measurements) and some are done by the loaded kernel
(let's call these Linux measurements).

I am concerned that the attacker will load a combination of things
that have the correct before-Linux measurements but the wrong
Linux measurements.  (Wrong in the sense that, *if those
measurements actually landed in the PCRs, then the attacker would
lose*.)  *But* the attacker carefully chooses what they're loading to
gain control of the system prior to the actual PCR extension.  Then
the attacker extends the PCR with the hash that they want to
replicate, and the attacker wins.

For the security model to make any sense at all, it needs to be
impossible for the attacker to gain control prior to the early kernel
code running without changing the before-Linux measurements.  But
there is a huge gap between when the early Linux code runs and when
the late initcalls run, and the attacker has that entire window to
break your security.
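
To make the window concrete, here is a toy model of the concern, not the
patchset's code. It assumes a single SHA-256 PCR bank that starts at all
zeros and an event log kept as a plain in-memory list. Every name and both
command-line strings are hypothetical, chosen only for illustration.

```python
import hashlib

# Assumption for this toy model: the PCR starts at all zeros.
PCR_RESET = bytes(32)

def digest(data: bytes) -> bytes:
    """SHA-256 digest of a measured blob."""
    return hashlib.sha256(data).digest()

def extend(pcr: bytes, event_digest: bytes) -> bytes:
    """TPM-style extend: new PCR = H(old PCR || event digest)."""
    return hashlib.sha256(pcr + event_digest).digest()

def replay(log) -> bytes:
    """Extend every logged digest, in order, into a fresh PCR."""
    pcr = PCR_RESET
    for d in log:
        pcr = extend(pcr, d)
    return pcr

# Hypothetical inputs: the value a verifier expects, and what was really loaded.
good_cmdline = b"root=/dev/sda1 ro"
evil_cmdline = b"root=/dev/sda1 init=/evil"
expected = replay([digest(good_cmdline)])

# Early extension: the real measurement reaches the PCR as soon as it is taken.
immediate = replay([digest(evil_cmdline)])

# Deferred extension: the measurement only sits in memory until a late stage.
log = [digest(evil_cmdline)]
log[0] = digest(good_cmdline)   # attacker rewrites the log inside the window
deferred = replay(log)

immediate_matches = immediate == expected
deferred_matches = deferred == expected
```

In this model, extending early leaves a PCR value that does not match the
expected one, while rewriting the in-memory log before the late extension
reproduces it exactly.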

> In the first case, the answer is no; the attacker cannot. The kernel,
> and the initrd if it was packed into the kernel, are measured and sent
> to the TPM by the ACM running in cache-as-ram before execution begins.
>
> The second case is the flawed scenario, a strawman, if you will. This is
> a runtime-integrity problem that is outside the scope/protections of
> load-time-integrity solutions such as SRTM and DRTM. If the correct
> kernel was loaded and measured, but an attacker already has a position
> in the system from which they can corrupt the kernel before the
> user-space init process can be run, then they have already won.

I'm arguing that it seems like this patchset has a
runtime-integrity problem.  It's outside the scope of the TXT spec per
se.  It's in the scope of *the Linux kernel*, and anyone who wants to
trust that the Linux DRTM code actually works needs to factor in this
giant weakness.

And you haven't explained why there is no way for an attacker to
corrupt the process between the early kernel code and the late
measurement code.


> > Is there some reason for all this complexity instead of extending the
> > PCRs at the early stage when the measurements are taken?
>
>
> We did have TPM logic in the setup kernel at one point. Within their
> rights, the TPM maintainers took the position that the only TPM
> interface logic should be the existing driver.

Hey TPM maintainers, I think this is nonsense, or maybe someone has
misinterpreted something that someone else said.  I understand that
avoiding code duplication is nice.  I understand that, at runtime, all
TPM access ought to go through the driver.  But, if the driver is
incapable of working during the very very early kernel load, then
there should be an alternate interface.  Kind of like how we have
early_printk instead of saying "well, there should only be one printk,
so instead of having early_printk, we'll just have a big buffer of
messages and log them eventually".  Or kind of like how we might call
EFI boot service functions during early boot.

--Andy