Message-ID: <10ae7b8592af7bacef87e493e6d628a027641b8d.camel@HansenPartnership.com>
Date: Wed, 11 Sep 2024 08:21:52 -0400
From: James Bottomley <James.Bottomley@...senPartnership.com>
To: Roberto Sassu <roberto.sassu@...weicloud.com>, Jarkko Sakkinen
<jarkko@...nel.org>, Linux regressions mailing list
<regressions@...ts.linux.dev>
Cc: keyrings@...r.kernel.org, "linux-integrity@...r.kernel.org"
<linux-integrity@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>,
Pengyu Ma <mapengyu@...il.com>
Subject: Re: [regression] significant delays when secureboot is enabled since 6.10

On Wed, 2024-09-11 at 10:53 +0200, Roberto Sassu wrote:
> On Tue, 2024-09-10 at 16:28 +0300, Jarkko Sakkinen wrote:
> > On Tue Sep 10, 2024 at 3:57 PM EEST, James Bottomley wrote:
> > > On Tue, 2024-09-10 at 15:48 +0300, Jarkko Sakkinen wrote:
> > > > On Tue Sep 10, 2024 at 3:39 PM EEST, Jarkko Sakkinen wrote:
> > > > > On Tue Sep 10, 2024 at 12:05 PM EEST, Roberto Sassu wrote:
> > > > > > On Tue, 2024-09-10 at 11:01 +0200, Linux regression tracking
> > > > > > (Thorsten Leemhuis) wrote:
> > > > > > > Hi, Thorsten here, the Linux kernel's regression tracker.
> > > > > > >
> > > > > > > > James, Jarkko, I noticed a report about a regression in
> > > > > > > > bugzilla.kernel.org that appears to be caused by this
> > > > > > > > change of yours:
> > > > > > >
> > > > > > > > 6519fea6fd372b ("tpm: add hmac checks to tpm2_pcr_extend()")
> > > > > > > > [v6.10-rc1]
> > > > > > >
> > > > > > > As many (most?) kernel developers don't keep an eye on
> > > > > > > the bug
> > > > > > > tracker,
> > > > > > > I decided to forward it by mail. To quote from
> > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=219229 :
> > > > > > >
> > > > > > > > When secureboot is enabled,
> > > > > > > > the kernel boot time is ~20 seconds after 6.10 kernel.
> > > > > > > > it's ~7 seconds on 6.8 kernel version.
> > > > > > > >
> > > > > > > > When secureboot is disabled,
> > > > > > > > the boot time is ~7 seconds too.
> > > > > > > >
> > > > > > > > Reproduced on both AMD and Intel platform on ThinkPad
> > > > > > > > X1 and
> > > > > > > > T14.
> > > > > > > >
> > > > > > > > It probably caused autologin failure and micmute led
> > > > > > > > not
> > > > > > > > loaded on AMD platform.
> > > > > > >
> > > > > > > It was later bisected to the change mentioned above. See
> > > > > > > the
> > > > > > > ticket for
> > > > > > > more details.
> > > > > >
> > > > > > Hi
> > > > > >
> > > > > > I suspect I encountered the same problem:
> > > > > >
> > > > > > https://lore.kernel.org/linux-integrity/b8a7b3566e6014ba102ab98e10ede0d574d8930e.camel@huaweicloud.com/
> > > > > >
> > > > > > Going to provide more info there.
> > > > >
> > > > > > I suppose you are going to try to acquire the tracing data I
> > > > > > asked for? That would be awesome, thanks for taking the
> > > > > > trouble. Let's look at the data and draw conclusions based
> > > > > > on that.
> > > > >
> > > > > > The workaround is pretty simple: setting
> > > > > > CONFIG_TCG_TPM2_HMAC=n in the kernel configuration disables
> > > > > > the feature.
> > > > >
> > > > > > As for deciding what to do with this, we are talking about
> > > > > > an estimated ~2 week window, given that the Vienna
> > > > > > conference slows things down, so I hope my workaround is
> > > > > > good enough until then.
> > > >
> > > > > I can enumerate the three most likely ways to address the issue:
> > > > >
> > > > > 1. Strongest: drop from defconfig.
> > > > > 2. Medium: leave in defconfig but add an opt-in kernel
> > > > >    command-line parameter.
> > > > > 3. Lightest: if we can nail the regression based on tracing
> > > > >    data on a sustainable schedule, fix it.
> > >
> > > Actually, there's a fourth: not use sessions for the PCR extend
> > > (if
> > > we'd got the timings when I asked, this was going to be my
> > > suggestion
> > > if they came back problematic). This seems only to be a problem
> > > for
> > > IMA measured boot (because it does lots of extends). If
> > > necessary this
> > > could even be wrapped in a separate config or boot option that
> > > only
> > > disables HMAC on extend if IMA (so we still get security for
> > > things
> > > like sd-boot)
> >
> > > I can buy that, but with a twist: make it an opt-in kernel
> > > command-line option. We don't want to take already existing
> > > functionality away from those who might want to use it (given e.g.
> > > hardening requirements), and on that basis opt-in (disabled by
> > > default) would be a more balanced way to address the issue.
> > >
> > > Please do send a patch!
>
> > I made a few measurements. I have a Fedora 38 VM with TPM passthrough.
>
> Kernels: 6.11-rc2+ (guest), 6.5.0-45-generic (host)
>
> QEMU:
>
> > rc  qemu-kvm         1:4.2-3ubuntu6.27
> > ii  qemu-system-x86  1:6.2+dfsg-2ubuntu6.22
>
>
> TPM2_PT_MANUFACTURER:
> raw: 0x49465800
> value: "IFX"
> TPM2_PT_VENDOR_STRING_1:
> raw: 0x534C4239
> value: "SLB9"
> TPM2_PT_VENDOR_STRING_2:
> raw: 0x36373000
> value: "670"
>
>
> No HMAC:
>
> # tracer: function_graph
> #
> # CPU DURATION FUNCTION CALLS
> # | | | | | | |
> 0) | tpm2_pcr_extend() {
> 0) 1.112 us | tpm_buf_append_hmac_session();
> 0) # 6360.029 us | tpm_transmit_cmd();
> 0) # 6415.012 us | }
>
>
> HMAC:
>
> # tracer: function_graph
> #
> # CPU DURATION FUNCTION CALLS
> # | | | | | | |
> 1) | tpm2_pcr_extend() {
> 1) | tpm2_start_auth_session() {
> 1) * 36976.99 us | tpm_transmit_cmd();
> 1) * 84746.51 us | tpm_transmit_cmd();
> 1) # 3195.083 us | tpm_transmit_cmd();
> 1) @ 126795.1 us | }
> 1) 2.254 us | tpm_buf_append_hmac_session();
> 1) 3.546 us | tpm_buf_fill_hmac_session();
> 1) * 24356.46 us | tpm_transmit_cmd();
> 1) 3.496 us | tpm_buf_check_hmac_response();
> 1) @ 151171.0 us | }
Well, unfortunately, that tells us that it's the TPM itself that's
taking the time processing the security overhead. The ordering of the
commands in tpm2_start_auth_session() shows

   37ms for context restore of null key
   85ms for start session with encrypted salt
    3ms to flush null key
  -----
  125ms

If we context save the session, we'd likely only bear a single 37ms
cost to restore it (replacing the total 125ms). However, there's
nothing we can do about the extend execution going from 6ms to 24ms, so
I could more than halve the cost of each extend with security enabled
(it's currently 149ms, it would go to 61ms, but that's still 10x slower
than the unsecured extend at 6ms).

James
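
The arithmetic in the message above can be re-derived from the quoted
function_graph trace. The following is only a rough sketch in Python,
not kernel code: the durations are copied from the trace, the names
(estimate, to_ms, and the dictionary keys) are made up for illustration,
and it assumes, as the message does, that restoring a context-saved
session would cost about the same 37ms as restoring the null key.

```python
# Durations in microseconds, copied from the function_graph trace quoted
# in the message. All identifiers here are hypothetical illustration names.
START_AUTH_SESSION_US = {
    "context restore of null key": 36976.99,
    "start session with encrypted salt": 84746.51,
    "flush null key": 3195.083,
}
EXTEND_WITH_HMAC_US = 24356.46  # tpm_transmit_cmd() for the extend, HMAC case
EXTEND_NO_HMAC_US = 6360.029    # tpm_transmit_cmd() for the extend, no HMAC


def to_ms(us: float) -> int:
    """Round microseconds to whole milliseconds, as the message does."""
    return round(us / 1000)


def estimate() -> dict:
    """Recompute the per-extend costs discussed in the message."""
    session_ms = sum(to_ms(us) for us in START_AUTH_SESSION_US.values())
    extend_ms = to_ms(EXTEND_WITH_HMAC_US)
    baseline_ms = to_ms(EXTEND_NO_HMAC_US)
    # Assumption taken from the message: a context-saved session costs
    # roughly one context restore (the 37ms figure) instead of the full
    # start-session sequence.
    restore_ms = to_ms(START_AUTH_SESSION_US["context restore of null key"])
    return {
        "session_ms": session_ms,
        "current_ms": session_ms + extend_ms,
        "proposed_ms": restore_ms + extend_ms,
        "baseline_ms": baseline_ms,
        "slowdown_vs_baseline": (restore_ms + extend_ms) / baseline_ms,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value}")
```

The rounded figures match the ones in the message: 125ms of session
setup, 149ms per extend today, 61ms with a saved session, against a 6ms
baseline without HMAC.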