Message-Id: <D44C19QB8IK1.OMUJP7N91HRN@kernel.org>
Date: Thu, 12 Sep 2024 16:16:16 +0300
From: "Jarkko Sakkinen" <jarkko@...nel.org>
To: "James Bottomley" <James.Bottomley@...senPartnership.com>, "Roberto
Sassu" <roberto.sassu@...weicloud.com>, "Linux regressions mailing list"
<regressions@...ts.linux.dev>
Cc: <keyrings@...r.kernel.org>, "linux-integrity@...r.kernel.org"
<linux-integrity@...r.kernel.org>, "LKML" <linux-kernel@...r.kernel.org>,
"Pengyu Ma" <mapengyu@...il.com>
Subject: Re: [regression] significant delays when secureboot is enabled
since 6.10
On Wed Sep 11, 2024 at 3:21 PM EEST, James Bottomley wrote:
> On Wed, 2024-09-11 at 10:53 +0200, Roberto Sassu wrote:
> > On Tue, 2024-09-10 at 16:28 +0300, Jarkko Sakkinen wrote:
> > > On Tue Sep 10, 2024 at 3:57 PM EEST, James Bottomley wrote:
> > > > On Tue, 2024-09-10 at 15:48 +0300, Jarkko Sakkinen wrote:
> > > > > On Tue Sep 10, 2024 at 3:39 PM EEST, Jarkko Sakkinen wrote:
> > > > > > On Tue Sep 10, 2024 at 12:05 PM EEST, Roberto Sassu wrote:
> > > > > > > On Tue, 2024-09-10 at 11:01 +0200, Linux regression tracking
> > > > > > > (Thorsten Leemhuis) wrote:
> > > > > > > > Hi, Thorsten here, the Linux kernel's regression tracker.
> > > > > > > >
> > > > > > > > James, Jarkko, I noticed a report about a regression in
> > > > > > > > bugzilla.kernel.org that appears to be caused by this
> > > > > > > > change of yours:
> > > > > > > >
> > > > > > > > 6519fea6fd372b ("tpm: add hmac checks to
> > > > > > > > tpm2_pcr_extend()")
> > > > > > > > [v6.10-rc1]
> > > > > > > >
> > > > > > > > As many (most?) kernel developers don't keep an eye on
> > > > > > > > the bug
> > > > > > > > tracker,
> > > > > > > > I decided to forward it by mail. To quote from
> > > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=219229 :
> > > > > > > >
> > > > > > > > > When secureboot is enabled,
> > > > > > > > > the kernel boot time is ~20 seconds after 6.10 kernel.
> > > > > > > > > it's ~7 seconds on 6.8 kernel version.
> > > > > > > > >
> > > > > > > > > When secureboot is disabled,
> > > > > > > > > the boot time is ~7 seconds too.
> > > > > > > > >
> > > > > > > > > Reproduced on both AMD and Intel platform on ThinkPad
> > > > > > > > > X1 and
> > > > > > > > > T14.
> > > > > > > > >
> > > > > > > > > It probably caused autologin failure and micmute led
> > > > > > > > > not
> > > > > > > > > loaded on AMD platform.
> > > > > > > >
> > > > > > > > It was later bisected to the change mentioned above. See
> > > > > > > > the
> > > > > > > > ticket for
> > > > > > > > more details.
> > > > > > >
> > > > > > > Hi
> > > > > > >
> > > > > > > I suspect I encountered the same problem:
> > > > > > >
> > > > > > > https://lore.kernel.org/linux-integrity/b8a7b3566e6014ba102ab98e10ede0d574d8930e.camel@huaweicloud.com/
> > > > > > >
> > > > > > > Going to provide more info there.
> > > > > >
> > > > > > I suppose you are going to try to acquire the tracing data I
> > > > > > asked for? That would be awesome, thanks for taking the
> > > > > > trouble. Let's look at the data and draw conclusions based
> > > > > > on that.
> > > > > >
> > > > > > The workaround is pretty simple: setting
> > > > > > CONFIG_TCG_TPM2_HMAC=n in the kernel configuration disables
> > > > > > the feature.
> > > > > >
> > > > > > For deciding what to do about this, we are talking about an
> > > > > > estimated ~2 week window, given that the Vienna conference
> > > > > > slows things down, so I hope my workaround is good enough
> > > > > > until then.
> > > > >
> > > > > I can enumerate the three most likely ways to address the issue:
> > > > >
> > > > > 1. Strongest: drop from defconfig.
> > > > > 2. Medium: leave in defconfig but add an opt-in kernel
> > > > > command-line parameter.
> > > > > 3. Lightest: if we can nail down the regression from the tracing
> > > > > data on a sustainable schedule, fix it.
> > > >
> > > > Actually, there's a fourth: not use sessions for the PCR extend
> > > > (if we'd got the timings when I asked, this was going to be my
> > > > suggestion if they came back problematic). This seems only to be
> > > > a problem for IMA measured boot (because it does lots of
> > > > extends). If necessary this could even be wrapped in a separate
> > > > config or boot option that only disables HMAC on extend if IMA
> > > > (so we still get security for things like sd-boot)
> > >
> > > I can buy that, but with a twist: make it an opt-in kernel
> > > command-line option. We don't want to take already existing
> > > functionality away from those who might want to use it (given
> > > e.g. hardening requirements), and on that basis opt-in (disabled
> > > by default) would be a more balanced way to address the issue.
> > >
> > > Please do send a patch!
> >
> > I made a few measurements. I have a Fedora 38 VM with TPM passthrough.
> >
> > Kernels: 6.11-rc2+ (guest), 6.5.0-45-generic (host)
> >
> > QEMU:
> >
> > rc  qemu-kvm         1:4.2-3ubuntu6.27
> > ii  qemu-system-x86  1:6.2+dfsg-2ubuntu6.22
> >
> >
> > TPM2_PT_MANUFACTURER:
> > raw: 0x49465800
> > value: "IFX"
> > TPM2_PT_VENDOR_STRING_1:
> > raw: 0x534C4239
> > value: "SLB9"
> > TPM2_PT_VENDOR_STRING_2:
> > raw: 0x36373000
> > value: "670"
> >
> >
> > No HMAC:
> >
> > # tracer: function_graph
> > #
> > # CPU  DURATION                  FUNCTION CALLS
> > # |     |   |                     |   |   |   |
> >  0)               |  tpm2_pcr_extend() {
> >  0)   1.112 us    |    tpm_buf_append_hmac_session();
> >  0) # 6360.029 us |    tpm_transmit_cmd();
> >  0) # 6415.012 us |  }
> >
> >
> > HMAC:
> >
> > # tracer: function_graph
> > #
> > # CPU  DURATION                  FUNCTION CALLS
> > # |     |   |                     |   |   |   |
> >  1)               |  tpm2_pcr_extend() {
> >  1)               |    tpm2_start_auth_session() {
> >  1) * 36976.99 us |      tpm_transmit_cmd();
> >  1) * 84746.51 us |      tpm_transmit_cmd();
> >  1) # 3195.083 us |      tpm_transmit_cmd();
> >  1) @ 126795.1 us |    }
> >  1)   2.254 us    |    tpm_buf_append_hmac_session();
> >  1)   3.546 us    |    tpm_buf_fill_hmac_session();
> >  1) * 24356.46 us |    tpm_transmit_cmd();
> >  1)   3.496 us    |    tpm_buf_check_hmac_response();
> >  1) @ 151171.0 us |  }
>
> Well, unfortunately, that tells us that it's the TPM itself that's
> taking the time processing the security overhead. The ordering of the
> commands in tpm2_start_auth_session() shows
>
> 37ms for context restore of null key
> 85ms for start session with encrypted salt
> 3ms to flush null key
> -----
> 125ms
>
> If we context save the session, we'd likely only bear a single 37ms
> cost to restore it (replacing the total 125ms). However, there's
> nothing we can do about the extend execution going from 6ms to 24ms, so
> I could halve your current boot time with security enabled (it's
> currently 149ms, it would go to 61ms, but it's still 10x slower than
> the unsecured extend at 6ms)
>
> James
I'll hold for better benchmarks.
BR, Jarkko
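
For reference, the per-extend arithmetic in James's reply can be
reproduced from the quoted function_graph durations. This is only a
minimal sketch: the variable names are illustrative, and the projected
figure assumes (as suggested in the quoted text, not measured) that
restoring a saved session would cost about the same as the 37 ms null
key context restore seen in the trace.

```python
# Durations in microseconds, copied from the quoted function_graph traces.
NO_HMAC_EXTEND_US = 6415.012  # whole tpm2_pcr_extend() without HMAC

# The three tpm_transmit_cmd() calls inside tpm2_start_auth_session(),
# labelled the way the quoted analysis labels them.
SESSION_SETUP_US = {
    "context restore of null key": 36976.99,
    "start session with encrypted salt": 84746.51,
    "flush null key": 3195.083,
}

HMAC_EXTEND_CMD_US = 24356.46  # the extend command itself, with HMAC


def to_ms(us: float) -> float:
    """Convert microseconds to milliseconds."""
    return us / 1000.0


session_setup_us = sum(SESSION_SETUP_US.values())

# Current cost: full session setup plus the HMAC-protected extend.
current_us = session_setup_us + HMAC_EXTEND_CMD_US

# Assumption: a context-saved session is restored for roughly the same
# cost as the null key restore, replacing the whole session setup.
projected_us = (
    SESSION_SETUP_US["context restore of null key"] + HMAC_EXTEND_CMD_US
)

print(f"session setup:    {to_ms(session_setup_us):6.1f} ms")
print(f"current extend:   {to_ms(current_us):6.1f} ms")
print(f"projected extend: {to_ms(projected_us):6.1f} ms")
print(f"projected / no-HMAC: {projected_us / NO_HMAC_EXTEND_US:.1f}x")
```

The rounded results agree with the 125 ms, 149 ms and 61 ms figures
quoted above. The small gap to the 151 ms function total in the trace
is time spent outside the four transmit calls.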