linux-kernel - Re: REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some Intel systems

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <64f62f2f-91ef-4707-b1bb-19ce5e81f719@amd.com>
Date:   Fri, 18 Aug 2023 12:57:20 -0500
From:   Mario Limonciello <mario.limonciello@....com>
To:     Jarkko Sakkinen <jarkko@...nel.org>, todd.e.brandt@...ux.intel.com,
        linux-integrity@...r.kernel.org
Cc:     linux-kernel@...r.kernel.org, len.brown@...el.com,
        charles.d.prestopine@...el.com, rafael.j.wysocki@...el.com
Subject: Re: REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some
 Intel systems

On 8/18/2023 12:53, Jarkko Sakkinen wrote:
> On Fri Aug 18, 2023 at 8:21 PM EEST, Mario Limonciello wrote:
>> On 8/18/2023 12:00, Jarkko Sakkinen wrote:
>>> On Fri Aug 18, 2023 at 4:58 AM EEST, Limonciello, Mario wrote:
>>>>
>>>>
>>>> On 8/17/2023 5:33 PM, Jarkko Sakkinen wrote:
>>>>> On Fri Aug 18, 2023 at 1:25 AM EEST, Todd Brandt wrote:
>>>>>> On Fri, 2023-08-18 at 00:47 +0300, Jarkko Sakkinen wrote:
>>>>>>> On Fri Aug 18, 2023 at 12:09 AM EEST, Todd Brandt wrote:
>>>>>>>> While testing S3 on 6.5.0-rc6 we've found that 5 systems are seeing
>>>>>>>> a
>>>>>>>> crash and reboot situation when S3 suspend is initiated. To
>>>>>>>> reproduce
>>>>>>>> it, this call is all that's required "sudo sleepgraph -m mem
>>>>>>>> -rtcwake
>>>>>>>> 15".
>>>>>>>
>>>>>>> 1. Are there logs available?
>>>>>>> 2. Is this the test case: https://pypi.org/project/sleepgraph/ (never
>>>>>>> used it before).
>>>>>>
>>>>>> There are no dmesg logs because the S3 crash wipes them out. Sleepgraph
>>>>>> isn't actually necessary to activate it, just an S3 suspend "echo mem >
>>>>>> /sys/power/state".
>>>>>>
>>>>>> So far it appears to only have affected test systems, not production
>>>>>> hardware, and none of them have TPM chips, so I'm beginning to wonder
>>>>>> if this patch just inadvertently activated a bug somewhere else in the
>>>>>> kernel that happens to affect test hardware.
>>>>>>
>>>>>> I'll continue to debug it, this isn't an emergency as so far I haven't
>>>>>> seen it in production hardware.
>>>>>
>>>>> OK, I'll still see if I could reproduce it just in case.
>>>>>
>>>>> BR, Jarkko
>>>>
>>>> I'd like to better understand what kind of TPM initialization path has
>>>> run.  Does the machine have some sort of TPM that failed to fully
>>>> initialize perhaps?
>>>>
>>>> If you can't share a full bootup dmesg, can you at least share
>>>>
>>>> # dmesg | grep -i tpm
>>>
>>> It would be more useful perhaps to get full dmesg output after power on
>>> and before going into suspend.
>>>
>>> Also ftrace filter could be added to the kernel command-line:
>>>
>>> ftrace=function ftrace_filter=tpm*
>>>
>>> After bootup:
>>>
>>> mount -t tracefs nodev /sys/kernel/tracing
>>> cat /sys/kernel/tracing/trace
>>>
>>> BR, Jarkko
>>
>> Todd and I have gone back and forth a little bit on the bugzilla
>> (https://bugzilla.kernel.org/show_bug.cgi?id=217804), and it seems that
>> this isn't an S3 problem - it's a probing problem.
>>
>> [    1.132521] tpm_crb: probe of INTC6001:00 failed with error 378
>>
>> That error 378 specifically matches TPM2_CC_GET_CAPABILITY, which is the
>> same command that was being requested.  This leads me to believe the TPM
>> isn't ready at the time of probing.
>>
>> In this case one solution is we could potentially ignore failures for
>> that tpm2_get_tpm_pt() call, but I think we should first understand why
>> it doesn't work at probing time for this TPM to ensure the actual quirk
>> isn't built on a house of cards.
> 
> Given that there is nothing known broken (at the moment) in production,
> I think the following might be a reasonable change.
> 
> BR, Jarkko
> 

Yeah that would prevent it.

Here's a simpler change that I think should work too though:

diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c
index 9eb1a18590123..b0e9931fe436c 100644
--- a/drivers/char/tpm/tpm_crb.c
+++ b/drivers/char/tpm/tpm_crb.c
@@ -472,8 +472,7 @@ static int crb_check_flags(struct tpm_chip *chip)
         if (ret)
                 return ret;

-       ret = tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val, NULL);
-       if (ret)
+       if (tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val, NULL))
                 goto release;

         if (val == 0x414D4400U /* AMD */)

I think Todd needs to check whether TPM works with that or not though.