[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <eec91766-10a9-4d50-8e82-376f52f54be8@amd.com>
Date: Fri, 18 Aug 2023 13:11:57 -0500
From: Mario Limonciello <mario.limonciello@....com>
To: Jarkko Sakkinen <jarkko@...nel.org>, todd.e.brandt@...ux.intel.com,
linux-integrity@...r.kernel.org
Cc: linux-kernel@...r.kernel.org, len.brown@...el.com,
charles.d.prestopine@...el.com, rafael.j.wysocki@...el.com
Subject: Re: REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some
Intel systems
On 8/18/2023 13:07, Jarkko Sakkinen wrote:
> On Fri Aug 18, 2023 at 8:57 PM EEST, Mario Limonciello wrote:
>> On 8/18/2023 12:53, Jarkko Sakkinen wrote:
>>> On Fri Aug 18, 2023 at 8:21 PM EEST, Mario Limonciello wrote:
>>>> On 8/18/2023 12:00, Jarkko Sakkinen wrote:
>>>>> On Fri Aug 18, 2023 at 4:58 AM EEST, Limonciello, Mario wrote:
>>>>>>
>>>>>>
>>>>>> On 8/17/2023 5:33 PM, Jarkko Sakkinen wrote:
>>>>>>> On Fri Aug 18, 2023 at 1:25 AM EEST, Todd Brandt wrote:
>>>>>>>> On Fri, 2023-08-18 at 00:47 +0300, Jarkko Sakkinen wrote:
>>>>>>>>> On Fri Aug 18, 2023 at 12:09 AM EEST, Todd Brandt wrote:
>>>>>>>>>> While testing S3 on 6.5.0-rc6 we've found that 5 systems are seeing
>>>>>>>>>> a
>>>>>>>>>> crash and reboot situation when S3 suspend is initiated. To
>>>>>>>>>> reproduce
>>>>>>>>>> it, this call is all that's required "sudo sleepgraph -m mem
>>>>>>>>>> -rtcwake
>>>>>>>>>> 15".
>>>>>>>>>
>>>>>>>>> 1. Are there logs available?
>>>>>>>>> 2. Is this the test case: https://pypi.org/project/sleepgraph/ (never
>>>>>>>>> used it before).
>>>>>>>>
>>>>>>>> There are no dmesg logs because the S3 crash wipes them out. Sleepgraph
>>>>>>>> isn't actually necessary to activate it, just an S3 suspend "echo mem >
>>>>>>>> /sys/power/state".
>>>>>>>>
>>>>>>>> So far it appears to only have affected test systems, not production
>>>>>>>> hardware, and none of them have TPM chips, so I'm beginning to wonder
>>>>>>>> if this patch just inadvertently activated a bug somewhere else in the
>>>>>>>> kernel that happens to affect test hardware.
>>>>>>>>
>>>>>>>> I'll continue to debug it, this isn't an emergency as so far I haven't
>>>>>>>> seen it in production hardware.
>>>>>>>
>>>>>>> OK, I'll still see if I could reproduce it just in case.
>>>>>>>
>>>>>>> BR, Jarkko
>>>>>>
>>>>>> I'd like to better understand what kind of TPM initialization path has
>>>>>> run. Does the machine have some sort of TPM that failed to fully
>>>>>> initialize perhaps?
>>>>>>
>>>>>> If you can't share a full bootup dmesg, can you at least share
>>>>>>
>>>>>> # dmesg | grep -i tpm
>>>>>
>>>>> It would be more useful perhaps to get full dmesg output after power on
>>>>> and before going into suspend.
>>>>>
>>>>> Also ftrace filter could be added to the kernel command-line:
>>>>>
>>>>> ftrace=function ftrace_filter=tpm*
>>>>>
>>>>> After bootup:
>>>>>
>>>>> mount -t tracefs nodev /sys/kernel/tracing
>>>>> cat /sys/kernel/tracing/trace
>>>>>
>>>>> BR, Jarkko
>>>>
>>>> Todd and I have gone back and forth a little bit on the bugzilla
>>>> (https://bugzilla.kernel.org/show_bug.cgi?id=217804), and it seems that
>>>> this isn't an S3 problem - it's a probing problem.
>>>>
>>>> [ 1.132521] tpm_crb: probe of INTC6001:00 failed with error 378
>>>>
>>>> That error 378 specifically matches TPM2_CC_GET_CAPABILITY, which is the
>>>> same command that was being requested. This leads me to believe the TPM
>>>> isn't ready at the time of probing.
>>>>
>>>> In this case one solution is we could potentially ignore failures for
>>>> that tpm2_get_tpm_pt() call, but I think we should first understand why
>>>> it doesn't work at probing time for this TPM to ensure the actual quirk
>>>> isn't built on a house of cards.
>>>
>>> Given that there is nothing known broken (at the moment) in production,
>>> I think the following might be a reasonable change.
>>>
>>> BR, Jarkko
>>>
>>
>> Yeah that would prevent it.
>>
>> Here's a simpler change that I think should work too though:
>> diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c
>> index 9eb1a18590123..b0e9931fe436c 100644
>> --- a/drivers/char/tpm/tpm_crb.c
>> +++ b/drivers/char/tpm/tpm_crb.c
>> @@ -472,8 +472,7 @@ static int crb_check_flags(struct tpm_chip *chip)
>> if (ret)
>> return ret;
>>
>> - ret = tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val, NULL);
>> - if (ret)
>> + if (tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val, NULL))
>> goto release;
>>
>> if (val == 0x414D4400U /* AMD */)
>>
>> I think Todd needs to check whether TPM works with that or not though.
>
> Hmm... I'm sorry if I have a blind spot now but what is that changing?
>
> BR, Jarkko
It throws away the error code if it fails for some reason.
Todd just checked it works too. I'll drop it on the M/L for review.
Powered by blists - more mailing lists