linux-kernel - Re: REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some Intel systems

Message-ID: <92b93b79-14b9-46fe-9d4f-f44ab75fd229@amd.com>
Date:   Fri, 18 Aug 2023 12:21:04 -0500
From:   Mario Limonciello <mario.limonciello@....com>
To:     Jarkko Sakkinen <jarkko@...nel.org>, todd.e.brandt@...ux.intel.com,
        linux-integrity@...r.kernel.org
Cc:     linux-kernel@...r.kernel.org, len.brown@...el.com,
        charles.d.prestopine@...el.com, rafael.j.wysocki@...el.com
Subject: Re: REGRESSION WITH BISECT: v6.5-rc6 TPM patch breaks S3 on some
 Intel systems

On 8/18/2023 12:00, Jarkko Sakkinen wrote:
> On Fri Aug 18, 2023 at 4:58 AM EEST, Limonciello, Mario wrote:
>>
>>
>> On 8/17/2023 5:33 PM, Jarkko Sakkinen wrote:
>>> On Fri Aug 18, 2023 at 1:25 AM EEST, Todd Brandt wrote:
>>>> On Fri, 2023-08-18 at 00:47 +0300, Jarkko Sakkinen wrote:
>>>>> On Fri Aug 18, 2023 at 12:09 AM EEST, Todd Brandt wrote:
>>>>>> While testing S3 on 6.5.0-rc6 we've found that 5 systems are seeing a
>>>>>> crash and reboot situation when S3 suspend is initiated. To reproduce
>>>>>> it, this call is all that's required
>>>>>> "sudo sleepgraph -m mem -rtcwake 15".
>>>>>
>>>>> 1. Are there logs available?
>>>>> 2. Is this the test case: https://pypi.org/project/sleepgraph/ (never
>>>>> used it before).
>>>>
>>>> There are no dmesg logs because the S3 crash wipes them out. Sleepgraph
>>>> isn't actually necessary to activate it, just an S3 suspend
>>>> "echo mem > /sys/power/state".
>>>>
>>>> So far it appears to only have affected test systems, not production
>>>> hardware, and none of them have TPM chips, so I'm beginning to wonder
>>>> if this patch just inadvertently activated a bug somewhere else in the
>>>> kernel that happens to affect test hardware.
>>>>
>>>> I'll continue to debug it, this isn't an emergency as so far I haven't
>>>> seen it in production hardware.
>>>
>>> OK, I'll still see if I could reproduce it just in case.
>>>
>>> BR, Jarkko
>>
>> I'd like to better understand what kind of TPM initialization path has
>> run.  Does the machine have some sort of TPM that failed to fully
>> initialize perhaps?
>>
>> If you can't share a full bootup dmesg, can you at least share
>>
>> # dmesg | grep -i tpm
> 
> It would be more useful perhaps to get full dmesg output after power on
> and before going into suspend.
> 
> Also ftrace filter could be added to the kernel command-line:
> 
> ftrace=function ftrace_filter=tpm*
> 
> After bootup:
> 
> mount -t tracefs nodev /sys/kernel/tracing
> cat /sys/kernel/tracing/trace
> 
> BR, Jarkko

Todd and I have gone back and forth a little bit on the bugzilla 
(https://bugzilla.kernel.org/show_bug.cgi?id=217804), and it seems that 
this isn't an S3 problem - it's a probing problem.

[    1.132521] tpm_crb: probe of INTC6001:00 failed with error 378

Error 378 (0x17A) specifically matches TPM2_CC_GET_CAPABILITY, which is
the same command that was being requested.  This leads me to believe the
TPM isn't ready at the time of probing.
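
As a quick sanity check on that number, here is a minimal sketch that pulls
the error out of the probe message and compares it against the command code.
It assumes TPM2_CC_GET_CAPABILITY is 0x017A, as defined in
include/linux/tpm.h; the regex and the parse_probe_failure() helper are
hypothetical names made up for this illustration, not anything in the kernel.

```python
import re

# Assumption: value of TPM2_CC_GET_CAPABILITY from include/linux/tpm.h.
TPM2_CC_GET_CAPABILITY = 0x017A

# Hypothetical pattern for the driver core's "probe of ... failed" message.
PROBE_FAIL = re.compile(r"probe of (\S+) failed with error (-?\d+)")


def parse_probe_failure(line):
    """Return (device, error) from a probe-failure dmesg line, else None."""
    m = PROBE_FAIL.search(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2))


line = "[    1.132521] tpm_crb: probe of INTC6001:00 failed with error 378"
device, err = parse_probe_failure(line)

# Show the error in hex and whether it equals the command code.
print(device, hex(err), err == TPM2_CC_GET_CAPABILITY)
# prints: INTC6001:00 0x17a True
```

Note that the value is positive, so it is not one of the usual negative
errno values, which would be consistent with a TPM-level value leaking out
of the probe path rather than a generic kernel error.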

In this case one possible solution is to ignore failures from that
tpm2_get_tpm_pt() call, but I think we should first understand why it
fails at probing time for this TPM, so that the eventual quirk isn't
built on a house of cards.