Message-ID: <20241021160202.GGZxZ6-gCNNKUtTRse@fat_crate.local>
Date: Mon, 21 Oct 2024 18:02:02 +0200
From: Borislav Petkov <bp@...en8.de>
To: bugzilla-daemon@...nel.org, Peter Huewe <peterhuewe@....de>,
Jarkko Sakkinen <jarkko@...nel.org>, Jason Gunthorpe <jgg@...pe.ca>,
linux-integrity@...r.kernel.org,
lkml <linux-kernel@...r.kernel.org>
Cc: mikeseohyungjin@...il.com
Subject: Re: [Bug 219383] New: System reboot on S3 sleep/wakeup test
Looks like TPM. CCing the proper people.
On Mon, Oct 14, 2024 at 12:46:26AM +0000, bugzilla-daemon@...nel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=219383
>
> Bug ID: 219383
> Summary: System reboot on S3 sleep/wakeup test
> Product: Platform Specific/Hardware
> Version: 2.5
> Hardware: All
> OS: Linux
> Status: NEW
> Severity: normal
> Priority: P3
> Component: x86-64
> Assignee: platform_x86_64@...nel-bugs.osdl.org
> Reporter: mikeseohyungjin@...il.com
> Regression: No
>
> I work on LG laptops and have run several LG PCs with Ubuntu. As you may
> know, most LG laptops use Intel SoCs.
> I found a critical issue: the system reboots during S3 sleep/wakeup.
>
> Environments:
> - PC BIOS: Phoenix Technologies
> - Intel Jasperlake or Intel Lunarlake
> - OS: Ubuntu 22.04 (Jasperlake), 24.04.1 (Lunarlake)
> - Linux kernel: 6.x.0 (Jasperlake) or the up-to-date 6.11 (Lunarlake)
>
> Symptom:
>
> Running an aging script like the one below, the system reboots.
> -------------------------
> #!/bin/bash
> <snip>
> for (( i=1; i<=10000; i++ )); do
>     sudo rtcwake -m mem -s 10 >> ${LOG} 2>&1
> done
> <snip>
> -------------------------
> The script works as follows:
> 1. waits 10 secs
> 2. echo mem > /sys/power/state
> 3. waits another 10 secs, then wakes the system up as if the power button
>    were pressed.
>
>
> My analysis:
>
> I reproduced the issue several times and found that the BIOS side triggers
> the reboot.
> | pm_suspend() | syscore_suspend() | acpi_suspend_enter() | ... | < BIOS > |
> ...| acpi_suspend_enter() | syscore_resume() | ...|
>
> Debugging the BIOS, I found that its TPM2 handling can generate a cold reset
> when it detects something wrong after the TPM resumes.
> In the BIOS code, if there are active PCR banks that are not supported by the
> platform mask, it is supposed to update the TPM PCR allocations and reboot the
> machine.
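The BIOS decision described above amounts to a bitmask test. A minimal C sketch, assuming the bank bitmap uses the EFI_TCG2_BOOT_HASH_ALG_* bit values from the TCG EFI protocol; `needs_pcr_reallocation()` is a hypothetical name for illustration, not Phoenix BIOS code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hash-algorithm (PCR bank) bits, assumed to follow the
 * EFI_TCG2_BOOT_HASH_ALG_* layout from the TCG EFI protocol. */
#define HASH_ALG_SHA1    0x00000001u
#define HASH_ALG_SHA256  0x00000002u
#define HASH_ALG_SHA384  0x00000004u
#define HASH_ALG_SHA512  0x00000008u
#define HASH_ALG_SM3_256 0x00000010u

/*
 * Hypothetical firmware check: true when at least one PCR bank that is
 * active in the TPM is missing from the platform's supported mask.
 * Per the report, the BIOS then updates the PCR allocation and forces
 * a cold reset.
 */
static bool needs_pcr_reallocation(uint32_t active_banks, uint32_t platform_mask)
{
	/* Any bit set in active_banks but clear in platform_mask is unsupported. */
	return (active_banks & ~platform_mask) != 0;
}
```

For example, with SHA1 and SHA384 active but only SHA1 and SHA256 in the platform mask, the SHA384 bit survives the mask and the check fires; an unexpected change in what the TPM reports as active after resume would be one way to land on this path.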
>
> This means that something on the Linux kernel side can affect TPM operation
> when going to sleep.
> So I debugged and traced the TPM-related functions, such as tpm_chip_start(),
> whenever the symptom appeared.
>
> In the normal case, tpm_chip_start() is called once, as below:
> tpm_pm_suspend() -> tpm_chip_start()
> but in the failing case, it is additionally called via:
> hwrng_fillfn() ->
> rng_get_data() ->
> tpm_hwrng_read() ->
> tpm_get_random() ->
> tpm_find_get_ops() ->
> tpm_try_get_ops() ->
> tpm_chip_start()
>
> I found that when hwrng_fillfn() (the hardware random number generator fill
> thread) runs during system sleep, it can cause the system to reboot.
> To verify this, I tested a custom kernel that includes the patch below.
>
> -----------------------
> From 373e92bb6d471c5fb42bacb97a4caf5375df5522 Mon Sep 17 00:00:00 2001
> From: mike Seo <mikeseohyungjin@...il.com>
> Date: Thu, 10 Oct 2024 14:04:57 +0900
> Subject: [PATCH] test_patch
>
> test_patch for reboot while sleep/wakeup
>
> Signed-off-by: mike Seo <mikeseohyungjin@...il.com>
> ---
> drivers/char/hw_random/core.c | 21 +++++++++++++++++++++
> 1 file changed, 21 insertions(+)
>
> diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> index 57c51efa5..d3f0059a4 100644
> --- a/drivers/char/hw_random/core.c
> +++ b/drivers/char/hw_random/core.c
> @@ -25,6 +25,7 @@
> #include <linux/slab.h>
> #include <linux/string.h>
> #include <linux/uaccess.h>
> +#include <linux/suspend.h>
>
> #define RNG_MODULE_NAME "hw_random"
>
> @@ -469,6 +470,22 @@ static struct attribute *rng_dev_attrs[] = {
>
> ATTRIBUTE_GROUPS(rng_dev);
>
> +
> +static int hwrng_pm_notification(struct notifier_block *nb, unsigned long action, void *data)
> +{
> +
> + switch (action) {
> + case PM_SUSPEND_PREPARE:
> + is_suspend_prepare = 1;
> + break;
> + default:
> + is_suspend_prepare = 0;
> + break;
> + }
> + return 0;
> +}
> +
> +static struct notifier_block pm_notifier = { .notifier_call = hwrng_pm_notification };
> static int hwrng_fillfn(void *unused)
> {
> size_t entropy, entropy_credit = 0; /* in 1/1024 of a bit */
> @@ -478,6 +495,9 @@ static int hwrng_fillfn(void *unused)
> unsigned short quality;
> struct hwrng *rng;
>
> + while (is_suspend_prepare)
> + msleep(500);
> +
> rng = get_current_rng();
> if (IS_ERR(rng) || !rng)
> break;
> @@ -549,6 +569,7 @@ int hwrng_register(struct hwrng *rng)
> goto out_unlock;
> }
> mutex_unlock(&rng_mutex);
> + WARN_ON(register_pm_notifier(&pm_notifier));
> return 0;
> out_unlock:
> mutex_unlock(&rng_mutex);
> --
> 2.43.0
> ------------------------
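As posted, the test patch never declares `is_suspend_prepare`, reads it from the fill thread without any synchronization, and calls register_pm_notifier() on the same static notifier_block for every hwrng_register(). A userspace C sketch of the same gating idea, assuming C11 atomics as a stand-in for the kernel's READ_ONCE()/WRITE_ONCE() or atomic_t; `rng_pm_notify()`, `rng_may_read()` and `rng_should_register_notifier()` are hypothetical names, and the PM_* values only mirror <linux/suspend.h> for illustration:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-ins for the PM notifier actions in <linux/suspend.h> (illustration only). */
enum pm_action {
	PM_HIBERNATION_PREPARE = 1,
	PM_POST_HIBERNATION,
	PM_SUSPEND_PREPARE,
	PM_POST_SUSPEND,
	PM_RESTORE_PREPARE,
	PM_POST_RESTORE,
};

/* Declared explicitly and accessed atomically, unlike is_suspend_prepare. */
static atomic_bool rng_paused_for_suspend = false;

/* One-time guard so the notifier would be registered once, not per hwrng. */
static atomic_flag notifier_registered = ATOMIC_FLAG_INIT;

/* Modeled on hwrng_pm_notification(): pause reads before suspend, resume after. */
static int rng_pm_notify(unsigned long action)
{
	switch (action) {
	case PM_SUSPEND_PREPARE:
		atomic_store(&rng_paused_for_suspend, true);
		break;
	case PM_POST_SUSPEND:
		atomic_store(&rng_paused_for_suspend, false);
		break;
	default:
		/* Other PM events leave the flag unchanged in this sketch. */
		break;
	}
	return 0;
}

/* What the fill loop would check before reading from the current rng. */
static bool rng_may_read(void)
{
	return !atomic_load(&rng_paused_for_suspend);
}

/* Returns true only on the first call; later callers skip registration. */
static bool rng_should_register_notifier(void)
{
	return !atomic_flag_test_and_set(&notifier_registered);
}
```

Clearing the flag only on PM_POST_SUSPEND, instead of on every other action as the test patch does, is a choice made for this sketch so that unrelated PM events cannot re-enable reads in the middle of a suspend.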
>
> With this patch, the system passed over 10000 iterations of the S3 sleep/wake
> aging test.
>
> Could you make a proper patch for this issue and merge it?
>
> Thank you,
> Mike
>
> --
> You may reply to this email to add a comment.
>
> You are receiving this mail because:
> You are watching the assignee of the bug.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette