Message-ID: <aBcLJxjTQGa1-r5S@gmail.com>
Date: Sun, 4 May 2025 08:37:27 +0200
From: Ingo Molnar <mingo@...nel.org>
To: linux-kernel@...r.kernel.org
Cc: linux-tip-commits@...r.kernel.org,
Yazen Ghannam <yazen.ghannam@....com>,
Mario Limonciello <mario.limonciello@....com>,
"Borislav Petkov (AMD)" <bp@...en8.de>, x86@...nel.org
Subject: Re: [tip: x86/platform] x86/CPU/AMD: Print the reason for the last
reset
* tip-bot2 for Yazen Ghannam <tip-bot2@...utronix.de> wrote:
> The register is accessed indirectly through a "PM" port in the FCH. Use
> MMIO access in order to avoid restrictions with legacy port access.
>
> Use a late_initcall() to ensure that MMIO has been set up before trying to
> access the register.
>
> This register was introduced with AMD Family 17h, so avoid access on older
> families. There is no CPUID feature bit for this register.
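As an aside on the family cutoff: the patch gates on X86_FEATURE_ZEN rather than on a family number. "Family 17h" itself follows AMD's CPUID encoding, where leaf 1 EAX[11:8] is the base family and EAX[27:20] is the extended family, which is added only when the base family is 0xF. Below is a minimal userspace sketch of that decoding. It assumes the raw leaf-1 EAX value is already available. `amd_cpu_family()` and `has_s5_reset_status()` are hypothetical names and are not part of the patch:

```c
#include <stdint.h>

/*
 * Effective CPU family from CPUID leaf 1 EAX, per AMD's encoding:
 * BaseFamily is EAX[11:8]; only when it equals 0xF is ExtFamily
 * (EAX[27:20]) added. Family 17h is therefore 0xF + 0x8.
 */
static unsigned int amd_cpu_family(uint32_t eax)
{
	unsigned int base = (eax >> 8) & 0xf;
	unsigned int ext = (eax >> 20) & 0xff;

	return base == 0xf ? base + ext : base;
}

/*
 * Hypothetical helper: the S5 reset status register is only expected
 * on Family 17h and later. The patch itself checks X86_FEATURE_ZEN.
 */
static int has_s5_reset_status(uint32_t eax)
{
	return amd_cpu_family(eax) >= 0x17;
}
```

For example, an EAX value of 0x00870F10 decodes to family 0x17 (0xF + 0x8), while 0x00600F12 decodes to 0x15 and would be skipped.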
>
> [ bp: Simplify the reason dumping loop. ]
>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@....com>
> Co-developed-by: Mario Limonciello <mario.limonciello@....com>
> Signed-off-by: Mario Limonciello <mario.limonciello@....com>
> Signed-off-by: Borislav Petkov (AMD) <bp@...en8.de>
> Link: https://lore.kernel.org/20250422234830.2840784-6-superm1@kernel.org
> ---
> Documentation/arch/x86/amd-debugging.rst | 42 +++++++++++++++++++
> arch/x86/include/asm/amd/fch.h | 1 +
> arch/x86/kernel/cpu/amd.c | 53 ++++++++++++++++++++++++
> 3 files changed, 96 insertions(+)
Looks mostly good, with a few minor comments:
> +Random reboot issues
> +====================
> +When a random reboot occurs, the high-level reason for the reboot is stored
> +in a register that will persist onto the next boot.
Please add an extra newline after the section title, as the rest of the
document does.
> +There are 6 classes of reasons for the reboot:
> + * Software induced
> + * Power state transition
> + * Pin induced
> + * Hardware induced
> + * Remote reset
> + * Internal CPU event
>
> +.. csv-table::
> + :header: "Bit", "Type", "Reason"
> + :align: left
> +
> + "0", "Pin", "thermal pin BP_THERMTRIP_L was tripped"
> + "1", "Pin", "power button was pressed for 4 seconds"
> + "2", "Pin", "shutdown pin was shorted"
> + "4", "Remote", "remote ASF power off command was received"
> + "9", "Internal", "internal CPU thermal limit was tripped"
> + "16", "Pin", "system reset pin BP_SYS_RST_L was tripped"
> + "17", "Software", "software issued PCI reset"
> + "18", "Software", "software wrote 0x4 to reset control register 0xCF9"
> + "19", "Software", "software wrote 0x6 to reset control register 0xCF9"
> + "20", "Software", "software wrote 0xE to reset control register 0xCF9"
> + "21", "Sleep", "ACPI power state transition occurred"
This line is a bit of an odd one out: all other types are the first
word of their class name, while this one says 'Sleep', which is
specific to the event. "ACPI-state" might be a better type, I suspect.
> + "22", "Pin", "keyboard reset pin KB_RST_L was asserted"
> + "23", "Internal", "internal CPU shutdown event occurred"
> + "24", "Hardware", "system failed to boot before failed boot timer expired"
> + "25", "Hardware", "hardware watchdog timer expired"
> + "26", "Remote", "remote ASF reset command was received"
> + "27", "Internal", "an uncorrected error caused a data fabric sync flood event"
> + "29", "Internal", "FCH and MP1 failed warm reset handshake"
> + "30", "Internal", "a parity error occurred"
> + "31", "Internal", "a software sync flood event occurred"
> +
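When triaging, it can help to collapse a raw status value into the six classes listed above. Below is a minimal userspace sketch. It assumes the bit-to-class mapping from the table, with bit 21 treated as the power state transition class. The enum and `reset_classes()` are hypothetical names and are not kernel API:

```c
#include <stdint.h>

/* The six reset classes listed above, as bit flags. */
enum reset_class {
	RC_SOFTWARE	= 1u << 0,
	RC_POWER_STATE	= 1u << 1,
	RC_PIN		= 1u << 2,
	RC_HARDWARE	= 1u << 3,
	RC_REMOTE	= 1u << 4,
	RC_INTERNAL	= 1u << 5,
};

/* Class of each documented status bit, per the table; 0 = undocumented bit. */
static const unsigned int reset_bit_class[32] = {
	[0] = RC_PIN, [1] = RC_PIN, [2] = RC_PIN,
	[4] = RC_REMOTE,
	[9] = RC_INTERNAL,
	[16] = RC_PIN,
	[17] = RC_SOFTWARE, [18] = RC_SOFTWARE,
	[19] = RC_SOFTWARE, [20] = RC_SOFTWARE,
	[21] = RC_POWER_STATE,
	[22] = RC_PIN,
	[23] = RC_INTERNAL,
	[24] = RC_HARDWARE, [25] = RC_HARDWARE,
	[26] = RC_REMOTE,
	[27] = RC_INTERNAL, [29] = RC_INTERNAL,
	[30] = RC_INTERNAL, [31] = RC_INTERNAL,
};

/* OR together the classes of every set bit in a raw status value. */
static unsigned int reset_classes(uint32_t status)
{
	unsigned int classes = 0;
	unsigned int bit;

	for (bit = 0; bit < 32; bit++) {
		if (status & (UINT32_C(1) << bit))
			classes |= reset_bit_class[bit];
	}
	return classes;
}
```

Undocumented bits (3, 5-8, 10-15, 28) map to no class, so a value with only such bits set yields 0.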
> +This information is read by the kernel at bootup and is saved into the
> +kernel ring buffer. When a random reboot occurs this message can be helpful
> +to determine the next component to debug such an issue.
The ring-buffer reference is a bit obtuse and confusing - printk is a
log buffer. Maybe this refers to some earlier version of the patch?
Also:
s/determine the next component to debug such an issue.
/determine the next component to debug.
How about:
This information is read by the kernel at bootup and printed into
the syslog. When a random reboot occurs this message can be helpful
to determine the next component to debug.
> + [2] = "shutdown pin was shorted",
s/shorted
/tripped
> + [22] = "keyboard reset pin KB_RST_L was asserted",
s/asserted
/tripped
'asserted' is fine too - but all 'pin' class messages should use
consistent wording.
> +static __init int print_s5_reset_status_mmio(void)
> +{
> + unsigned long value;
> + void __iomem *addr;
> + int i;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_ZEN))
> + return 0;
> +
> + addr = ioremap(FCH_PM_BASE + FCH_PM_S5_RESET_STATUS, sizeof(value));
> + if (!addr)
> + return 0;
> +
> + value = ioread32(addr);
> + iounmap(addr);
> +
> + for (i = 0; i < ARRAY_SIZE(s5_reset_reason_txt); i++) {
> + if (!(value & BIT(i)))
> + continue;
> +
> + if (s5_reset_reason_txt[i])
> + pr_info("x86/amd: Previous system reset reason [0x%08lx]: %s\n",
> + value, s5_reset_reason_txt[i]);
Please use curly braces around multi-line statements.
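For illustration, here is a userspace sketch of the loop with the requested braces. It also uses an `i < ARRAY_SIZE(...)` bound, so `i` never indexes past the end of the table. This is a stand-in built on several assumptions: ARRAY_SIZE() and BIT() are redefined locally, pr_info() is replaced by printf(), the reason table is trimmed to two entries, and `print_reasons()` is a hypothetical name that returns a count only to make it testable:

```c
#include <stdio.h>

#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))
#define BIT(n)		(1UL << (n))

/* Trimmed stand-in for s5_reset_reason_txt[]; unlisted slots are NULL. */
static const char *const s5_reset_reason_txt[] = {
	[1]  = "power button was pressed for 4 seconds",
	[25] = "hardware watchdog timer expired",
};

/* Print each documented reason whose bit is set; return how many were printed. */
static int print_reasons(unsigned long value)
{
	unsigned int i;
	int printed = 0;

	for (i = 0; i < ARRAY_SIZE(s5_reset_reason_txt); i++) {
		if (!(value & BIT(i)))
			continue;

		if (s5_reset_reason_txt[i]) {
			printf("x86/amd: Previous system reset reason [0x%08lx]: %s\n",
			       value, s5_reset_reason_txt[i]);
			printed++;
		}
	}
	return printed;
}
```

With this trimmed table, a value of BIT(1) | BIT(25) prints two lines. BIT(31) prints nothing because the loop stops at the last table entry (index 25).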
With those addressed:
Reviewed-by: Ingo Molnar <mingo@...nel.org>
Thanks,
Ingo