Open Source and information security mailing list archives
 
Message-ID: <aBcLJxjTQGa1-r5S@gmail.com>
Date: Sun, 4 May 2025 08:37:27 +0200
From: Ingo Molnar <mingo@...nel.org>
To: linux-kernel@...r.kernel.org
Cc: linux-tip-commits@...r.kernel.org,
	Yazen Ghannam <yazen.ghannam@....com>,
	Mario Limonciello <mario.limonciello@....com>,
	"Borislav Petkov (AMD)" <bp@...en8.de>, x86@...nel.org
Subject: Re: [tip: x86/platform] x86/CPU/AMD: Print the reason for the last
 reset


* tip-bot2 for Yazen Ghannam <tip-bot2@...utronix.de> wrote:

> The register is accessed indirectly through a "PM" port in the FCH. Use
> MMIO access in order to avoid restrictions with legacy port access.
> 
> Use a late_initcall() to ensure that MMIO has been set up before trying to
> access the register.
> 
> This register was introduced with AMD Family 17h, so avoid access on older
> families. There is no CPUID feature bit for this register.
> 
>   [ bp: Simplify the reason dumping loop. ]
> 
> Signed-off-by: Yazen Ghannam <yazen.ghannam@....com>
> Co-developed-by: Mario Limonciello <mario.limonciello@....com>
> Signed-off-by: Mario Limonciello <mario.limonciello@....com>
> Signed-off-by: Borislav Petkov (AMD) <bp@...en8.de>
> Link: https://lore.kernel.org/20250422234830.2840784-6-superm1@kernel.org
> ---
>  Documentation/arch/x86/amd-debugging.rst | 42 ++++++++++++++++++
>  arch/x86/include/asm/amd/fch.h           |  1 +
>  arch/x86/kernel/cpu/amd.c                | 53 +++++++++++++++++++++++
>  3 files changed, 96 insertions(+)

Looks mostly good, with a few minor comments:

> +Random reboot issues
> +====================
> +When a random reboot occurs, the high-level reason for the reboot is stored
> +in a register that will persist onto the next boot.

Please add an extra newline after the section title, as the rest of the 
document does.

> +There are 6 classes of reasons for the reboot:
> + * Software induced
> + * Power state transition
> + * Pin induced
> + * Hardware induced
> + * Remote reset
> + * Internal CPU event
> +
> +.. csv-table::
> +   :header: "Bit", "Type", "Reason"
> +   :align: left
> +
> +   "0",  "Pin",      "thermal pin BP_THERMTRIP_L was tripped"
> +   "1",  "Pin",      "power button was pressed for 4 seconds"
> +   "2",  "Pin",      "shutdown pin was shorted"
> +   "4",  "Remote",   "remote ASF power off command was received"
> +   "9",  "Internal", "internal CPU thermal limit was tripped"
> +   "16", "Pin",      "system reset pin BP_SYS_RST_L was tripped"
> +   "17", "Software", "software issued PCI reset"
> +   "18", "Software", "software wrote 0x4 to reset control register 0xCF9"
> +   "19", "Software", "software wrote 0x6 to reset control register 0xCF9"
> +   "20", "Software", "software wrote 0xE to reset control register 0xCF9"
> +   "21", "Sleep",    "ACPI power state transition occurred"

This line is a bit of an odd one out: all other entries use the first 
word of their class name, while this one says 'Sleep', which is 
specific to the event. "ACPI-state" might be a better class, I suspect.

> +   "22", "Pin",      "keyboard reset pin KB_RST_L was asserted"
> +   "23", "Internal", "internal CPU shutdown event occurred"
> +   "24", "Hardware", "system failed to boot before failed boot timer expired"
> +   "25", "Hardware", "hardware watchdog timer expired"
> +   "26", "Remote",   "remote ASF reset command was received"
> +   "27", "Internal", "an uncorrected error caused a data fabric sync flood event"
> +   "29", "Internal", "FCH and MP1 failed warm reset handshake"
> +   "30", "Internal", "a parity error occurred"
> +   "31", "Internal", "a software sync flood event occurred"
> +
> +This information is read by the kernel at bootup and is saved into the
> +kernel ring buffer. When a random reboot occurs this message can be helpful
> +to determine the next component to debug such an issue.

The ring-buffer reference is a bit obtuse and confusing - printk is a 
log buffer. Maybe this refers to some earlier version of the patch?

Also:

 s/determine the next component to debug such an issue.
  /determine the next component to debug.


How about:

   This information is read by the kernel at bootup and printed into 
   the syslog. When a random reboot occurs this message can be helpful 
   to determine the next component to debug.
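
Regarding the class column: here's a userspace sketch (not part of the 
patch) of how the table maps status bits to classes, with "ACPI-state" 
used for bit 21 as suggested above. The struct and helper names 
(s5_reason, s5_reason_class(), s5_count_class()) are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace illustration only: documented S5 reset status bits and their
 * class, copied from the table in the patch. Bit 21 uses the suggested
 * "ACPI-state" class instead of "Sleep".
 */
struct s5_reason {
	unsigned int bit;
	const char *class_name;
};

static const struct s5_reason s5_reasons[] = {
	{  0, "Pin" },        {  1, "Pin" },        {  2, "Pin" },
	{  4, "Remote" },     {  9, "Internal" },   { 16, "Pin" },
	{ 17, "Software" },   { 18, "Software" },   { 19, "Software" },
	{ 20, "Software" },   { 21, "ACPI-state" }, { 22, "Pin" },
	{ 23, "Internal" },   { 24, "Hardware" },   { 25, "Hardware" },
	{ 26, "Remote" },     { 27, "Internal" },   { 29, "Internal" },
	{ 30, "Internal" },   { 31, "Internal" },
};

/* Return the class of a status bit, or NULL for bits the table does not list. */
static const char *s5_reason_class(unsigned int bit)
{
	size_t i;

	for (i = 0; i < sizeof(s5_reasons) / sizeof(s5_reasons[0]); i++) {
		if (s5_reasons[i].bit == bit)
			return s5_reasons[i].class_name;
	}
	return NULL;
}

/* Count how many set bits of a raw 32-bit register value belong to a class. */
static unsigned int s5_count_class(unsigned long value, const char *class_name)
{
	unsigned int bit, n = 0;

	for (bit = 0; bit < 32; bit++) {
		const char *c;

		if (!(value & (1UL << bit)))
			continue;

		c = s5_reason_class(bit);
		if (c && strcmp(c, class_name) == 0)
			n++;
	}
	return n;
}
```

With every class named by the first word of its description, a reader 
can group reasons without having to know which event 'Sleep' refers to.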

> +	[16] = "system reset pin BP_SYS_RST_L was tripped",

s/tripped
 /shorted

> +	[22] = "keyboard reset pin KB_RST_L was asserted",

s/asserted
 /shorted

'asserted' is fine too - but all 'pin' class messages should use 
consistent wording.

> +static __init int print_s5_reset_status_mmio(void)
> +{
> +	unsigned long value;
> +	void __iomem *addr;
> +	int i;
> +
> +	if (!cpu_feature_enabled(X86_FEATURE_ZEN))
> +		return 0;
> +
> +	addr = ioremap(FCH_PM_BASE + FCH_PM_S5_RESET_STATUS, sizeof(value));
> +	if (!addr)
> +		return 0;
> +
> +	value = ioread32(addr);
> +	iounmap(addr);
> +
> +	for (i = 0; i < ARRAY_SIZE(s5_reset_reason_txt); i++) {
> +		if (!(value & BIT(i)))
> +			continue;
> +
> +		if (s5_reset_reason_txt[i])
> +			pr_info("x86/amd: Previous system reset reason [0x%08lx]: %s\n",
> +				value, s5_reset_reason_txt[i]);

Please use curly braces around multi-line statements.
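
I.e. something along these lines. This is only a userspace sketch to 
show the shape, not a drop-in replacement: pr_info() becomes printf(), 
ARRAY_SIZE() and BIT() are defined locally, print_s5_reset_reasons() is 
a made-up helper name, and the reason table is trimmed to a few entries. 
Note that the loop bound needs to be '<': with '<=' the last iteration 
indexes one element past the end of s5_reset_reason_txt[].

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Local stand-ins for the kernel helpers (userspace build assumed). */
#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))
#define BIT(n)		(1UL << (n))

/* Sparse table excerpt: slots without an initializer stay NULL. */
static const char * const s5_reset_reason_txt[] = {
	[0]  = "thermal pin BP_THERMTRIP_L was tripped",
	[1]  = "power button was pressed for 4 seconds",
	[21] = "ACPI power state transition occurred",
	[31] = "a software sync flood event occurred",
};

/* Print one line per set bit that has a description; return the count. */
static int print_s5_reset_reasons(unsigned long value)
{
	int printed = 0;
	size_t i;

	for (i = 0; i < ARRAY_SIZE(s5_reset_reason_txt); i++) {
		if (!(value & BIT(i)))
			continue;

		if (s5_reset_reason_txt[i]) {
			printf("x86/amd: Previous system reset reason [0x%08lx]: %s\n",
			       value, s5_reset_reason_txt[i]);
			printed++;
		}
	}

	return printed;
}
```

Since the highest initialized index is 31, ARRAY_SIZE() is 32 here, so 
'<' covers exactly bits 0..31 of the register.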

With those addressed:

  Reviewed-by: Ingo Molnar <mingo@...nel.org>

Thanks,

	Ingo
