Message-ID: <1f76fc7d-3067-4bd8-912d-62ebf68ba696@gmx.de>
Date: Wed, 28 Jan 2026 03:45:47 +0100
From: Armin Wolf <W_Armin@....de>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: linux-acpi@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] ACPI: OSL: Panic when encountering a fatal ACPI error
On 27.01.26 at 22:05, Rafael J. Wysocki wrote:
> On Tue, Jan 27, 2026 at 7:32 AM Armin Wolf <W_Armin@....de> wrote:
>> The ACPI spec states that the operating system should respond
>> to a fatal ACPI error by "performing a controlled OS shutdown in
>> a timely fashion". Windows complies with this requirement by entering
>> a BSOD upon executing the "Fatal()" ASL opcode, an implementation
>> detail that is used by some firmware implementations for signaling
>> fatal hardware errors.
>>
>> Comply with the ACPI specification by triggering a kernel panic
>> when ACPICA signals a fatal ACPI error.
> I'm not sure if a kernel panic really counts as "a controlled OS shutdown".
>
> Shouldn't this be treated in a similar way to crossing a critical
> thermal trip point?
>
> Also, is there a reason beyond "follow Windows" to do this?
Some firmware implementations use Fatal () to signal fatal hardware errors.
For example, the ThinkPad X1 Yoga Gen 8 contains something like this:
    Method (_Q7F, 0, NotSerialized) // _Qxx: EC Query, xx=0x00-0xFF
    {
        Fatal (0x01, 0x80010000, 0x00036C4E)
    }
I strongly suspect that the EC issues query 0x7F when encountering a fatal internal error.
Trying to perform an orderly shutdown in this case might cause the whole system to hang
or to respond erratically.
The other use cases for Fatal () that I know of are:
- failing if some PCI components refuse to resume (MacBookPro 16)
- failing if a WMI-ACPI device is being accessed while still being disabled (Windows WMI-ACPI example code)
- failing if the hardware resource configuration is invalid (ThinkPad T410, among others)
In all those cases the firmware developers assume that Fatal () causes the operating system to panic.
This seems to be true even on MacOS: https://discussions.apple.com/thread/250384018
We should thus treat the Fatal () opcode like an assert() statement that is used to signal
when the firmware has encountered a problem it cannot handle. The only way to safely "recover"
from this is a kernel panic, as a normal shutdown relies on the ACPI firmware still being
operational.
>> Users can still disable
>> this behavior by using the acpi.panic_on_fatal kernel option to
>> work around firmware bugs.
> This is not enough. You are talking about throwing away user data if
> the firmware asks the kernel to do so.
The kernel also throws away user data if the hardware itself signals a fatal error.
Why should we react differently when the ACPI BIOS does the same, especially when
MacOS and Windows do exactly that?
I agree that being able to differentiate between fatal errors and recoverable errors
would be much better, but sadly that is not how most of the ACPI implementations out
there were designed to work.
Thanks,
Armin Wolf
>> Link: https://uefi.org/specs/ACPI/6.6/19_ASL_Reference.html#fatal-fatal-error-check
>> Signed-off-by: Armin Wolf <W_Armin@....de>
>> ---
>> Changes since v1:
>> - use IS_ENABLED() for checking the presence of CONFIG_ACPI_PANIC_ON_FATAL
>> ---
>> .../admin-guide/kernel-parameters.txt | 9 +++++++++
>> drivers/acpi/Kconfig | 11 +++++++++++
>> drivers/acpi/osl.c | 19 ++++++++++++++++++-
>> 3 files changed, 38 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 1058f2a6d6a8..140bb239857f 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -187,6 +187,15 @@ Kernel parameters
>> unusable. The "log_buf_len" parameter may be useful
>> if you need to capture more output.
>>
>> + acpi.panic_on_fatal= [ACPI]
>> + {0 | 1}
>> + Causes the kernel to panic when the ACPI bytecode signals
>> + a fatal error. The default value of this setting can
>> + be configured using CONFIG_ACPI_PANIC_ON_FATAL.
>> + Overriding this value should only be done for diagnosing
>> + ACPI firmware problems, as some firmware implementations
>> + use this mechanism to signal fatal hardware errors.
>> +
>> acpi_enforce_resources= [ACPI]
>> { strict | lax | no }
>> Check for resource conflicts between native drivers
>> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
>> index df0ff0764d0d..7139e7d8ac25 100644
>> --- a/drivers/acpi/Kconfig
>> +++ b/drivers/acpi/Kconfig
>> @@ -65,6 +65,17 @@ config ACPI_THERMAL_LIB
>> depends on THERMAL
>> bool
>>
>> +config ACPI_PANIC_ON_FATAL
>> + bool "Panic on fatal ACPI error"
>> + default y
>> + help
>> + The ACPI bytecode can signal that a fatal error has occurred using the Fatal()
>> + ASL operator, normally causing a kernel panic. Disabling this option causes such
>> + a condition to be treated like an ordinary ACPI error.
>> +
>> + This setting can also be overridden during boot using the acpi.panic_on_fatal
>> + kernel parameter.
>> +
>> config ACPI_DEBUGGER
>> bool "AML debugger interface"
>> select ACPI_DEBUG
>> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
>> index 05393a7315fe..6375db6d22ea 100644
>> --- a/drivers/acpi/osl.c
>> +++ b/drivers/acpi/osl.c
>> @@ -11,7 +11,9 @@
>>
>> #define pr_fmt(fmt) "ACPI: OSL: " fmt
>>
>> +#include <linux/kconfig.h>
>> #include <linux/module.h>
>> +#include <linux/panic.h>
>> #include <linux/kernel.h>
>> #include <linux/slab.h>
>> #include <linux/mm.h>
>> @@ -70,6 +72,10 @@ static bool acpi_os_initialized;
>> unsigned int acpi_sci_irq = INVALID_ACPI_IRQ;
>> bool acpi_permanent_mmap = false;
>>
>> +static bool panic_on_fatal = IS_ENABLED(CONFIG_ACPI_PANIC_ON_FATAL);
>> +module_param(panic_on_fatal, bool, 0);
>> +MODULE_PARM_DESC(panic_on_fatal, "Trigger a kernel panic when encountering a fatal ACPI error");
>> +
>> /*
>> * This list of permanent mappings is for memory that may be accessed from
>> * interrupt context, where we can't do the ioremap().
>> @@ -1381,9 +1387,20 @@ acpi_status acpi_os_notify_command_complete(void)
>>
>> acpi_status acpi_os_signal(u32 function, void *info)
>> {
>> + struct acpi_signal_fatal_info *fatal_info;
>> +
>> switch (function) {
>> case ACPI_SIGNAL_FATAL:
>> - pr_err("Fatal opcode executed\n");
>> + fatal_info = info;
>> + if (panic_on_fatal) {
>> + panic("Fatal ACPI BIOS error (type 0x%X code 0x%X argument 0x%X)",
>> + fatal_info->type, fatal_info->code, fatal_info->argument);
>> + } else {
>> + pr_emerg("Fatal ACPI BIOS error (type 0x%X code 0x%X argument 0x%X)\n",
>> + fatal_info->type, fatal_info->code, fatal_info->argument);
>> + add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
>> + }
>> +
>> break;
>> case ACPI_SIGNAL_BREAKPOINT:
>> /*
>> --
>> 2.39.5
>>
>>