Message-ID: <CAJZ5v0iANSqdP1YQz-XvPsvomVXWttnjYO+u3UBt76-y8gD8vg@mail.gmail.com>
Date: Wed, 28 Jan 2026 21:23:31 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Armin Wolf <W_Armin@....de>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, linux-acpi@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] ACPI: OSL: Panic when encountering a fatal ACPI error

On Wed, Jan 28, 2026 at 3:45 AM Armin Wolf <W_Armin@....de> wrote:
>
> Am 27.01.26 um 22:05 schrieb Rafael J. Wysocki:
>
> > On Tue, Jan 27, 2026 at 7:32 AM Armin Wolf <W_Armin@....de> wrote:
> >> The ACPI spec states that the operating system should respond
> >> to a fatal ACPI error by "performing a controlled OS shutdown in
> >> a timely fashion". Windows complies with this requirement by entering
> >> a BSOD upon executing the "Fatal()" ASL opcode, an implementation
> >> detail that is used by some firmware implementations for signaling
> >> fatal hardware errors.
> >>
> >> Comply with the ACPI specification by triggering a kernel panic
> >> when ACPICA signals a fatal ACPI error.
> > I'm not sure if a kernel panic really counts as "a controlled OS shutdown".
> >
> > Shouldn't this be treated in a similar way to crossing a critical
> > thermal trip point?
> >
> > Also, is there a reason beyond "follow Windows" to do this?
>
> Some firmware implementations use Fatal () to signal fatal hardware errors.
> The ThinkPad X1 Yoga Gen 8 for example has something like this:
>
> Method (_Q7F, 0, NotSerialized)  // _Qxx: EC Query, xx=0x00-0xFF
> {
>         Fatal (0x01, 0x80010000, 0x00036C4E)
> }
>
> I strongly suspect that the EC issues query 0x7F when encountering a fatal internal error.
> Trying to perform an orderly shutdown in this case might cause the whole system to hang
> or to respond erratically.

So if the kernel panics in response to this, it is kind of hard to
diagnose anything.

Also, is panicking really worse than trying to carry out an orderly
shutdown and possibly crashing if that doesn't go well?

> The other use cases for Fatal () that I know of are:
> - failing if some PCI components refuse to resume (MacBookPro 16)
> - failing if a WMI-ACPI device is being accessed while still being disabled (Windows WMI-ACPI example code)
> - failing if the hardware resource configuration is invalid (ThinkPad T410, among others)

None of the above qualifies as a reason for a kernel panic IMV.

> In all those cases the firmware developers assume that Fatal () causes the operating system to panic.
> This seems to be true even on MacOS: https://discussions.apple.com/thread/250384018

Mac is a vertical platform and the OS probably has a much better
reason to trust its firmware than on an average PC.

> We thus should treat the Fatal () opcode like an assert() statement that is used to signal
> when the firmware has encountered a problem it cannot handle. The only way to safely "recover"
> from this is a kernel panic, as a normal shutdown relies on the ACPI firmware still being
> operational.

Well, not necessarily, and only to some extent.  It may not require the EC
to be operational, for instance.

> >> Users can still disable
> >> this behavior by using the acpi.panic_on_fatal kernel option to
> >> work around firmware bugs.
> > This is not enough.  You are talking about throwing away user data if
> > the firmware asks the kernel to do so.
>
> The kernel also throws away user data if the hardware itself signals a fatal error.

AML is not hardware and as a rule it is developed by less competent people.

> Why should we react differently when the ACPI BIOS does the same, especially when
> MacOS and Windows do exactly that?
>
> I agree that being able to differentiate between fatal errors and recoverable errors
> would be much better, but sadly that is not how most of the ACPI implementations out
> there were designed to work.

Somebody at one point decided to handle Fatal() in Linux by printing a
message and you are asserting that they were wrong because the
designers of other OSes apparently disagree with that.

Overall this is not super-convincing to me.

> >> Link: https://uefi.org/specs/ACPI/6.6/19_ASL_Reference.html#fatal-fatal-error-check
> >> Signed-off-by: Armin Wolf <W_Armin@....de>
> >> ---
> >> Changes since v1:
> >> - use IS_ENABLED() for checking the presence of CONFIG_ACPI_PANIC_ON_FATAL
> >> ---
> >>   .../admin-guide/kernel-parameters.txt         |  9 +++++++++
> >>   drivers/acpi/Kconfig                          | 11 +++++++++++
> >>   drivers/acpi/osl.c                            | 19 ++++++++++++++++++-
> >>   3 files changed, 38 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> >> index 1058f2a6d6a8..140bb239857f 100644
> >> --- a/Documentation/admin-guide/kernel-parameters.txt
> >> +++ b/Documentation/admin-guide/kernel-parameters.txt
> >> @@ -187,6 +187,15 @@ Kernel parameters
> >>                          unusable.  The "log_buf_len" parameter may be useful
> >>                          if you need to capture more output.
> >>
> >> +       acpi.panic_on_fatal=    [ACPI]
> >> +                       {0 | 1}
> >> +                       Causes the kernel to panic when the ACPI bytecode signals
> >> +                       a fatal error. The default value of this setting can
> >> +                       be configured using CONFIG_ACPI_PANIC_ON_FATAL.
> >> +                       Overriding this value should only be done for diagnosing
> >> +                       ACPI firmware problems, as some firmware implementations
> >> +                       use this mechanism to signal fatal hardware errors.
> >> +
> >>          acpi_enforce_resources= [ACPI]
> >>                          { strict | lax | no }
> >>                          Check for resource conflicts between native drivers
> >> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> >> index df0ff0764d0d..7139e7d8ac25 100644
> >> --- a/drivers/acpi/Kconfig
> >> +++ b/drivers/acpi/Kconfig
> >> @@ -65,6 +65,17 @@ config ACPI_THERMAL_LIB
> >>          depends on THERMAL
> >>          bool
> >>
> >> +config ACPI_PANIC_ON_FATAL
> >> +       bool "Panic on fatal ACPI error"
> >> +       default y
> >> +       help
> >> +         The ACPI bytecode can signal that a fatal error has occurred using the Fatal()
> >> +         ASL operator, normally causing a kernel panic. Disabling this option causes such
> >> +         a condition to be treated like an ordinary ACPI error.
> >> +
> >> +         This setting can also be overridden during boot using the acpi.panic_on_fatal
> >> +         kernel parameter.
> >> +
> >>   config ACPI_DEBUGGER
> >>          bool "AML debugger interface"
> >>          select ACPI_DEBUG
> >> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> >> index 05393a7315fe..6375db6d22ea 100644
> >> --- a/drivers/acpi/osl.c
> >> +++ b/drivers/acpi/osl.c
> >> @@ -11,7 +11,9 @@
> >>
> >>   #define pr_fmt(fmt) "ACPI: OSL: " fmt
> >>
> >> +#include <linux/kconfig.h>
> >>   #include <linux/module.h>
> >> +#include <linux/panic.h>
> >>   #include <linux/kernel.h>
> >>   #include <linux/slab.h>
> >>   #include <linux/mm.h>
> >> @@ -70,6 +72,10 @@ static bool acpi_os_initialized;
> >>   unsigned int acpi_sci_irq = INVALID_ACPI_IRQ;
> >>   bool acpi_permanent_mmap = false;
> >>
> >> +static bool panic_on_fatal = IS_ENABLED(CONFIG_ACPI_PANIC_ON_FATAL);
> >> +module_param(panic_on_fatal, bool, 0);
> >> +MODULE_PARM_DESC(panic_on_fatal, "Trigger a kernel panic when encountering a fatal ACPI error");
> >> +
> >>   /*
> >>    * This list of permanent mappings is for memory that may be accessed from
> >>    * interrupt context, where we can't do the ioremap().
> >> @@ -1381,9 +1387,20 @@ acpi_status acpi_os_notify_command_complete(void)
> >>
> >>   acpi_status acpi_os_signal(u32 function, void *info)
> >>   {
> >> +       struct acpi_signal_fatal_info *fatal_info;
> >> +
> >>          switch (function) {
> >>          case ACPI_SIGNAL_FATAL:
> >> -               pr_err("Fatal opcode executed\n");
> >> +               fatal_info = info;
> >> +               if (panic_on_fatal) {
> >> +                       panic("Fatal ACPI BIOS error (type 0x%X code 0x%X argument 0x%X)",
> >> +                             fatal_info->type, fatal_info->code, fatal_info->argument);
> >> +               } else {

I could be convinced to do the part below, maybe except for the
message that is not super-precise (something like "Fatal error while
evaluating ACPI control method" would be better IMO and you could
print the error info in a separate message).

Turning the admin's attention to this is a good idea IMV.

The panic part, not so much.
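
To illustrate the message split suggested above, here is a rough
userspace sketch, not kernel code and not a proposed patch.  The
struct fatal_info type and the format_fatal_messages() helper are
hypothetical stand-ins (the former mirrors the three fields the patch
reads from struct acpi_signal_fatal_info); in the kernel the two
strings would presumably go out through two separate pr_emerg() calls.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the fields the patch reads. */
struct fatal_info {
	uint32_t type;
	uint32_t code;
	uint32_t argument;
};

/*
 * Build the two messages separately: a general summary first, then the
 * raw Fatal() operands.  Returns 0 on success, -1 if a buffer is too
 * small to hold its message.
 */
static int format_fatal_messages(const struct fatal_info *fi,
				 char *summary, size_t summary_len,
				 char *details, size_t details_len)
{
	int n;

	n = snprintf(summary, summary_len,
		     "Fatal error while evaluating ACPI control method");
	if (n < 0 || (size_t)n >= summary_len)
		return -1;

	n = snprintf(details, details_len,
		     "type 0x%X code 0x%X argument 0x%X",
		     (unsigned int)fi->type, (unsigned int)fi->code,
		     (unsigned int)fi->argument);
	if (n < 0 || (size_t)n >= details_len)
		return -1;

	return 0;
}
```

Fed the operands from the _Q7F example quoted earlier (0x01,
0x80010000, 0x00036C4E), the summary line stays generic and the
second line carries the values needed for diagnosis.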

> >> +                       pr_emerg("Fatal ACPI BIOS error (type 0x%X code 0x%X argument 0x%X)\n",
> >> +                                fatal_info->type, fatal_info->code, fatal_info->argument);
> >> +                       add_taint(TAINT_FIRMWARE_WORKAROUND, LOCKDEP_STILL_OK);
> >> +               }
> >> +
> >>                  break;
> >>          case ACPI_SIGNAL_BREAKPOINT:
> >>                  /*
> >> --
