Message-Id: <200807241727.41715.trenn@suse.de>
Date: Thu, 24 Jul 2008 17:27:40 +0200
From: Thomas Renninger <trenn@...e.de>
To: Arjan van de Ven <arjan@...ux.intel.com>
Cc: "linux-acpi" <linux-acpi@...r.kernel.org>,
"Moore, Robert" <robert.moore@...el.com>,
"Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>,
Andi Kleen <ak@...ux.intel.com>, Len Brown <lenb@...nel.org>,
Christian Kornacker <ckornacker@...e.de>
Subject: ACPI OSI disaster on latest HP laptops - critical temperature shutdowns
I found this BIOS bug a few days ago.
The positive side of this one is that it nicely shows the
need for some things I came up with lately
(points 1 and 2; points 3 and 4 are further suggestions):
1) Do not be transparent to Windows in ACPI OSI parts
-> and, as a long-term goal, do not pretend to be Windows
2) Document _OSI BIOS developer usage in
Documentation/acpi/known_bios_osi_workarounds
3) Linuxfirmwarekit needs kernel support
4) ACPI AML functionality to report errors to the OS
The problem:
HP makes extensive use of ACPI thermal zones.
It seems they hit a bug in Vista which probably caused their
machines to be shut down through a critical temperature event.
They now work around that Vista bug by returning zero for _CRT
(which is the critical temperature in tenths of a Kelvin).
So they return about -273 degrees Celsius, which leads to a critical
temperature shutdown as soon as the ACPI thermal driver is loaded.
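To make the arithmetic concrete, here is a minimal C sketch of the unit conversion. The helper names are made up for illustration and are not kernel functions. It only assumes what is stated above, namely that _CRT reports tenths of a Kelvin, and it uses 2732 (0x0AAC, the constant that also appears in the BIOS code) as the offset for 0 degrees Celsius.

```c
/* Offset of 0 degrees Celsius, expressed in tenths of a Kelvin (0x0AAC). */
#define DECI_KELVIN_OFFSET 2732L

/* Hypothetical helper: tenths of a Kelvin -> tenths of a degree Celsius. */
static long deci_kelvin_to_deci_celsius(long deci_kelvin)
{
	return deci_kelvin - DECI_KELVIN_OFFSET;
}

/* Hypothetical helper: whole degrees Celsius -> tenths of a Kelvin. */
static long celsius_to_deci_kelvin(long celsius)
{
	return celsius * 10 + DECI_KELVIN_OFFSET;
}
```

With these helpers, a _CRT value of zero maps to -2732 tenths of a degree (about -273 C), while a sane 105 C threshold corresponds to a raw value of 3782.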
In short, this is the corresponding ACPI BIOS code:
# BIOS checks which OS is running (most parts cut off)
# Linux returns true for all of these except "Windows 2006 SP1"
# (Vista SP1) and "Linux"
...
If (_OSI ("Windows 2001 SP3"))
{
Store (0x12, OSTB)
Store (0x12, TPOS)
}
If (_OSI ("Windows 2006"))
{
Store (0x40, OSTB)
Store (0x40, TPOS)
}
If (_OSI ("Windows 2006 SP1"))
{
Store (0x41, OSTB)
Store (0x40, TPOS)
}
If (_OSI ("Linux"))
{
Store (One, LINX)
Store (0x80, OSTB)
Store (0x80, TPOS)
}
# Valid critical/hot temperature: 105 (0x69)
Name (TPC, 0x69)
...
Method (_HOT, 0, Serialized)
{
# Match for Vista only, not for Vista SP1 !
!!! If (LEqual (TPOS, 0x40))
{
Return (Add (0x0AAC, Multiply (TPC, 0x0A)))
}
Else
{
Return (Zero)
}
}
Method (_CRT, 0, Serialized)
{
# Returns valid values for all Windows versions before Vista
!!! If (LLess (TPOS, 0x40))
{
# This is the valid one: 105 C -> (105 * 10) + 2732 (Kelvin * 10)
Return (Add (0x0AAC, Multiply (TPC, 0x0A)))
}
Else
{
# This is returned on Windows Vista
Return (Zero)
}
}
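The effect of the ASL above can be traced with a small C model. This is only a sketch under assumptions: the initial value of TPOS is not shown in the excerpt (zero is assumed here), only the four _OSI checks quoted above are modelled, and the three osi_* predicates are hypothetical stand-ins for how different operating systems might answer _OSI.

```c
#include <stdbool.h>
#include <string.h>

#define TPC 0x69UL /* 105 C, as in the BIOS code */

typedef bool (*osi_fn)(const char *interface);

/* Linux as described above: true for everything except these two strings. */
static bool osi_transparent_linux(const char *s)
{
	return strcmp(s, "Windows 2006 SP1") != 0 && strcmp(s, "Linux") != 0;
}

/* Hypothetical pre-Vista OS: only knows "Windows 2001 SP3". */
static bool osi_pre_vista(const char *s)
{
	return strcmp(s, "Windows 2001 SP3") == 0;
}

/* Hypothetical OS that answers true only for "Linux". */
static bool osi_honest_linux(const char *s)
{
	return strcmp(s, "Linux") == 0;
}

/* Mirrors the chain of If (_OSI (...)) blocks; later matches overwrite. */
static unsigned int bios_tpos(osi_fn osi)
{
	unsigned int tpos = 0; /* assumed initial value, not in the excerpt */

	if (osi("Windows 2001 SP3"))
		tpos = 0x12;
	if (osi("Windows 2006"))
		tpos = 0x40;
	if (osi("Windows 2006 SP1"))
		tpos = 0x40;
	if (osi("Linux"))
		tpos = 0x80;
	return tpos;
}

/* _HOT: valid only when TPOS is exactly 0x40. */
static unsigned long bios_hot(unsigned int tpos)
{
	return tpos == 0x40 ? 0x0AAC + TPC * 0x0A : 0;
}

/* _CRT: valid only when TPOS is below 0x40. */
static unsigned long bios_crt(unsigned int tpos)
{
	return tpos < 0x40 ? 0x0AAC + TPC * 0x0A : 0;
}
```

In this model the transparent Linux predicate ends up with TPOS = 0x40, so _CRT yields 0 while _HOT yields 3782 (105 C). A predicate that answers true only for "Linux" ends up with TPOS = 0x80 and gets 0 from _CRT as well.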
----------------------
This is the fix for this from Arjan:
ACPI: Reject below-freezing temperatures as invalid critical temperatures
My laptop thinks that it's a good idea to give -73C as the critical
CPU temperature.... which isn't the best thing since it causes a shutdown
right at bootup.
Temperatures below freezing are clearly invalid critical thresholds
so just reject these as such.
commit a39a2d7c72b358c6253a2ec28e17b023b7f6f41c
@@ -364,10 +364,17 @@ static int acpi_thermal_trips_update(struct acpi_thermal *tz, int flag)
if (flag & ACPI_TRIPS_CRITICAL) {
status = acpi_evaluate_integer(tz->device->handle,
"_CRT", NULL, &tz->trips.critical.temperature);
- if (ACPI_FAILURE(status)) {
+ /*
+ * Treat freezing temperatures as invalid as well; some
+ * BIOSes return really low values and cause reboots at startup.
+ * Below zero (Celcius) values clearly aren't right for sure..
+ * ... so lets discard those as invalid.
+ */
+ if (ACPI_FAILURE(status) ||
+ tz->trips.critical.temperature <= 2732) {
tz->trips.critical.flags.valid = 0;
ACPI_EXCEPTION((AE_INFO, status,
- "No critical threshold"));
+ "No or invalid critical threshold"));
return -ENODEV;
} else {
tz->trips.critical.flags.valid = 1;
----------------------
What are the consequences of:
1) The fact that BIOS vendors have to fix Windows bugs/errata through
ACPI _OSI hooks (this is nearly the only use BIOS vendors make of the
_OSI interface)
2) The current Linux _OSI implementation being transparent to Windows
3) The invalid critical temperature simply being ignored and the trip
point not shown to userspace
The consequences:
1) One must assume that HP has to roll out such a Vista-only or Vista
SP1-only bug workaround to all of their BIOSes, thus breaking all
ACPI-aware Linux kernels.
2) Vendors who want to provide Linux and Windows support
have to provide a separate BIOS or patch the Linux kernel so that Linux
does not have to run the Windows errata workarounds hidden behind _OSI hooks.
3) This Vista bug can be worked around by checking for zero,
but things could get more complex.
In the long term, Linux cannot implement all bugs of all Windows
versions.
4) HP certifies (at least some of) their laptops to work with distributions.
The above patch absorbs the BIOS bug, making it impossible for the current
Linuxfirmwarekit implementation to detect it.
The above BIOS update could have been rejected by certification -> this needs
a kernel facility to report BIOS bugs. Or at least the certified
distribution could have been patched along with this BIOS update/
breakage.
5) It is only a matter of time until Windows-version-specific ACPI bugs are
worked around in server BIOSes as well.
Therefore some suggestions (from above):
1) As a long-term goal, Linux should not be transparent to Windows.
Nearly all _OSI conditions in which ACPI code checks which OS is
running implement Windows bug workarounds. Vendors are not able
to fix the Windows implementation, therefore they have to do it in
the BIOS. While the next Windows generation might have fixed the cause,
Linux tries to implement (be compatible with) all Windows bugs.
2) Document Windows bugs worked around via _OSI in
Documentation/acpi/known_osi_hooks
3) Document Linux _OSI behavior. No ACPI BIOS developer is aware that
Linux violates the spec. All recent ACPI BIOSes check for "Linux"
as the running OS, but Linux does not return true for that call.
I have started to document the current _OSI behavior on Linux. I then
realized it might be a good idea to extend it a bit and cover
general ACPI BIOS problems on Linux. It's here: ftp://ftp.suse.com/pub/people/trenn/ACPI_BIOS_on_Linux_guide/acpi_guideline_for_vendors.pdf
Comments on enhancements, additions, etc. are appreciated.
I'll announce that separately.
4) Provide a facility to tell userspace about BIOS bugs.
This is the
FIRMWARE_BUG(severity, "Message");
interface idea I mentioned recently in an unrelated thread.
The idea is something similar to printk, but one that can be used intensively
on every possibly bogus value returned by the BIOS (also as documentation)
and that can be compiled out so it does not waste much memory on
production kernels.
At the end is a patch that extends Arjan's patch by also checking the return
values for hot (already an issue with HP BIOSes), passive and active
trip points. When the BIOS returns a wrong value, we want to inform userspace
that something in the BIOS is bogus, so that HW vendors who care about Linux
see that something could go wrong.
5) Something ACPI specific; maybe Intel is able to push this into the
ACPI specification in the (very) long term:
ACPI BIOS developers cannot report error conditions.
Therefore you often end up with invalid values, because they have to return
a value if a function is provided, even if
they know it does not make any sense at all.
Ideas:
1) Provide an error object similar to the debug object.
-> Just to have something in the logs
2) Add error values to each ACPI function or to sets of functions
-> cumbersome
3) Introduce a return_error statement which can be used instead of
return. If it is used, the kernel must ignore the value
of the function.
-> would help a lot, similar functionality to 2), but easier
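To illustrate suggestion 4, here is a rough userspace sketch of what a printk-like FIRMWARE_BUG() macro that can be compiled out might look like. Everything in it is an assumption made for illustration: the severity enum, the FIRMWARE_BUG_DISABLED switch, the message prefix, the counter and the check_trip_temperature() caller are invented here, and fprintf() merely stands in for printk. It is not an existing kernel interface.

```c
#include <stdio.h>

/* Hypothetical severity levels; the proposal does not define any. */
enum fw_bug_severity { FW_BUG_INFO, FW_BUG_WARN, FW_BUG_ERR };

#ifndef FIRMWARE_BUG_DISABLED
/* Counts reported firmware bugs so that tooling could query the total. */
static unsigned int firmware_bug_count;

#define FIRMWARE_BUG(severity, ...)					\
	do {								\
		firmware_bug_count++;					\
		fprintf(stderr, "[Firmware Bug, severity %d]: ",	\
			(int)(severity));				\
		fprintf(stderr, __VA_ARGS__);				\
		fputc('\n', stderr);					\
	} while (0)
#else
/* Compiled out: neither the message strings nor the code are emitted. */
#define FIRMWARE_BUG(severity, ...) do { } while (0)
#endif

/* Example caller: reject trip temperatures at or below 0 C (2732). */
static int check_trip_temperature(const char *method,
				  unsigned long deci_kelvin)
{
	if (deci_kelvin <= 2732) {
		FIRMWARE_BUG(FW_BUG_ERR,
			     "%s returned invalid temperature %lu",
			     method, deci_kelvin);
		return -1;
	}
	return 0;
}
```

Calling check_trip_temperature("_CRT", 0) would log one message and return -1, while a value such as 3782 passes silently. Building with -DFIRMWARE_BUG_DISABLED drops the reporting code entirely, which is the "compile it out" property asked for above.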
Thomas
This patch also handles hot, passive and active trip points: if zero
(or any temperature at or below freezing) is returned, the trip point
is invalidated.
Hopefully this can be reported as a firmware bug soon.
diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c
index 84c795f..f6344f6 100644
--- a/drivers/acpi/thermal.c
+++ b/drivers/acpi/thermal.c
@@ -400,7 +400,8 @@ static int acpi_thermal_trips_update(struct acpi_thermal *tz, int flag)
if (flag & ACPI_TRIPS_HOT) {
status = acpi_evaluate_integer(tz->device->handle,
"_HOT", NULL, &tz->trips.hot.temperature);
- if (ACPI_FAILURE(status)) {
+ if (ACPI_FAILURE(status) ||
+ tz->trips.hot.temperature <= 2732) {
tz->trips.hot.flags.valid = 0;
ACPI_DEBUG_PRINT((ACPI_DB_INFO,
"No hot threshold\n"));
@@ -425,7 +426,8 @@ static int acpi_thermal_trips_update(struct acpi_thermal *tz, int flag)
"_PSV", NULL, &tz->trips.passive.temperature);
}
- if (ACPI_FAILURE(status))
+ if (ACPI_FAILURE(status) ||
+ tz->trips.passive.temperature <= 2732)
tz->trips.passive.flags.valid = 0;
else {
tz->trips.passive.flags.valid = 1;
@@ -480,7 +482,8 @@ static int acpi_thermal_trips_update(struct acpi_thermal *tz, int flag)
if (flag & ACPI_TRIPS_ACTIVE) {
status = acpi_evaluate_integer(tz->device->handle,
name, NULL, &tz->trips.active[i].temperature);
- if (ACPI_FAILURE(status)) {
+ if (ACPI_FAILURE(status) ||
+ tz->trips.active[i].temperature <= 2732) {
tz->trips.active[i].flags.valid = 0;
if (i == 0)
break;
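The check that the patch repeats for each trip point can be summarized in a standalone C sketch. It is a simplified model and not the driver code: the structures, the array size of 10 and the function names are assumptions made for illustration, and unlike the driver it validates every active trip point independently instead of stopping at the first invalid one.

```c
#include <stdbool.h>
#include <stddef.h>

/* 0 degrees Celsius in tenths of a Kelvin, the threshold used by the patch. */
#define SKETCH_FREEZING 2732UL
/* Array size chosen for the sketch only. */
#define SKETCH_MAX_ACTIVE 10

struct sketch_trip {
	unsigned long temperature; /* tenths of a Kelvin */
	bool valid;
};

struct sketch_trips {
	struct sketch_trip critical;
	struct sketch_trip hot;
	struct sketch_trip passive;
	struct sketch_trip active[SKETCH_MAX_ACTIVE];
};

/* Mark one trip point valid only if it lies above freezing. */
static bool sketch_validate_trip(struct sketch_trip *trip)
{
	trip->valid = trip->temperature > SKETCH_FREEZING;
	return trip->valid;
}

/* Validate all trip points; returns how many were rejected. */
static int sketch_validate_trips(struct sketch_trips *trips)
{
	int rejected = 0;
	size_t i;

	rejected += !sketch_validate_trip(&trips->critical);
	rejected += !sketch_validate_trip(&trips->hot);
	rejected += !sketch_validate_trip(&trips->passive);
	for (i = 0; i < SKETCH_MAX_ACTIVE; i++)
		rejected += !sketch_validate_trip(&trips->active[i]);
	return rejected;
}
```

With the HP values from above (critical = 0, hot = 3782), the critical trip point is rejected and the hot one is kept. The return count is the kind of information a firmware-bug reporting facility could pass on to userspace.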