linux-kernel - Re: Why is max_cstate=1 still needed?


Date:	Fri, 11 Mar 2011 15:02:50 +0100 (CET)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Jiri Slaby <jslaby@...e.cz>
cc:	Len Brown <lenb@...nel.org>,
	linux-pm <linux-pm@...ts.linux-foundation.org>,
	"x86@...nel.org" <x86@...nel.org>,
	Linux kernel mailing list <linux-kernel@...r.kernel.org>,
	Jiri Slaby <jirislaby@...il.com>
Subject: Re: Why is max_cstate=1 still needed?

On Fri, 11 Mar 2011, Jiri Slaby wrote:
> there are still reports against the latest kernels, that people need to
> pass processor/intel_idle.max_cstate=1 to successfully boot the kernel.
> The symptoms are always the same, until the parameter is specified OR
> until the user presses a key, the system won't boot up.
> 
> This started to appear between 2.6.31 and 2.6.34 (possibly a 2.6.33
> regression) and continues to be reported against the latest stable
> 2.6.37.3. For example:
> https://bugzilla.kernel.org/show_bug.cgi?id=15289
> https://bugzilla.novell.com/show_bug.cgi?id=579932
> https://bugzilla.novell.com/show_bug.cgi?id=673589
> https://bugzilla.novell.com/show_bug.cgi?id=675161
> 
> I see that there were some fixes in .38-rc in this bug report (they look
> unrelated):
> https://bugzilla.kernel.org/show_bug.cgi?id=29992
> 
> Should they give .38-rc a try?

Trying does no damage :(

> Any help would be appreciated.

I went through the bug reports briefly. While they all report the same
symptoms (hangs until key pressed) the root cause varies.

   - SMM C1E handler broken (affects AMD only)
   - HPET issues (mostly AMD)
   - The usual ACPI/BIOS madness

On most of those systems nohz=off hides the problem as well, because it
prevents deeper power states, so the local APIC timer just keeps
ticking and the broadcast via PIT/HPET is not activated. hpet=disable
is another way to work around it.
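
When triaging such a report it helps to know which of the workarounds
named in this thread (processor.max_cstate, intel_idle.max_cstate, nohz,
hpet) are already on the kernel command line. The following is a minimal
sketch, not part of any kernel tooling: the function name
find_workarounds and the returned dict layout are illustrative
assumptions, and it only splits the command line on whitespace.

```python
# Sketch: report which workaround parameters from this thread appear on a
# kernel command line (for example the contents of /proc/cmdline).
from pathlib import Path

# Parameter names are taken from the thread; everything else here
# (function name, return shape) is a hypothetical choice for illustration.
WORKAROUNDS = ("processor.max_cstate", "intel_idle.max_cstate", "nohz", "hpet")


def find_workarounds(cmdline: str) -> dict:
    """Return {parameter: value} for each workaround parameter present."""
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Only key=value tokens whose key is one of the known workarounds.
        if sep and key in WORKAROUNDS:
            found[key] = value
    return found


if __name__ == "__main__":
    path = Path("/proc/cmdline")  # present on Linux only
    if path.exists():
        print(find_workarounds(path.read_text()))
```

An empty result means none of these workarounds is active, so a hang
that only resolves on a key press is not being masked by them.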

To be honest we have no real handle on all of this as much of the
wreckage is hidden deep in that black hole of ACPI/BIOS. We grew some
quirks and detection mechanisms over time, but there seems to be a
never ending source of trouble especially as HW vendors seem to add
more power-related features into the BIOS. We've seen perf wreckage as
well, as some of those abuse performance counters :(

Of course those "features" are only tested against that other OS; some
of them even require a driver counterpart for the other OS. Naturally
we have no information about that at all and the HW vendors are
helpful as ever.

Yes, it's sad, but reality.

Thanks,

	tglx