Date:	Fri, 11 Mar 2011 15:02:50 +0100 (CET)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Jiri Slaby <jslaby@...e.cz>
cc:	Len Brown <lenb@...nel.org>,
	linux-pm <linux-pm@...ts.linux-foundation.org>,
	"x86@...nel.org" <x86@...nel.org>,
	Linux kernel mailing list <linux-kernel@...r.kernel.org>,
	Jiri Slaby <jirislaby@...il.com>
Subject: Re: Why is max_cstate=1 still needed?

On Fri, 11 Mar 2011, Jiri Slaby wrote:
> there are still reports against the latest kernels, that people need to
> pass processor/intel_idle.max_cstate=1 to successfully boot the kernel.
> The symptoms are always the same, until the parameter is specified OR
> until the user presses a key, the system won't boot up.
> 
> This started to appear between 2.6.31 and 2.6.34 (possibly a 2.6.33
> regression) and continues to be reported against the latest stable
> 2.6.37.3. For example:
> https://bugzilla.kernel.org/show_bug.cgi?id=15289
> https://bugzilla.novell.com/show_bug.cgi?id=579932
> https://bugzilla.novell.com/show_bug.cgi?id=673589
> https://bugzilla.novell.com/show_bug.cgi?id=675161
> 
> I see that there were some fixes in .38-rc in this bug report (they look
> unrelated):
> https://bugzilla.kernel.org/show_bug.cgi?id=29992
> 
> Should they give .38-rc a try?

Trying does no damage :(
 
> Any help would be appreciated.

I went through the bug reports briefly. While they all report the same
symptoms (hangs until key pressed) the root cause varies.

   - SMM C1E handler broken (affects AMD only)
   - HPET issues (mostly AMD)
   - The usual ACPI/BIOS madness

On most of those systems nohz=off hides the problem as well, because it
prevents deeper power states: the local APIC timer just keeps ticking
and the broadcast via PIT/HPET is not activated. hpet=disable is
another way to work around it.
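
For anyone triaging such a report, it can help to see which of the
workarounds named in this thread are already on the kernel command
line. The following is only a rough sketch: the helper names are made
up for illustration, the parameter list is just the one from this
thread, and the parsing is simplified (it does not handle quoted
values that contain spaces).

```python
# Workaround parameters named in this thread. The helper names below are
# hypothetical and exist only for this illustration.
WORKAROUNDS = {
    "processor.max_cstate": "1",
    "intel_idle.max_cstate": "1",
    "nohz": "off",
    "hpet": "disable",
}


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None.

    Simplification: quoted values containing spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def active_workarounds(cmdline):
    """Return the workaround parameters from this thread present on cmdline."""
    params = parse_cmdline(cmdline)
    return sorted(
        f"{name}={value}"
        for name, value in WORKAROUNDS.items()
        if params.get(name) == value
    )


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the running kernel on Linux.
    try:
        with open("/proc/cmdline") as f:
            current = f.read()
    except OSError:
        current = ""
    print(active_workarounds(current) or "no workaround parameters found")
```

If the list comes back non-empty, the machine is booting with at least
one of these workarounds, which is worth noting in the bug report.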

To be honest, we have no real handle on all of this, as much of the
wreckage is hidden deep in that black hole of ACPI/BIOS. We grew some
quirks and detection mechanisms over time, but there seems to be a
never-ending source of trouble, especially as HW vendors seem to add
more power-related features into the BIOS. We've seen perf wreckage as
well, as some of those abuse performance counters :(

Of course those "features" are only tested against that other OS, some
of them even require a driver counterpart for the other OS. Of course
we have no information about that at all and the HW vendors are
helpful as ever.

Yes, it's sad, but reality.

Thanks,

	tglx
