lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1189713695.3974.23.camel@chaos>
Date:	Thu, 13 Sep 2007 22:01:35 +0200
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Pavel Machek <pavel@....cz>
Cc:	"Rafael J. Wysocki" <rjw@...k.pl>,
	Jeff Chua <jeff.chua.linux@...il.com>, rusty@...tycorp.com.au,
	vatsa@...ibm.com, zwane@....linux.org.uk,
	kernel list <linux-kernel@...r.kernel.org>,
	Len Brown <lenb@...nel.org>
Subject: Re: cpu hotplug support broken in 2.6.23-rc3


On Tue, 2007-09-04 at 09:27 +0200, Pavel Machek wrote:
> > On Mon, 2007-09-03 at 12:19 +0200, Rafael J. Wysocki wrote:
> > > > Ok, so it gets weirder. I have now machine in "hung" state; other
> > > > consoles still work, but there are no timers -  sleep 1 hangs forever.
> > > > 
> > > > sysrq-t shows kstopmachine hung in hrtimer_try_to_cancel.
> > > > 
> > > > So I indeed suspect difference-in-kconfig to trigger this, and will
> > > > try disabling noidlehz.
> > > 
> > > I would unset CONFIG_HIGH_RES_TIMERS for starters.
> > > 
> > > Well, I guess Thomas should know about that. ;-)
> > 
> > What was the last known to work version ?
> 
> I'm afraid I only turned on HIGH_RES_TIMERS in 2.6.23-rc1
> timeframe... so I'm not sure if it ever worked for me.
> 
> I can confirm it is working in 2.6.23-rc5 with highres disabled, and
> broken with highres enabled. NOHZ turns "waits for keypress during
> unplug/replug" into "just plain hangs".

Ok, I can reproduce it and I tracked down what happens:

When the CPU goes offline, the clock event source for this CPU (lapic)
is removed from the clock events framework. This also clears the
information that the CPU is using C-States which stop the local APIC
timer.

Now you put the CPU online again and the local APIC timer is used, but
the C-State information is not evaluated again in ACPI. This means that
the clock events code does not know that the APIC might stop. In the
worst case this will happen and make the CPU wait for timer interrupts
forever.

The problem only appears when you are on battery (c3/c4 available) or on
those broken machines, where C2 is in reality C3 (e.g. akpm's VAIO)

I have an yet untested fix, which preserves the broadcast state across
the offline state, but Len is looking into it as well, whether we can
just reevaluate the power states (and the broadcast flags) when a cpu
becomes online again. If Len can do that easily for 2.6.23, I'd prefer
that.

	tglx


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ