[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1189713695.3974.23.camel@chaos>
Date: Thu, 13 Sep 2007 22:01:35 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Pavel Machek <pavel@....cz>
Cc: "Rafael J. Wysocki" <rjw@...k.pl>,
Jeff Chua <jeff.chua.linux@...il.com>, rusty@...tycorp.com.au,
vatsa@...ibm.com, zwane@....linux.org.uk,
kernel list <linux-kernel@...r.kernel.org>,
Len Brown <lenb@...nel.org>
Subject: Re: cpu hotplug support broken in 2.6.23-rc3
On Tue, 2007-09-04 at 09:27 +0200, Pavel Machek wrote:
> > On Mon, 2007-09-03 at 12:19 +0200, Rafael J. Wysocki wrote:
> > > > Ok, so it gets weirder. I have now machine in "hung" state; other
> > > > consoles still work, but there are no timers - sleep 1 hangs forever.
> > > >
> > > > sysrq-t shows kstopmachine hung in hrtimer_try_to_cancel.
> > > >
> > > > So I indeed suspect difference-in-kconfig to trigger this, and will
> > > > try disabling noidlehz.
> > >
> > > I would unset CONFIG_HIGH_RES_TIMERS for starters.
> > >
> > > Well, I guess Thomas should know about that. ;-)
> >
> > What was the last known to work version ?
>
> I'm afraid I only turned on HIGH_RES_TIMERS in 2.6.23-rc1
> timeframe... so I'm not sure if it ever worked for me.
>
> I can confirm it is working in 2.6.23-rc5 with highres disabled, and
> broken with highres enabled. NOHZ turns "waits for keypress during
> unplug/replug" into "just plain hangs".
Ok, I can reproduce it and I tracked down what happens:
When the CPU goes offline, the clock event source for this CPU (lapic)
is removed from the clock events framework. This also clears the
information that the CPU is using C-States which stop the local APIC
timer.
Now you put the CPU online again and the local APIC timer is used, but
the C-State information is not evaluated again in ACPI. This means that
the clock events code does not know that the APIC might stop. In the
worst case this will happen and make the CPU wait for timer interrupts
forever.
The problem only appears when you are on battery (c3/c4 available) or on
those broken machines, where C2 is in reality C3 (e.g. akpm's VAIO)
I have an yet untested fix, which preserves the broadcast state across
the offline state, but Len is looking into it as well, whether we can
just reevaluate the power states (and the broadcast flags) when a cpu
becomes online again. If Len can do that easily for 2.6.23, I'd prefer
that.
tglx
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists