Message-ID: <20120905125700.GA5833@localhost>
Date:	Wed, 5 Sep 2012 20:57:00 +0800
From:	Fengguang Wu <fengguang.wu@...el.com>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	Michael Wang <wangyun@...ux.vnet.ibm.com>,
	LKML <linux-kernel@...r.kernel.org>, x86@...nel.org,
	Suresh Siddha <suresh.b.siddha@...el.com>,
	Venkatesh Pallipadi <venki@...gle.com>
Subject: Re: WARNING: cpu_is_offline() at native_smp_send_reschedule()

On Wed, Sep 05, 2012 at 12:54:40PM +0200, Peter Zijlstra wrote:
> On Wed, 2012-09-05 at 12:35 +0800, Michael Wang wrote:
> > > [   10.968565] reboot: machine restart
> > > [   10.983510] ------------[ cut here ]------------
> > > [   10.984218] WARNING: at /c/kernel-tests/src/stable/arch/x86/kernel/smp.c:123 native_smp_send_reschedule+0x46/0x50()
> > > [   10.985880] Pid: 88, comm: kpktgend_0 Not tainted 3.6.0-rc3-00005-gb374aa1 #10
> > > [   10.987185] Call Trace:
> > > [   10.987506]  [<7902f42a>] warn_slowpath_common+0x5a/0x80
> > > [   10.987506]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> > > [   10.987506]  [<7901ee16>] ? native_smp_send_reschedule+0x46/0x50
> > > [   10.987506]  [<7902f4fd>] warn_slowpath_null+0x1d/0x20
> > > [   10.987506]  [<7901ee16>] native_smp_send_reschedule+0x46/0x50
> > 
> > So this cpu tried to fire a nohz balance kick IPI to an offline cpu?
> > 
> > Maybe we are choosing the wrong cpu to kick, but that's not the point;
> > what I can't understand is why this cpu could do the kick at all.
> > 
> > We have nohz_kick_needed() to check whether the current cpu should kick,
> > and the first condition to match is that the current cpu should be
> > idle, but the trace shows the current pid is 88, not 0.
> > 
> > We should add Peter to the cc list; maybe he will be interested in what
> > happened.
> 
> > > [   10.987506]  [<7905fdad>] trigger_load_balance+0x1bd/0x250
> > > [   10.987506]  [<79056d14>] scheduler_tick+0xd4/0x100
> > > [   10.987506]  [<7903bde5>] update_process_times+0x55/0x70 
> 
> Hmm, added both venki and suresh as they touched it last ;-)
> 
> I suppose you're running a hotplug loop along with your workload?

I would definitely like to add some hotplug tests! However, for this
trace the test simply boots into an ubuntu-core initrd and runs the
"reboot" command from a late init.d script.

It seems that the bug was introduced somewhere in v3.3..v3.4. I'm now
running 100 kvms to speed up the bisect :)
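
As a rough illustration of why many parallel boots help a bisect like
this, here is a minimal sketch. It assumes the warning shows up only on
some fraction of reboots, which is not stated in this thread, and every
number in it (commit count, hit rate, tolerated error) is hypothetical.
The helper names are made up for the example.

```python
import math


def bisect_steps(num_commits: int) -> int:
    """Approximate upper bound on bisect steps for a range of num_commits."""
    if num_commits <= 1:
        return 0
    return math.ceil(math.log2(num_commits))


def boots_needed(p_hit: float, max_false_good: float) -> int:
    """Smallest n such that (1 - p_hit)**n <= max_false_good.

    p_hit is the ASSUMED chance that one boot of a bad kernel shows the
    warning; max_false_good is the tolerated chance of wrongly marking a
    bad commit as good after n clean boots.
    """
    return math.ceil(math.log(max_false_good) / math.log(1.0 - p_hit))


if __name__ == "__main__":
    # Hypothetical inputs, not measurements from this report.
    print("bisect steps for 10000 commits:", bisect_steps(10000))
    print("clean boots needed per step:", boots_needed(0.05, 0.01))
```

Under those assumed inputs each bisect step needs dozens of clean boots
before a commit can be called good, so booting the guests in parallel
rather than one after another is what keeps each step short.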

Thanks,
Fengguang