linux-kernel - Re: RCU hang on cpu re-hotplug with 2.6.27rc8

Message-ID: <20081007030822.GC6820@linux.vnet.ibm.com>
Date:	Mon, 6 Oct 2008 20:08:22 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Andi Kleen <andi@...stfloor.org>
Cc:	mingo@...e.hu, linux-kernel@...r.kernel.org, rjw@...k.pl,
	dipankar@...ibm.com, tglx@...uxtronix.de
Subject: Re: RCU hang on cpu re-hotplug with 2.6.27rc8

On Tue, Oct 07, 2008 at 01:28:37AM +0200, Andi Kleen wrote:
> [modifying subject]
> 
> On Mon, Oct 06, 2008 at 04:12:20PM +0200, Andi Kleen wrote:
> > [Rafael, something for the regression list]
> > 
> > While testing cpu hotunplug/hotreplug (first setting two CPUs to offline
> > and then to online again) on a 16 thread machine with 2.6.27rc8 the first
> > 
> > # echo 1 > ./devices/system/cpu/cpu14/online
> > 
> > after hotunplug deadlocked somewhere in the scheduler:
> 
> I let it run for longer and I ended up with more and more processes
> stuck in synchronize_rcu(). No more backtraces because the system
> has no console and is now not able to write to disk anymore.
> 
> So it seems like there's something broken with RCU & cpu hotplug
> in 2.6.27rc8. cc Paul.
> 
> It's probably not the scheduler, sorry for blaming it earlier.

Could you please try the patch at the following URL (from Thomas
Gleixner)?

http://www.rdrop.com/users/paulmck/patches/2.6.27-rc7-tglx-timer-1.patch

This fixed some CPU hotplug hangs that I was seeing in 2.6.27-rc7 and
-rc8.  Alternatively, try 2.6.27-rc9, which seems to include Thomas's
patch.

							Thanx, Paul

> -Andi
> 
> > bash          D 00000000ffffcb5b     0  4683   4671
> >  ffff8804bc583c68 0000000000000086 ffff8804bc9d8640 0000000000000296
> >  ffff8804bdd34730 ffff8804be6fc090 ffff8804bdd34978 0000000c805a1e2a
> >  ffff8804be4fd780 ffffffff802298b4 ffffffff808acd98 ffff88027d0b1168
> > Call Trace:
> >  [<ffffffff802298b4>] __dequeue_entity+0x25/0x68
> >  [<ffffffff805a1b4b>] schedule_timeout+0x1e/0xad
> >  [<ffffffff8022a11f>] __disable_runtime+0x57/0x155
> >  [<ffffffff8025cc47>] cpupri_set+0xbe/0xcd
> >  [<ffffffff805a19b3>] wait_for_common+0xcd/0x131
> >  [<ffffffff8022c918>] default_wake_function+0x0/0xe
> >  [<ffffffff80241f3a>] synchronize_rcu+0x30/0x36
> >  [<ffffffff80241fac>] wakeme_after_rcu+0x0/0xc
> >  [<ffffffff8022dab6>] partition_sched_domains+0x9b/0x1dd
> >  [<ffffffff8022dc26>] update_sched_domains+0x2e/0x35
> >  [<ffffffff805a5297>] notifier_call_chain+0x29/0x4c
> >  [<ffffffff8059ef76>] _cpu_up+0xd0/0x10a
> >  [<ffffffff8059f004>] cpu_up+0x54/0x61
> >  [<ffffffff805837d5>] store_online+0x43/0x67
> >  [<ffffffff802c51e9>] sysfs_write_file+0xd2/0x110
> >  [<ffffffff8028656b>] vfs_write+0xad/0x136
> >  [<ffffffff802869f3>] sys_write+0x45/0x6e
> >  [<ffffffff8020b22b>] system_call_fastpath+0x16/0x1b
> > 
> > It just hung forever, but the machine was otherwise fully functional.
> > 
> > This was without frame pointers so the backtrace presumably has
> > some garbage. Haven't looked too closely.
> > 
> > -Andi
> > 
> > -- 
> > ak@...ux.intel.com
> 
> -- 
> ak@...ux.intel.com
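
For anyone retesting with the patch or with -rc9, the offline/online cycle from the quoted report could be scripted roughly as below. This is only a sketch: the SYSFS_CPU variable and the function names set_cpu_online and cycle_cpus are made up for illustration and do not come from this thread. It assumes sysfs is mounted at /sys and that the listed CPUs have an "online" control file.

```shell
#!/bin/sh
# Sketch only: cycle CPUs offline and back online through sysfs.
# SYSFS_CPU and both function names are hypothetical, not from the thread.
SYSFS_CPU="${SYSFS_CPU:-/sys/devices/system/cpu}"

# set_cpu_online CPU STATE: write 0 (offline) or 1 (online) to the CPU's
# "online" control file. Returns non-zero if the file is not writable.
set_cpu_online() {
    ctl="$SYSFS_CPU/cpu$1/online"
    if [ ! -w "$ctl" ]; then
        echo "cannot write $ctl (need root, or CPU not hotpluggable)" >&2
        return 1
    fi
    echo "$2" > "$ctl"
}

# cycle_cpus CPU...: take every listed CPU offline, then bring each back.
# The CPU being onlined is printed first, so the last line of output
# identifies the CPU whose write did not return.
cycle_cpus() {
    for cpu in "$@"; do
        set_cpu_online "$cpu" 0 || return 1
    done
    for cpu in "$@"; do
        echo "onlining cpu$cpu"
        set_cpu_online "$cpu" 1 || return 1
    done
}
```

Run as root after sourcing the file, for example "cycle_cpus 14 15" (cpu14 is the CPU from the report; 15 is just an example). In the quoted backtrace the write to the "online" file itself never returned, so a hang would show up as the script stopping after an "onlining" line.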