Date:	Thu, 19 Jul 2012 09:05:41 -0400
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Mike Galbraith <efault@....de>
Cc:	linux-kernel@...r.kernel.org,
	linux-rt-users <linux-rt-users@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Carsten Emde <C.Emde@...dl.org>, John Kacur <jkacur@...hat.com>
Subject: Re: [PATCH RT 00/12] [ANNOUNCE] 3.0.36-rt58-rc1 stable review

On Thu, 2012-07-19 at 06:00 +0200, Mike Galbraith wrote:
> On Wed, 2012-07-18 at 18:39 -0400, Steven Rostedt wrote:
> 
> > Please test the patches too.
> 
> Your hotplug stress test script made x3550 M3 box fall over.  It took a
> bit, but down she went.  64 core test box fell over quickly, but that's
> very far from virgin source.. seems to be the same though.

Thanks for the report. I know of a few areas in the hotplug code that can
still deadlock (though they are hard to hit), but there's no easy fix for
them. Basically, the only real fix is to redesign CPU hotplug (I think
someone is already working on that ;-).

But these patches do fix the main CPU hotplug issues (albeit by making
the code even uglier).

The panic below doesn't tell us much. We really need to know what the
other CPUs were up to. This call trace only tells us that one of the
CPUs was waiting for the other CPUs to stop or to finish something.
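In the 3.0 kernel, stop_machine() has every participating CPU run stop_machine_cpu_stop() (which appears in the trace below). That function spins through a shared sequence of states (prepare, disable interrupts, run, exit). A CPU moves on only after every participating CPU has acknowledged the current state, and from the disable-interrupts state onward it spins with interrupts off. If one CPU never arrives, the rest wait indefinitely. Once they are spinning with interrupts disabled, only the NMI-based hard-lockup watchdog notices, which is consistent with the panic quoted below. The following is a minimal userspace sketch of that rendezvous in C11 with pthreads. It assumes a simplified model: threads stand in for CPUs, the names (sm_data, cpu_stop, run_stop_machine, timeout_ms) are hypothetical, and a wall-clock timeout replaces the watchdog. It is not the kernel code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define MAX_THREADS 64

/* States mirror the 3.0-era stopmachine_state enum (simplified model). */
enum { SM_NONE, SM_PREPARE, SM_DISABLE_IRQ, SM_RUN, SM_EXIT };

struct sm_data {
    int num_threads;        /* threads that must ack every state */
    long timeout_ms;        /* hypothetical stand-in for the hard-lockup NMI */
    atomic_int state;       /* current global state */
    atomic_int thread_ack;  /* acks still outstanding for this state */
    atomic_int fn_calls;    /* how often the "stop_machine callback" ran */
};

static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Reset the ack counter before publishing the new state. */
static void set_state(struct sm_data *sm, int newstate)
{
    atomic_store(&sm->thread_ack, sm->num_threads);
    atomic_store(&sm->state, newstate);
}

/* The last thread to ack the current state advances everyone. */
static void ack_state(struct sm_data *sm)
{
    if (atomic_fetch_sub(&sm->thread_ack, 1) == 1)
        set_state(sm, atomic_load(&sm->state) + 1);
}

/* Per-"CPU" loop, loosely modeled on stop_machine_cpu_stop(). */
static void *cpu_stop(void *arg)
{
    struct sm_data *sm = arg;
    int curstate = SM_NONE;
    long long waiting_since = now_ms();

    do {
        int s = atomic_load(&sm->state);
        if (s != curstate) {
            curstate = s;
            waiting_since = now_ms();
            /* At SM_DISABLE_IRQ the kernel disables local interrupts;
             * a user thread cannot, so the model does nothing there. */
            if (curstate == SM_RUN)
                atomic_fetch_add(&sm->fn_calls, 1);
            ack_state(sm);
        } else if (now_ms() - waiting_since > sm->timeout_ms) {
            return NULL; /* the kernel has no timeout: it keeps spinning */
        } else {
            sched_yield(); /* stands in for cpu_relax(); lets others run */
        }
    } while (curstate != SM_EXIT);
    return NULL;
}

/* `expected` threads must ack each state, but only `present` are started;
 * present < expected models a CPU whose stopper thread never runs.
 * Returns true if the rendezvous got all the way through SM_EXIT. */
static bool run_stop_machine(int expected, int present, long timeout_ms,
                             int *fn_calls)
{
    assert(present >= 0 && present <= expected && expected <= MAX_THREADS);
    struct sm_data sm;
    pthread_t tids[MAX_THREADS];
    int started = 0;

    sm.num_threads = expected;
    sm.timeout_ms = timeout_ms;
    atomic_init(&sm.state, SM_NONE);
    atomic_init(&sm.thread_ack, 0);
    atomic_init(&sm.fn_calls, 0);
    set_state(&sm, SM_PREPARE);

    while (started < present &&
           pthread_create(&tids[started], NULL, cpu_stop, &sm) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    if (fn_calls)
        *fn_calls = atomic_load(&sm.fn_calls);
    /* The last thread to ack SM_EXIT moves the state one past it. */
    return atomic_load(&sm.state) > SM_EXIT;
}
```

With all threads present, run_stop_machine(4, 4, 5000, &calls) completes and every thread runs the callback once. run_stop_machine(4, 3, 200, &calls) leaves the three present threads waiting in the prepare state until the timeout, and the callback never runs. In the kernel there is no such timeout, so the waiting CPUs keep spinning until something like the watchdog intervenes.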

-- Steve


> 
> [  255.016043] CPU 1 MCA<0>Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 7
> Pid: 9914, comm: migration/7 Not tainted 3.0.36-rt57 #49
> Call Trace:
>  <NMI>  [<ffffffff814a0f7b>] panic+0x9b/0x1b0
>  [<ffffffff810b0627>] watchdog_overflow_callback+0xd7/0xe0
>  [<ffffffff810c3dad>] __perf_event_overflow+0x9d/0x240
>  [<ffffffff810c066b>] ? perf_event_update_userpage+0x9b/0xe0
>  [<ffffffff810c41a4>] perf_event_overflow+0x14/0x20
>  [<ffffffff81015707>] intel_pmu_handle_irq+0x177/0x230
>  [<ffffffff814a5549>] perf_event_nmi_handler+0x39/0xc0
>  [<ffffffff814a727d>] notifier_call_chain+0x4d/0x70
>  [<ffffffff814a72e3>] __atomic_notifier_call_chain+0x43/0x60
>  [<ffffffff814a7311>] atomic_notifier_call_chain+0x11/0x20
>  [<ffffffff814a734e>] notify_die+0x2e/0x30
>  [<ffffffff814a4699>] default_do_nmi+0x39/0x200
>  [<ffffffff814a4a48>] do_nmi+0x78/0x80
>  [<ffffffff814a44d0>] nmi+0x20/0x30
>  [<ffffffff810a461a>] ? stop_machine_cpu_stop+0x6a/0xe0
>  <<EOE>>  [<ffffffff810a47f4>] cpu_stopper_thread+0xf4/0x1d0
>  [<ffffffff810a45b0>] ? wait_for_stop_done+0xa0/0xa0
>  [<ffffffff814a1397>] ? __schedule+0x2c7/0x630
>  [<ffffffff810a4700>] ? cpu_stop_queue_work+0x70/0x70
>  [<ffffffff810a4700>] ? cpu_stop_queue_work+0x70/0x70
>  [<ffffffff810702c6>] kthread+0xa6/0xb0
>  [<ffffffff81056328>] ? do_exit+0x278/0x450
>  [<ffffffff810016b2>] ? __switch_to+0xf2/0x370
>  [<ffffffff81040f15>] ? finish_task_switch+0x55/0xd0
>  [<ffffffff814aa6e4>] kernel_thread_helper+0x4/0x10
>  [<ffffffff81070220>] ? __init_kthread_worker+0x50/0x50
>  [<ffffffff814aa6e0>] ? gs_change+0x13/0x13
> 

