Date:	Fri, 10 Jun 2011 14:52:24 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Justin Piszcz <jpiszcz@...idpixels.com>
cc:	LKML <linux-kernel@...r.kernel.org>,
	Alan Piszcz <ap@...arrain.com>, Ingo Molnar <mingo@...e.hu>,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: 2.6.39: crash w/threadirqs option enabled

On Fri, 10 Jun 2011, Justin Piszcz wrote:
> On Fri, 20 May 2011, Thomas Gleixner wrote:
> Crashed again and it rebooted too:
> 
> reboot   system boot  2.6.39             Thu Jun  9 23:58 - 04:05  (04:06)
> user1      pts/0        X                Thu Jun  9 19:25 - 19:30  (00:04)
> user1      pts/10       X                Thu Jun  9 18:23 - crash  (05:35)
> 
> Any thoughts on what could be causing this?
> Should I go back to 2.6.38?

If you remove the threadirqs option from the command line, it does not
happen, right?

Can you try the following patch?

Thanks,

	tglx
---
commit fd8a7de177b6f56a0fc59ad211c197a7df06b1ad
Author: Thomas Gleixner <tglx@...utronix.de>
Date:   Tue Jul 20 14:34:50 2010 +0200

    x86: cpu-hotplug: Prevent softirq wakeup on wrong CPU
    
    After a newly plugged CPU sets the cpu_online bit it enables
    interrupts and goes idle. The cpu which brought up the new cpu waits
    for the cpu_online bit and when it observes it, it sets the cpu_active
    bit for this cpu. The cpu_active bit is the relevant one for the
    scheduler to consider the cpu as a viable target.
    
    With forced threaded interrupt handlers which imply forced threaded
    softirqs we observed the following race:
    
    cpu 0                         cpu 1
    
    bringup(cpu1);
                                  set_cpu_online(smp_processor_id(), true);
                                  local_irq_enable();
    while (!cpu_online(cpu1));
                                  timer_interrupt()
                                    -> wake_up(softirq_thread_cpu1);
                                         -> enqueue_on(softirq_thread_cpu1, cpu0);
    
                                                                            ^^^^
    
    cpu_notify(CPU_ONLINE, cpu1);
      -> sched_cpu_active(cpu1)
         -> set_cpu_active(cpu1, true);
    
    When an interrupt happens before the cpu which brought up the newly
    onlined cpu has set the cpu_active bit, the scheduler refuses to
    enqueue the woken thread (which is bound to that newly onlined cpu)
    on that cpu, because cpu_active is not yet set, and selects a
    fallback runqueue instead. Not really an expected and desirable
    behaviour.
    
    So far this has only been observed with forced hard/softirq threading,
    but in theory it could happen without forced threaded hard/softirqs as
    well. In practice it is probably unobservable, as it would take a
    massive interrupt storm on the newly onlined cpu (to make the softirq
    loop wake the softirq thread) combined with an even longer delay of
    the cpu which waits for the cpu_online bit.
    
    Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
    Reviewed-by: Peter Zijlstra <peterz@...radead.org>
    Cc: stable@...nel.org # 2.6.39

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 33a0c11..9fd3137 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -285,6 +285,19 @@ notrace static void __cpuinit start_secondary(void *unused)
 	per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
 	x86_platform.nmi_init();
 
+	/*
+	 * Wait until the cpu which brought this one up marked it
+	 * online before enabling interrupts. If we don't do that then
+	 * we can end up waking up the softirq thread before this cpu
+	 * reached the active state, which makes the scheduler unhappy
+	 * and schedule the softirq thread on the wrong cpu. This is
+	 * only observable with forced threaded interrupts, but in
+	 * theory it could also happen w/o them. It's just way harder
+	 * to achieve.
+	 */
+	while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
+		cpu_relax();
+
 	/* enable local interrupts */
 	local_irq_enable();
 
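
For readers without the hotplug code at hand, here is a minimal userspace
sketch of the race the changelog describes and of the effect of the wait
loop added above. This is an illustration only, not kernel code: the flag
names, the toy enqueue_on_cpu1() "scheduler", the WAIT_FOR_ACTIVE switch
and the 1ms delay are invented stand-ins for cpu_online/cpu_active, the
scheduler's active check and the CPU_ONLINE notifier latency.

/*
 * Hypothetical userspace model of the above race -- illustration only,
 * not kernel code. "online" stands in for cpu_online(cpu1), "active"
 * for cpu_active(cpu1).
 */
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static atomic_bool online;	/* models the cpu_online bit of cpu1 */
static atomic_bool active;	/* models the cpu_active bit of cpu1 */

/* Toy "scheduler": place work on cpu1 only if cpu1 is already active. */
static void enqueue_on_cpu1(const char *what)
{
	if (!atomic_load(&active))
		printf("%s: cpu1 not active -> fallback runqueue (the bug)\n", what);
	else
		printf("%s: enqueued on cpu1 as expected\n", what);
}

/* Models the freshly booted secondary cpu (cpu1). */
static void *secondary(void *arg)
{
	(void)arg;
	atomic_store(&online, true);		/* set_cpu_online(cpu1, true) */
#ifdef WAIT_FOR_ACTIVE
	/* The fix: don't take "interrupts" until the boot cpu set active. */
	while (!atomic_load(&active))
		sched_yield();			/* stand-in for cpu_relax() */
#endif
	/* local_irq_enable(): the first timer tick wakes the softirq thread. */
	enqueue_on_cpu1("softirq thread wakeup");
	return NULL;
}

/* Models the cpu driving the hotplug operation (cpu0). */
static void *boot(void *arg)
{
	(void)arg;
	while (!atomic_load(&online))		/* while (!cpu_online(cpu1)); */
		sched_yield();
	usleep(1000);				/* delay before CPU_ONLINE notifiers run */
	atomic_store(&active, true);		/* sched_cpu_active() -> set_cpu_active() */
	return NULL;
}

int main(void)
{
	pthread_t t0, t1;

	pthread_create(&t0, NULL, boot, NULL);
	pthread_create(&t1, NULL, secondary, NULL);
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	return 0;
}

Built as is (e.g. cc -pthread model.c && ./a.out) the worker practically
always lands on the fallback queue; building with -DWAIT_FOR_ACTIVE models
the ordering the patch enforces in start_secondary().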
