lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20130510213851.GC9358@somewhere>
Date:	Fri, 10 May 2013 23:38:52 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Borislav Petkov <bp@...en8.de>
Cc:	Jiri Kosina <jkosina@...e.cz>, Tony Luck <tony.luck@...el.com>,
	linux-kernel@...r.kernel.org, x86@...nel.org
Subject: Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123
 native_smp_send_reschedule

On Fri, May 10, 2013 at 06:23:40PM +0200, Borislav Petkov wrote:
> On Fri, May 10, 2013 at 05:43:50PM +0200, Frederic Weisbecker wrote:
> > So either interrupts are spuriously enabled early, or ts->tick_stopped
> > is not correctly initialized.
> 
> Hmm, it can't be interrupts disabled because add_timer_on() does
> spin_lock_irqsave() before calling wake_up_nohz_cpu()... Maybe something
> like the below could help check this...

Right, by the time we call wake_up_nohz_cpu(), irqs can't be enabled, but may
be they were enabled before in start_secondary().

> 
> Although AFAICT, we're enabling interrupts much later in
> start_secondary, even after we've set the bit in the online mask.

Right.

> 
> ---
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 9c73b51817e4..1b679b0fa57a 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -201,6 +201,8 @@ static void __cpuinit smp_callin(void)
>  	 */
>  	setup_vector_irq(smp_processor_id());
>  
> +	WARN_ON(!irqs_disabled());
> +

The problem is that it doesn't catch issues with irqs that have been enabled
before in start_secondary(), then re-disabled somewhow. Warning on offline CPU from the place 
that disables the tick should catch the issue.

Jiri, could you test the following patch? I also added some code to dump
the value of ts->tick_stopped, in case it's not well initialized or something.

Note this may give you spurious warning when you unplug a CPU or when you shutdown the
system. But it's interesting if it dumps something in the boot.

Thanks!

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 58453b8..9853125 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -616,8 +616,17 @@ static bool wake_up_full_nohz_cpu(int cpu)
 {
 	if (tick_nohz_full_cpu(cpu)) {
 		if (cpu != smp_processor_id() ||
-		    tick_nohz_tick_stopped())
+		    tick_nohz_tick_stopped()) {
+			if (!cpu_online(cpu)) {
+				static int printed = 0;
+				if (!printed) {
+					printk("src: %d dst: %d stopped: %d\n", cpu, smp_processor_id(), tick_nohz_tick_stopped());
+					dump_stack();
+					printed = 1;
+				}
+			}
 			smp_send_reschedule(cpu);
+		}
 		return true;
 	}
 
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index bc67d42..abfa8c3 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -650,6 +650,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
 
 			ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
 			ts->tick_stopped = 1;
+			WARN_ON_ONCE(!cpu_online(cpu));
 			trace_tick_stop(1, " ");
 		}
 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ