linux-kernel - Re: NMI watchdog + NOHZ question


Date:	Wed, 24 Jun 2009 02:44:28 -0700 (PDT)
From:	David Miller <davem@...emloft.net>
To:	andi@...stfloor.org
Cc:	linux-kernel@...r.kernel.org, sparclinux@...r.kernel.org
Subject: Re: NMI watchdog + NOHZ question

From: Andi Kleen <andi@...stfloor.org>
Date: Wed, 24 Jun 2009 09:53:42 +0200

>> > Ah, you have a one-shot timer and it gets rescheduled in the softirq?
>> > If so, why not do that directly in the hardirq handler?
>> 
>> Then what's the point of the generic timer code supporting one-shot
>> clock sources? :-)
> 
> Well it would avoid that problem at least (I think based on your
> description). Somehow you need to reschedule the timer before the softirq.
> 
> I guess you could have a generic function that is callable from hardirq
> directly?

Thinking about this some more, the issue I'm hitting has nothing to
do with how the timer fires.

The problem occurs when the cpu goes into NOHZ mode and the timer
tick is stopped, so the timer is not firing at all.  And I suspect
x86 would hit this problem too as currently coded.

Using sparc64 first as a concrete example, the idle loop is essentially:

	while (1) {
		tick_nohz_stop_sched_tick(1);

		while (!need_resched() && !cpu_is_offline(cpu))
			sparc64_yield(cpu);

		tick_nohz_restart_sched_tick();

		preempt_enable_no_resched();
 ...
		schedule();
		preempt_disable();
	}

And on this particular CPU type sparc64_yield() is simply

	touch_nmi_watchdog();

since this cpu doesn't support yielding.

So if we get that 5+ second qla2xxx interrupt storm during the
"while (!need_resched() ..." loop, no matter what we do the NMI
watchdog is going to trigger on us once the qla2xxx firmware
upload is complete.

The 32-bit x86 cpu_idle() looks roughly like this:

	while (1) {
		tick_nohz_stop_sched_tick(1);
		while (!need_resched()) {

			check_pgt_cache();
			rmb();

			if (cpu_is_offline(cpu))
				play_dead();

			local_irq_disable();
			/* Don't trace irqs off for idle */
			stop_critical_timings();
			pm_idle();
			start_critical_timings();
		}
		tick_nohz_restart_sched_tick();
		preempt_enable_no_resched();
		schedule();
		preempt_disable();
	}

And similarly to sparc64, if that 5+ second qla2xxx interrupt
sequence happens after the tick_nohz_stop_sched_tick() call
we can run into the same situation.

That is because the timer interrupt count is not incrementing while
the tick is stopped, and it won't do so for at least "5 * nmi_hz"
watchdog NMIs.
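
To make the failure mode concrete, here is a minimal user-space sketch
of the check described above. It is a simplified model, not the actual
kernel watchdog code: struct wd_state, wd_touch(), wd_nmi_tick() and
simulate() are hypothetical names made up for illustration. The only
things taken from the discussion are the idea that the watchdog looks
for progress in the timer interrupt count, that touch_nmi_watchdog()
resets it, and the "5 * nmi_hz" threshold.

```c
/* Hypothetical, simplified model of an NMI watchdog check; not kernel code. */
struct wd_state {
	unsigned int last_irq_sum;	/* timer irq count seen at the previous NMI */
	unsigned int alert_counter;	/* consecutive NMIs without timer progress */
	int touched;			/* set by wd_touch(), consumed by the next NMI */
};

/* Stand-in for touch_nmi_watchdog(): ask the next NMI to reset the counter. */
void wd_touch(struct wd_state *wd)
{
	wd->touched = 1;
}

/* Called once per watchdog NMI.  Returns 1 when a lockup would be reported. */
int wd_nmi_tick(struct wd_state *wd, unsigned int irq_sum, unsigned int nmi_hz)
{
	if (wd->touched || irq_sum != wd->last_irq_sum) {
		/* Either somebody touched us or the timer made progress. */
		wd->touched = 0;
		wd->last_irq_sum = irq_sum;
		wd->alert_counter = 0;
		return 0;
	}
	wd->alert_counter++;
	return wd->alert_counter >= 5 * nmi_hz;
}

/*
 * Simulate "seconds" worth of watchdog NMIs.
 *   tick_running:   the timer interrupt count advances between NMIs
 *   idle_loop_runs: the idle loop gets to call wd_touch() between NMIs
 * Returns the index of the NMI that reports a lockup, or -1 if none does.
 */
int simulate(unsigned int nmi_hz, unsigned int seconds,
	     int tick_running, int idle_loop_runs)
{
	struct wd_state wd = { 0, 0, 0 };
	unsigned int irq_sum = 0;
	unsigned int i;

	for (i = 1; i <= nmi_hz * seconds; i++) {
		if (tick_running)
			irq_sum++;
		if (idle_loop_runs)
			wd_touch(&wd);
		if (wd_nmi_tick(&wd, irq_sum, nmi_hz))
			return (int)i;
	}
	return -1;
}
```

In this model, simulate(1, 6, 0, 0) stands for the interrupt storm
case: the tick is stopped and the idle loop never gets to run, so the
lockup is reported at the 5th NMI. With either tick_running or
idle_loop_runs set, the counter keeps getting reset and nothing is
reported.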