Message-ID: <alpine.DEB.2.21.1808011812160.1835@nanos.tec.linutronix.de>
Date: Wed, 1 Aug 2018 19:46:10 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: Frederic Weisbecker <frederic@...nel.org>
cc: LKML <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>,
Anna-Maria Gleixner <anna-maria@...utronix.de>
Subject: Re: [PATCH] nohz: Fix missing tick reprog while interrupting inline timer softirq

On Wed, 1 Aug 2018, Frederic Weisbecker wrote:
> Before updating the full nohz tick or the idle time on IRQ exit, we
> check first if we are not in a nesting interrupt, whether the inner
> interrupt is a hard or a soft IRQ.
>
> There is a historical reason for that: the dyntick idle mode used to
> reprogram the tick on IRQ exit, after softirq processing, and there was
> no point in doing that job in the outer nesting interrupt because the
> tick update will be performed through the end of the inner interrupt
> eventually, with even potential new timer updates.
>
> One corner case could show up though: if an idle tick interrupts a softirq
> executing inline in the idle loop (through a call to local_bh_enable())

Where does this happen? Why is anything in the idle loop doing a
local_bh_disable/enable() pair?

Or are you talking about NOHZ FULL and arbitrary task context?

> after we entered in dynticks mode, the IRQ won't reprogram the tick
> because it assumes the softirq executes on an inner IRQ-tail. As a
> result we might put the CPU in sleep mode with the tick completely
> stopped whereas a timer can still be enqueued. Indeed there is no tick
> reprogramming in local_bh_enable(). We probably assumed there was no bh
> disabled section in idle, although there didn't seem to be debug code
> ensuring that.
>
> Nowadays the nesting interrupt optimization still stands but only concerns
> full dynticks. The tick is stopped on IRQ exit in full dynticks mode
> and we want to wait for the end of the inner IRQ to reprogram the tick.
> But in_interrupt() doesn't make a difference between softirqs executing
> on IRQ tail and those executing inline. What was to be considered a
> corner case in dynticks-idle mode now becomes a serious opportunity for
> a bug in full dynticks mode: if a tick interrupts a task executing
> softirq inline, the tick reprogramming will be ignored and we may exit
> to userspace after local_bh_enable() with an enqueued timer that will
> never fire.
>
> To fix this, simply keep reprogramming the tick if we are in a hardirq
> interrupting softirq. We can still figure out a way later to restore
> this optimization while excluding inline softirq processing.

I'm not really happy with that 'fix' because what happens if:

    ....
    local_bh_enable()
      do_softirq()
    --> interrupt()
          tick_nohz_irq_exit();
        arm_timer();

So if that new timer is the only one on the CPU, what is going to arm the
timer hardware which was just switched off in tick_nohz_irq_exit()?

I haven't looked deep enough, but a simple unconditional call to
tick_irq_exit() at the end of do_softirq() might do the trick.
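
The ordering problem in the trace above can be modelled in user space.
What follows is only a toy sketch, not kernel code: struct cpu_model and
every model_* function are hypothetical names made up for illustration,
and it assumes that arming a timer from the softirq only enqueues it and
does not touch the timer hardware. With reeval_at_end set, it mimics the
suggested extra tick re-evaluation at the end of do_softirq().

```c
#include <stdbool.h>

/* Toy user-space model of one CPU; hypothetical, not a kernel structure. */
struct cpu_model {
    bool tick_hw_armed;   /* is the timer hardware currently programmed? */
    int  pending_timers;  /* software timers enqueued on this CPU */
};

/* Stands in for the tick re-evaluation on IRQ exit: keep the hardware
 * programmed only if at least one timer is enqueued, otherwise stop it. */
static void model_tick_irq_exit(struct cpu_model *cpu)
{
    cpu->tick_hw_armed = cpu->pending_timers > 0;
}

/* Stands in for arm_timer() in the trace. Assumption: it only enqueues
 * the timer and does not reprogram the hardware itself. */
static void model_arm_timer(struct cpu_model *cpu)
{
    cpu->pending_timers++;
}

/* Inline softirq run that is interrupted before it arms its timer. */
static void model_do_softirq(struct cpu_model *cpu, bool reeval_at_end)
{
    model_tick_irq_exit(cpu);       /* --> interrupt(): tick switched off */
    model_arm_timer(cpu);           /* softirq arms the only timer */
    if (reeval_at_end)
        model_tick_irq_exit(cpu);   /* suggested call at end of softirq */
}

/* Returns true if a timer is enqueued while the hardware stays off. */
bool timer_lost(bool reeval_at_end)
{
    struct cpu_model cpu = { .tick_hw_armed = true, .pending_timers = 0 };

    model_do_softirq(&cpu, reeval_at_end);
    return cpu.pending_timers > 0 && !cpu.tick_hw_armed;
}
```

In this model timer_lost(false) reports the lost timer and
timer_lost(true) does not. Whether the real code paths behave the same
way depends on details the model leaves out.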
Thanks,
tglx