linux-kernel - Re: [PATCH 1/1] timers/nohz: Last resort update jiffies on nohz

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20210915174440.GU4156@paulmck-ThinkPad-P17-Gen-1>
Date:   Wed, 15 Sep 2021 10:44:40 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Frederic Weisbecker <frederic@...nel.org>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH 1/1] timers/nohz: Last resort update jiffies on nohz_full
 IRQ entry

On Wed, Sep 15, 2021 at 04:23:03PM +0200, Frederic Weisbecker wrote:
> When at least one CPU runs in nohz_full mode, a dedicated timekeeper CPU
> is guaranteed to stay online and to never stop its tick.
> 
> Meanwhile on some rare case, the dedicated timekeeper may be running
> with interrupts disabled for a while, such as in stop_machine.
> 
> If jiffies stop being updated, a nohz_full CPU may end up endlessly
> programming the next tick in the past, taking the last jiffies update
> monotonic timestamp as a stale base, resulting in an tick storm.
> 
> Here is a scenario where it matters:
> 
> 0) CPU 0 is the timekeeper and CPU 1 a nohz_full CPU.
> 
> 1) A stop machine callback is queued to execute somewhere.
> 
> 2) CPU 0 reaches MULTI_STOP_DISABLE_IRQ while CPU 1 is still in
>    MULTI_STOP_PREPARE. Hence CPU 0 can't do its timekeeping duty. CPU 1
>    can still take IRQs.
> 
> 3) CPU 1 receives an IRQ which queues a timer callback one jiffy forward.
> 
> 4) On IRQ exit, CPU 1 schedules the tick one jiffy forward, taking
>    last_jiffies_update as a base. But last_jiffies_update hasn't been
>    updated for 2 jiffies since the timekeeper has interrupts disabled.
> 
> 5) clockevents_program_event(), which relies on ktime_get(), observes
>    that the expiration is in the past and therefore programs the min
>    delta event on the clock.
> 
> 6) The tick fires immediately, goto 3)
> 
> 7) Tick storm, the nohz_full CPU is drown and takes ages to reach
>    MULTI_STOP_DISABLE_IRQ, which is the only way out of this situation.
> 
> Solve this with unconditionally updating jiffies if the value is stale
> on nohz_full IRQ entry. IRQs and other disturbances are expected to be
> rare enough on nohz_full for the unconditional call to ktime_get() to
> actually matter.
> 
> Reported-and-tested-by: Paul E. McKenney <paulmck@...nel.org>
> Signed-off-by: Frederic Weisbecker <frederic@...nel.org>

Nice, thank you!!!

I am assuming that this goes up through some other channel, but in
the meantime I have pulled it into -rcu for continued testing and
to get it into -next sooner rather than later.

I also could not resist wordsmithing as shown below.

							Thanx, Paul

------------------------------------------------------------------------

commit e603ba78a96dc69f31579908fcee4ae4eedb68ae
Author: Frederic Weisbecker <frederic@...nel.org>
Date:   Wed Sep 15 16:23:03 2021 +0200

    timers/nohz: Last resort update jiffies on nohz_full IRQ entry
    
    When at least one CPU runs in nohz_full mode, a dedicated timekeeper
    CPU is guaranteed to stay online and to never stop its tick.  Except
    that this timekeeper could be spinning with interrupts disabled for an
    extended period of time, for example, when in the midst of a stop-machine
    operation.
    
    This means that jiffies are no longer being updated, which in turn means
    that a nohz_full CPU can endlessly program its next tick in the past,
    which results in an tick storm.
    
    This situation can arise as follows:
    
    0) CPU 0 is the timekeeper and CPU 1 a nohz_full CPU.
    
    1) A stop-machine callback is queued.
    
    2) CPU 0 reaches MULTI_STOP_DISABLE_IRQ while CPU 1 is still in
       MULTI_STOP_PREPARE. Hence CPU 0 can't do its timekeeping duty. CPU 1
       can still take IRQs.
    
    3) CPU 1 receives an IRQ which queues a timer callback one jiffy forward.
    
    4) On IRQ exit, CPU 1 schedules the tick one jiffy forward, taking
       last_jiffies_update as a base. But last_jiffies_update hasn't been
       updated for 2 jiffies since the timekeeper has interrupts disabled.
    
    5) clockevents_program_event(), which relies on ktime_get(), observes
       that the expiration is in the past and therefore programs the min
       delta event on the clock.
    
    6) The tick fires immediately, goto 3), tick storm!
    
    7) The nohz_full CPU 1 makes no forward progress until such time as
       integer overflow causes the tick's expiration to actually be in
       the future, which after the better part of an hour finally allows
       CPU 1 to reach MULTI_STOP_DISABLE_IRQ.
    
    Solve this by unconditionally updating jiffies if the value is stale
    on nohz_full IRQ entry. IRQs and other disturbances are expected to be
    rare enough on nohz_full for the unconditional call to ktime_get() to
    incur negligible overhead.
    
    Reported-and-tested-by: Paul E. McKenney <paulmck@...nel.org>
    Signed-off-by: Frederic Weisbecker <frederic@...nel.org>
    Signed-off-by: Paul E. McKenney <paulmck@...nel.org>

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 322b65d45676..41f470929e99 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -595,7 +595,8 @@ void irq_enter_rcu(void)
 {
 	__irq_enter_raw();
 
-	if (is_idle_task(current) && (irq_count() == HARDIRQ_OFFSET))
+	if (tick_nohz_full_cpu(smp_processor_id()) ||
+	    (is_idle_task(current) && (irq_count() == HARDIRQ_OFFSET)))
 		tick_irq_enter();
 
 	account_hardirq_enter(current);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 6bffe5af8cb1..17a283ce2b20 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1375,6 +1375,13 @@ static inline void tick_nohz_irq_enter(void)
 	now = ktime_get();
 	if (ts->idle_active)
 		tick_nohz_stop_idle(ts, now);
+	/*
+	 * If all CPUs are idle. We may need to update a stale jiffies value.
+	 * Note nohz_full is a special case: a timekeeper is guaranteed to stay
+	 * alive but it might be busy looping with interrupts disabled in some
+	 * rare case (typically stop machine). So we must make sure we have a
+	 * last resort.
+	 */
 	if (ts->tick_stopped)
 		tick_nohz_update_jiffies(now);
 }