Date:   Tue, 5 Dec 2017 15:37:44 -0800
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     anna-maria@...utronix.de
Cc:     tglx@...utronix.de, linux-kernel@...r.kernel.org
Subject: Re: Timer refuses to expire

On Mon, Dec 04, 2017 at 09:42:08AM -0800, Paul E. McKenney wrote:
> On Fri, Dec 01, 2017 at 10:25:29AM -0800, Paul E. McKenney wrote:
> > Hello, Anna-Maria,
> > 
> > It turned out that one set of problems was due to NO_HZ_FULL issues,
> > which are addressed with a pair of patches.  However, there is still one
> > case of a timer refusing to expire in rcutorture's TREE01 test scenario.
> > It takes from two to four hours to trigger, and the most recent one
> > gives the following diagnostic (see patch at the very end of this email):
> > 
> > [13166.127458] schedule_timeout: Waylayed timer base->clk: 0x100c40004 jiffies: 0x100c4524e base->next_expiry: 0x100c40004 timer->flags: 0x1000003 timer->expires 0x100c40003 idx: 4 idx_now: ea base->pending_map 000000000000001000000000001080000000000000040000000002000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000
> > 
> > The message appears any time a timer for less than five jiffies takes
> > more than eight seconds to expire.  In all cases, this expiry did not
> > happen naturally, but rather via an unsolicited wakeup from the RCU CPU
> > stall code.  If I am interpreting this message correctly, base->clk
> > has advanced past this timer, and the timer is correctly flagged in
> > base->pending_map.  This seems like part of the problem because the
> > timer's timeout was only three jiffies.  However, base->clk has not
> > advanced for more than 20 seconds, which seems like another problem.
> > 
> > What additional diagnostics would be helpful?  I can capture data
> > at the beginning of the schedule_timeout in the timer_list structure,
> > and of course can print more information at the time of the wakeup.
> 
> And the following messages from this weekend's runs are passed along
> on the off-chance that they are at all helpful.  One interesting point is that starting at
> time 4731.872311, there are repeated enqueues to CPU 5's timer wheel,
> but CPU 5's ->clk does not advance.  Things seem to advance at
> time 4891.721190.
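
Just to sanity-check my reading of the quoted diagnostic, here is the
arithmetic as a trivial standalone program.  The HZ=1000 assumption is
mine, but it is consistent with the twenty-plus seconds noted above:

#include <stdio.h>

int main(void)
{
	/* Values copied from the quoted "Waylayed timer" message. */
	unsigned long long base_clk      = 0x100c40004ULL;
	unsigned long long jiffies_now   = 0x100c4524eULL;
	unsigned long long timer_expires = 0x100c40003ULL;

	/* The three-jiffy timer is long overdue... */
	printf("timer overdue by %llu jiffies (~%llu s at HZ=1000)\n",
	       jiffies_now - timer_expires,
	       (jiffies_now - timer_expires) / 1000);

	/* ...and base->clk itself has stopped advancing. */
	printf("base->clk lags jiffies by %llu jiffies (~%llu s at HZ=1000)\n",
	       jiffies_now - base_clk,
	       (jiffies_now - base_clk) / 1000);

	return 0;
}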

Another layer on the onion...  For at least some of the failures, there
is a stalled CPU-hotplug operation.  This of course can mean that the
timers are stuck on the partially offlined CPU.  So I dumped the stack
of the task that is taking the CPU offline.  Please see below for console
output and patch.

I am considering doing an unsolicited wakeup of the task doing the
hotplug operation, but I am not convinced that the entirety of the CPU
hotplug code is willing to put up with that sort of thing.
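
For concreteness, what I have in mind is roughly the following, reusing
the task pointer that the patch below records.  This is only a sketch;
whether the hotplug state machine puts up with a spurious
wake_up_process() at that point is exactly the open question.

	/*
	 * Sketch only, not part of the patch below:  In addition to
	 * dumping the recorded task's stack, give it an unsolicited
	 * wakeup in the hope of unsticking the hotplug operation.
	 */
	if (READ_ONCE(schedule_timeout_task2dump)) {
		mutex_lock(&schedule_timeout_task2dump_mutex);
		if (schedule_timeout_task2dump) {
			pr_info("Waking torture onoff task\n");
			wake_up_process(schedule_timeout_task2dump);
		}
		mutex_unlock(&schedule_timeout_task2dump_mutex);
	}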

							Thanx, Paul

------------------------------------------------------------------------

[ 1035.692679] schedule_timeout: Waylayed timer base->clk: 0x1000ae76c jiffies: 0x1000b39cd base->next_expiry: 0x1000ae76c timer->flags: 0xb000007 timer->expires 0x1000ae76b idx: 2c idx_now: dd base->pending_map 000010000000000002000800000000000002000000000000000000000020092000000000000000000000000080000000000000000000000000000000000000000000000000000000
[ 1035.692689] Torture onoff task state:
[ 1035.692691] torture_onoff   D14584   837      2 0x80000000
[ 1035.692734] Call Trace:
[ 1035.692742]  ? __schedule+0x33c/0x6f0
[ 1035.692745]  ? _raw_spin_lock_irq+0xf/0x30
[ 1035.692748]  schedule+0x37/0x90
[ 1035.692750]  schedule_timeout+0x20d/0x490
[ 1035.692755]  ? preempt_count_add+0x51/0x90
[ 1035.692758]  wait_for_completion+0x95/0x100
[ 1035.692763]  ? wake_up_q+0x60/0x60
[ 1035.692783]  cpuhp_kick_ap+0x29/0x70
[ 1035.692799]  cpuhp_kick_ap_work+0x34/0x130
[ 1035.692807]  _cpu_down+0x17e/0x1f0
[ 1035.692811]  do_cpu_down+0x30/0x50
[ 1035.692834]  torture_offline+0x76/0x140
[ 1035.692851]  torture_onoff+0xef/0x1c0
[ 1035.692854]  kthread+0xf0/0x130
[ 1035.692857]  ? torture_kthread_stopping+0x70/0x70

------------------------------------------------------------------------

commit 8069a4a1a0634055c43b50a37cb08237eea97dd5
Author: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
Date:   Tue Dec 5 13:59:06 2017 -0800

    EXP: timer: Dump torture-test online/offline task state
    
    Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>

diff --git a/include/linux/torture.h b/include/linux/torture.h
index 66272862070b..ac62d21c9688 100644
--- a/include/linux/torture.h
+++ b/include/linux/torture.h
@@ -102,4 +102,6 @@ void _torture_stop_kthread(char *m, struct task_struct **tp);
 #define torture_preempt_schedule()
 #endif
 
+void schedule_timeout_set_task2dump(struct task_struct *t);
+
 #endif /* __LINUX_TORTURE_H */
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 23af27461d8c..0a67df4cdeb6 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1724,6 +1724,16 @@ static void process_timeout(struct timer_list *t)
 	wake_up_process(timeout->task);
 }
 
+static struct task_struct *schedule_timeout_task2dump;
+static DEFINE_MUTEX(schedule_timeout_task2dump_mutex);
+void schedule_timeout_set_task2dump(struct task_struct *t)
+{
+	mutex_lock(&schedule_timeout_task2dump_mutex);
+	WRITE_ONCE(schedule_timeout_task2dump, t);
+	mutex_unlock(&schedule_timeout_task2dump_mutex);
+}
+EXPORT_SYMBOL_GPL(schedule_timeout_set_task2dump);
+
 /**
  * schedule_timeout - sleep until timeout
  * @timeout: timeout value in jiffies
@@ -1808,6 +1818,14 @@ signed long __sched schedule_timeout(signed long timeout)
 		for (i = 0; i < WHEEL_SIZE / sizeof(base->pending_map[0]) / 8; i++)
 			pr_cont("%016lx", base->pending_map[i]);
 		pr_cont("\n");
+		if (READ_ONCE(schedule_timeout_task2dump)) {
+			mutex_lock(&schedule_timeout_task2dump_mutex);
+			if (schedule_timeout_task2dump) {
+				pr_info("Torture onoff task state:\n");
+				sched_show_task(schedule_timeout_task2dump);
+			}
+			mutex_unlock(&schedule_timeout_task2dump_mutex);
+		}
 	}
 	del_singleshot_timer_sync(&timer.timer);
 
diff --git a/kernel/torture.c b/kernel/torture.c
index 37b94012a3f8..48d5f03c644a 100644
--- a/kernel/torture.c
+++ b/kernel/torture.c
@@ -238,6 +238,7 @@ int torture_onoff_init(long ooholdoff, long oointerval)
 	if (onoff_interval <= 0)
 		return 0;
 	ret = torture_create_kthread(torture_onoff, NULL, onoff_task);
+	schedule_timeout_set_task2dump(onoff_task);
 #endif /* #ifdef CONFIG_HOTPLUG_CPU */
 	return ret;
 }
@@ -251,6 +252,7 @@ static void torture_onoff_cleanup(void)
 #ifdef CONFIG_HOTPLUG_CPU
 	if (onoff_task == NULL)
 		return;
+	schedule_timeout_set_task2dump(NULL);
 	VERBOSE_TOROUT_STRING("Stopping torture_onoff task");
 	kthread_stop(onoff_task);
 	onoff_task = NULL;
