Date:   Tue, 12 Jan 2021 16:09:53 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Mark Rutland <mark.rutland@....com>
Cc:     linux-kernel@...r.kernel.org, maz@...nel.org, peterz@...radead.org,
        tglx@...utronix.de
Subject: Re: [PATCH 0/2] irq: detect slow IRQ handlers

On Tue, Jan 12, 2021 at 01:59:48PM +0000, Mark Rutland wrote:
> Hi,
> 
> While fuzzing arm64 with Syzkaller (under QEMU+KVM) over a number of releases,
> I've occasionally seen some ridiculously long stalls (20+ seconds), where it
> appears that a CPU is stuck in a hard IRQ context. As this gets detected after
> the CPU returns to the interrupted context, it's difficult to identify where
> exactly the stall is coming from.
> 
> These patches are intended to help track this down, with a WARN() if an IRQ
> handler takes longer than a given timeout (1 second by default), logging the
> specific IRQ and handler function. While it's possible to achieve something
> similar with tracing, that's harder to integrate into an automated fuzzing
> setup.
> 
> I've been running this for a short while and haven't yet seen any of the
> stalls with this applied. I have, however, tested with smaller timeout
> periods in the 1 millisecond range by overloading the host, so I'm
> confident that the check works.
> 
> Thanks,
> Mark.

Nice!

Acked-by: Paul E. McKenney <paulmck@...nel.org>

I applied the patch below, which adds a three-second delay to the
scheduling-clock interrupt handler.  The delay executed, but your warning
was not emitted, probably because rcutorture runs under qemu/KVM.  So no
Tested-by, not yet, anyway.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index e04e336..dac8c7a 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2606,6 +2606,8 @@ static void rcu_do_batch(struct rcu_data *rdp)
  */
 void rcu_sched_clock_irq(int user)
 {
+	static atomic_t invctr;
+
 	trace_rcu_utilization(TPS("Start scheduler-tick"));
 	lockdep_assert_irqs_disabled();
 	raw_cpu_inc(rcu_data.ticks_this_gp);
@@ -2623,6 +2625,14 @@ void rcu_sched_clock_irq(int user)
 		invoke_rcu_core();
 	lockdep_assert_irqs_disabled();
 
+	if (atomic_inc_return(&invctr) % 0x3ffff == 0) {
+		int i;
+
+		pr_alert("%s: 3-second delay.\n", __func__);
+		for (i = 0; i < 3000; i++)
+			udelay(1000);
+	}
+
 	trace_rcu_utilization(TPS("End scheduler-tick"));
 }
 

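------------------------------------------------------------------------

For reference, the check in Mark's patches boils down to timestamping each
handler invocation and warning when it overruns a threshold.  The sketch
below shows that pattern in isolation; the names (run_irq_handler_timed,
irq_handler_timeout_ns) and the placement are illustrative assumptions,
not code taken from the actual series.

#include <linux/interrupt.h>
#include <linux/sched/clock.h>
#include <linux/time64.h>

/* Hypothetical threshold; the series defaults to 1 second. */
static u64 irq_handler_timeout_ns = NSEC_PER_SEC;

/*
 * Illustrative wrapper: invoke one irqaction's handler, then WARN once
 * if it ran past the threshold, naming the IRQ and handler function.
 */
static irqreturn_t run_irq_handler_timed(unsigned int irq,
					 struct irqaction *action)
{
	u64 ts = sched_clock();
	irqreturn_t ret = action->handler(irq, action->dev_id);
	u64 delta = sched_clock() - ts;

	WARN_ONCE(delta > irq_handler_timeout_ns,
		  "IRQ %u handler %ps took %llu ns\n",
		  irq, action->handler, delta);
	return ret;
}

Against that kind of check, the test patch above injects a stall roughly
once every 0x3ffff scheduling-clock ticks (via a global atomic counter),
and 3000 iterations of udelay(1000) busy-wait for about three seconds,
well past a 1-second threshold.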