Message-Id: <20210602093326.15355-1-nsaenzju@redhat.com>
Date:   Wed,  2 Jun 2021 11:33:26 +0200
From:   Nicolas Saenz Julienne <nsaenzju@...hat.com>
To:     rostedt@...dmis.org, linux-kernel@...r.kernel.org
Cc:     mingo@...hat.com, mtosatti@...hat.com, nsaenzju@...hat.com,
        frederic@...nel.org
Subject: [PATCH] ring_buffer: Offload wakeup IPI to housekeeping CPUs

In order to minimize tracing's effect on nohz_full CPUs, offload the
ring-buffer consumer wakeup IPI to one of the housekeeping CPUs. Systems
not using nohz_full will still run the wakeup IPI locally (as per
housekeeping_any_cpu()'s implementation).
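
For reference, this is roughly how housekeeping_any_cpu() picks the
target CPU; a simplified sketch paraphrased from kernel/sched/isolation.c,
not part of this patch, and trimmed of the NUMA-locality logic:

  int housekeeping_any_cpu(enum hk_flags flags)
  {
  	if (static_branch_unlikely(&housekeeping_overridden)) {
  		/* Isolation enabled: pick any online housekeeping CPU */
  		if (housekeeping_flags & flags)
  			return cpumask_any_and(housekeeping_mask,
  					       cpu_online_mask);
  	}
  	/* No nohz_full/isolation configured: stay on the local CPU */
  	return smp_processor_id();
  }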

This wakeup IPI, although negligible for the vast majority of trace
workloads, may cause unwarranted latencies on systems tracing events on
nohz_full CPUs. This is made worse on PREEMPT_RT kernels, as they defer
the irq_work handling into ksoftirqd, forcing unwarranted context
switches on the otherwise extremely busy CPU.
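
On PREEMPT_RT, only irq_work items marked IRQ_WORK_HARD_IRQ stay in the
hard-IRQ path; everything else is deferred. Roughly, using the
initializers from <linux/irq_work.h> (wake_readers() is a made-up
callback for illustration):

  #include <linux/irq_work.h>

  static void wake_readers(struct irq_work *work)
  {
  	/* wake up ring-buffer consumers */
  }

  /* Deferred to ksoftirqd on PREEMPT_RT */
  static struct irq_work lazy = IRQ_WORK_INIT(wake_readers);

  /* Runs from the hard-IRQ path even on PREEMPT_RT */
  static struct irq_work hard = IRQ_WORK_INIT_HARD(wake_readers);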

Note that the local IPI can't be avoided when tracing in NMI context, as
irq_work_queue() is the only mechanism supported in that context.
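
For context, a trimmed-down paraphrase of irq_work_queue_on() from
kernel/irq_work.c, showing why the remote variant is off-limits in NMI
context (the arch remote-IPI backends aren't NMI safe):

  bool irq_work_queue_on(struct irq_work *work, int cpu)
  {
  	if (!irq_work_claim(work))
  		return false;	/* already pending */

  	preempt_disable();
  	if (cpu != smp_processor_id()) {
  		/* The arch remote-IPI backends are not NMI safe */
  		WARN_ON_ONCE(in_nmi());
  		__smp_call_single_queue(cpu, &work->node.llist);
  	} else {
  		__irq_work_queue_local(work);
  	}
  	preempt_enable();

  	return true;
  }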

To illustrate this, when tracing on a nohz_full CPU with PREEMPT_RT=y
(plus a good amount of extra isolation options), I see:
  - 50-100 μs latency spikes while tracing without this patch
  - 10-14 μs latency spikes while tracing with this patch
  - 8-11 μs latency spikes when not tracing at all

Signed-off-by: Nicolas Saenz Julienne <nsaenzju@...hat.com>
---
 kernel/trace/ring_buffer.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 2c0ee6484990..ce7817861c5e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -5,6 +5,7 @@
  * Copyright (C) 2008 Steven Rostedt <srostedt@...hat.com>
  */
 #include <linux/trace_recursion.h>
+#include <linux/sched/isolation.h>
 #include <linux/trace_events.h>
 #include <linux/ring_buffer.h>
 #include <linux/trace_clock.h>
@@ -3052,6 +3053,20 @@ static void rb_commit(struct ring_buffer_per_cpu *cpu_buffer,
 	rb_end_commit(cpu_buffer);
 }
 
+/*
+ * Offload the irq_work to a housekeeping CPU, unless in NMI context, which
+ * doesn't support remote queueing. Non-nohz_full systems trigger the IPI locally.
+ *
+ * Note that irq_work supplies its own memory barriers.
+ */
+static __always_inline void rb_irq_work_queue(struct irq_work *work)
+{
+	if (in_nmi())
+		irq_work_queue(work);
+	else
+		irq_work_queue_on(work, housekeeping_any_cpu(HK_FLAG_MISC));
+}
+
 static __always_inline void
 rb_wakeups(struct trace_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
 {
@@ -3061,14 +3076,12 @@ rb_wakeups(struct trace_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
 
 	if (buffer->irq_work.waiters_pending) {
 		buffer->irq_work.waiters_pending = false;
-		/* irq_work_queue() supplies it's own memory barriers */
-		irq_work_queue(&buffer->irq_work.work);
+		rb_irq_work_queue(&buffer->irq_work.work);
 	}
 
 	if (cpu_buffer->irq_work.waiters_pending) {
 		cpu_buffer->irq_work.waiters_pending = false;
-		/* irq_work_queue() supplies it's own memory barriers */
-		irq_work_queue(&cpu_buffer->irq_work.work);
+		rb_irq_work_queue(&cpu_buffer->irq_work.work);
 	}
 
 	if (cpu_buffer->last_pages_touch == local_read(&cpu_buffer->pages_touched))
@@ -3090,8 +3103,7 @@ rb_wakeups(struct trace_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
 
 	cpu_buffer->irq_work.wakeup_full = true;
 	cpu_buffer->irq_work.full_waiters_pending = false;
-	/* irq_work_queue() supplies it's own memory barriers */
-	irq_work_queue(&cpu_buffer->irq_work.work);
+	rb_irq_work_queue(&cpu_buffer->irq_work.work);
 }
 
 #ifdef CONFIG_RING_BUFFER_RECORD_RECURSION
-- 
2.31.1
