Message-Id: <9b0698774be3bb406e2b8b2c12dc1fb91532bff0.1559051152.git.bristot@redhat.com>
Date:   Tue, 28 May 2019 17:16:24 +0200
From:   Daniel Bristot de Oliveira <bristot@...hat.com>
To:     linux-kernel@...r.kernel.org
Cc:     williams@...hat.com, daniel@...stot.me,
        "Steven Rostedt (VMware)" <rostedt@...dmis.org>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Matthias Kaehlcke <mka@...omium.org>,
        "Joel Fernandes (Google)" <joel@...lfernandes.org>,
        Frederic Weisbecker <frederic@...nel.org>,
        Yangtao Li <tiny.windzz@...il.com>,
        Tommaso Cucinotta <tommaso.cucinotta@...tannapisa.it>
Subject: [RFC 3/3] preempt_tracer: Use a percpu variable to control traceable calls

The preempt_disable tracepoint only fires on an actual disabled <-> enabled
transition, i.e., when the preempt counter moves between zero and non-zero.
That is correct on its own, but consider this case:

--------------------------- %< ----------------------
	 THREAD					IRQ
	   |					 |
preempt_disable_notrace() {
    __preempt_count_add(1)
}

/* preemption disabled/IRQs enabled */

	------->		smp_apic_timer_interrupt() {
				    preempt_disable() {
					do not trace (preempt count >= 1)
				    }
				    ....
				    preempt_enable() {
					do not trace (preempt count >= 1)
				    }
	<-------		}
preempt_enable_notrace() {
    __preempt_count_sub(1)
}
--------------------------- >% ----------------------

The non-traceable preempt_disable can thus hide a legitimate
preempt_disable, one that is worth tracing.
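
To spell the failure out, here is a minimal user-space sketch (hypothetical,
not kernel code) of the current preempt_count() == val test: with a notrace
disable already in effect, the nested traced disable never fires the
tracepoint:

--------------------------- %< ----------------------
#include <stdio.h>

static int preempt_count;	/* stands in for the real preempt counter */

/* Simplified model of the current preempt_latency_start() check. */
static void preempt_latency_start(int val)
{
	/* Only true when this add took the count from 0 to val. */
	if (preempt_count == val)
		printf("trace_preempt_off()\n");
}

int main(void)
{
	preempt_count += 1;		/* preempt_disable_notrace() */

	preempt_count += 1;		/* traced preempt_disable() in the IRQ */
	preempt_latency_start(1);	/* count is 2, not 1: nothing traced */

	preempt_count -= 2;
	return 0;
}
--------------------------- >% ----------------------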

It is possible to observe this problem using this kernel module:

    http://bristot.me/files/efficient_verification/wip.tar.gz

and running the following trace session:

	# cd /sys/kernel/debug/tracing/
	# echo 1 > snapshot
	# cat available_events | grep preempt_ > set_event
	# echo irq_vectors >> /sys/kernel/debug/tracing/set_event
	# insmod wip.ko
	/* wait for a snapshot to be created */
	# tail -100 snapshot
	    sshd-1159  [000] d...1..  2440.866116: preempt_enable: caller=migrate_enable+0x1bb/0x330 parent=migrate_enable+0x1bb/0x330
	    sshd-1159  [000] d..h1..  2440.866122: local_timer_entry: vector=236
	    sshd-1159  [000] d..h1..  2440.866127: local_timer_exit: vector=236
	    sshd-1159  [000] d.L.4..  2440.866129: process_event: event sched_waking not expected in the state preemptive
	    sshd-1159  [000] d.L.4..  2440.866137: <stack trace>
	 => process_event
	 => __handle_event
	 => ttwu_do_wakeup
	 => try_to_wake_up
	 => irq_exit
	 => smp_apic_timer_interrupt
	 => apic_timer_interrupt
	 => kvm_clock_read
	 => ktime_get_with_offset
	 => posix_get_boottime
	 => __x64_sys_clock_gettime
	 => do_syscall_64
	 => entry_SYSCALL_64_after_hwframe

and kvm_clock_read() disables preemption without tracing:

--------------------------- %< ----------------------
static u64 kvm_clock_read(void)
{
	u64 ret;

	preempt_disable_notrace();
	ret = pvclock_clocksource_read(this_cpu_pvti());
	preempt_enable_notrace();
	return ret;
}
--------------------------- >% ----------------------

Use a percpu variable to count the traced preempt_disable/enable calls,
and use it to decide whether to trace or not.
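
As a rough user-space model of the idea (single CPU assumed, a plain int
standing in for the percpu variable): only the traced calls touch the new
counter, so the notrace section above can no longer hide them:

--------------------------- %< ----------------------
#include <stdio.h>

static int __traced_preempt_count;	/* stands in for the percpu variable */

static void preempt_latency_start(int val)
{
	int curr = __traced_preempt_count;

	if (!curr)			/* first *traced* disable */
		printf("trace_preempt_off()\n");

	__traced_preempt_count = curr + val;
}

static void preempt_latency_stop(int val)
{
	int curr = __traced_preempt_count - val;

	if (!curr)			/* last *traced* enable */
		printf("trace_preempt_on()\n");

	__traced_preempt_count = curr;
}

int main(void)
{
	/* preempt_disable_notrace() does not touch the new counter, */
	/* so the traced pair from the IRQ is seen and traced:       */
	preempt_latency_start(1);	/* -> trace_preempt_off() */
	preempt_latency_stop(1);	/* -> trace_preempt_on()  */
	return 0;
}
--------------------------- >% ----------------------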

Signed-off-by: Daniel Bristot de Oliveira <bristot@...hat.com>
Cc: "Steven Rostedt (VMware)" <rostedt@...dmis.org>
Cc: Ingo Molnar <mingo@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc: Matthias Kaehlcke <mka@...omium.org>
Cc: "Joel Fernandes (Google)" <joel@...lfernandes.org>
Cc: Frederic Weisbecker <frederic@...nel.org>
Cc: Yangtao Li <tiny.windzz@...il.com>
Cc: Tommaso Cucinotta <tommaso.cucinotta@...tannapisa.it>
Cc: linux-kernel@...r.kernel.org
---
 kernel/sched/core.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index be4117d7384f..2e07d4174778 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3148,19 +3148,25 @@ static inline void sched_tick_stop(int cpu) { }
 
 #if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
 				defined(CONFIG_TRACE_PREEMPT_TOGGLE))
+
+DEFINE_PER_CPU(int, __traced_preempt_count) = 0;
 /*
  * If the value passed in is equal to the current preempt count
  * then we just disabled preemption. Start timing the latency.
  */
 void preempt_latency_start(int val)
 {
-	if (preempt_count() == val) {
+	int curr = this_cpu_read(__traced_preempt_count);
+
+	if (!curr) {
 		unsigned long ip = get_lock_parent_ip();
 #ifdef CONFIG_DEBUG_PREEMPT
 		current->preempt_disable_ip = ip;
 #endif
 		trace_preempt_off(CALLER_ADDR0, ip);
 	}
+
+	this_cpu_write(__traced_preempt_count, curr + val);
 }
 
 static inline void preempt_add_start_latency(int val)
@@ -3200,8 +3206,12 @@ NOKPROBE_SYMBOL(preempt_count_add);
  */
 void preempt_latency_stop(int val)
 {
-	if (preempt_count() == val)
+	int curr = this_cpu_read(__traced_preempt_count) - val;
+
+	if (!curr)
 		trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
+
+	this_cpu_write(__traced_preempt_count, curr);
 }
 
 static inline void preempt_sub_stop_latency(int val)
-- 
2.20.1
