linux-kernel - [PATCH] perf: Fix inconsistency between IP and callchain sampling

Message-ID: <20100118054707.GT12666@kryten>
Date:	Mon, 18 Jan 2010 16:47:07 +1100
From:	Anton Blanchard <anton@...ba.org>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Paul Mackerras <paulus@...ba.org>, Ingo Molnar <mingo@...e.hu>
Cc:	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Paul Mundt <lethal@...ux-sh.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	linux-kernel@...r.kernel.org
Subject: [PATCH] perf: Fix inconsistency between IP and callchain sampling


When running perf across all cpus with backtracing (-a -g), sometimes we
get samples without associated backtraces:

    23.44%         init  [kernel]                     [k] restore
    11.46%         init                       eeba0c  [k] 0x00000000eeba0c
     6.77%      swapper  [kernel]                     [k] .perf_ctx_adjust_freq
     5.73%         init  [kernel]                     [k] .__trace_hcall_entry
     4.69%         perf  libc-2.9.so                  [.] 0x0000000006bb8c
                       |          
                       |--11.11%-- 0xfffa941bbbc

It turns out the backtrace code has a check for the idle task and the IP
sampling does not. This creates problems when profiling an
interrupt-heavy workload (in my case 10Gbit ethernet), since we get no
backtraces for interrupts received while idle (i.e. most of the workload).

Right now x86 and sh also check that current is not NULL. That should
never happen, so remove that check too.

Signed-off-by: Anton Blanchard <anton@...ba.org>
---

The exclusion of idle tasks should be in the common perf events code,
perhaps keying off the exclude_idle field. It should also ensure that
we weren't in an interrupt at the time.
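
As a rough illustration of that idea, here is a minimal userspace sketch
of a single common-code decision. It is not kernel code: struct sample_ctx
and skip_idle_sample() are hypothetical names invented for this example,
and the in_interrupt field merely stands in for whatever test the real
code would use.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of the state seen when a sample fires. */
struct sample_ctx {
	int  pid;           /* pid of current; 0 means the idle task */
	bool in_interrupt;  /* sample landed while servicing an interrupt */
	bool exclude_idle;  /* models the exclude_idle event attribute */
};

/*
 * Decide once whether a sample should be dropped, so that the IP and the
 * callchain are either both recorded or both skipped.
 *
 * A sample is dropped only when idle exclusion was requested, the idle
 * task is current, and we were not in an interrupt at the time.
 */
static bool skip_idle_sample(const struct sample_ctx *ctx)
{
	if (!ctx->exclude_idle)
		return false;	/* idle exclusion not requested */
	if (ctx->pid != 0)
		return false;	/* a real task is running */
	return !ctx->in_interrupt;	/* keep interrupt work done while idle */
}
```

With this shape, interrupts taken while idle would still be sampled with
their backtraces, and only genuinely idle samples would be excluded.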

I also notice this:

        if (is_user && current->state != TASK_RUNNING)

But I'm not exactly sure what that will catch. When would we get a userspace
sample from something that isn't running?

Index: linux.trees.git/arch/powerpc/kernel/perf_callchain.c
===================================================================
--- linux.trees.git.orig/arch/powerpc/kernel/perf_callchain.c	2010-01-18 16:10:10.000000000 +1100
+++ linux.trees.git/arch/powerpc/kernel/perf_callchain.c	2010-01-18 16:10:17.000000000 +1100
@@ -495,9 +495,6 @@ struct perf_callchain_entry *perf_callch
 
 	entry->nr = 0;
 
-	if (current->pid == 0)		/* idle task? */
-		return entry;
-
 	if (!user_mode(regs)) {
 		perf_callchain_kernel(regs, entry);
 		if (current->mm)
Index: linux.trees.git/arch/x86/kernel/cpu/perf_event.c
===================================================================
--- linux.trees.git.orig/arch/x86/kernel/cpu/perf_event.c	2010-01-18 16:10:36.000000000 +1100
+++ linux.trees.git/arch/x86/kernel/cpu/perf_event.c	2010-01-18 16:17:33.000000000 +1100
@@ -2425,9 +2425,6 @@ perf_do_callchain(struct pt_regs *regs, 
 
 	is_user = user_mode(regs);
 
-	if (!current || current->pid == 0)
-		return;
-
 	if (is_user && current->state != TASK_RUNNING)
 		return;
 
Index: linux.trees.git/arch/sh/kernel/perf_callchain.c
===================================================================
--- linux.trees.git.orig/arch/sh/kernel/perf_callchain.c	2010-01-18 16:18:24.000000000 +1100
+++ linux.trees.git/arch/sh/kernel/perf_callchain.c	2010-01-18 16:18:37.000000000 +1100
@@ -68,9 +68,6 @@ perf_do_callchain(struct pt_regs *regs, 
 
 	is_user = user_mode(regs);
 
-	if (!current || current->pid == 0)
-		return;
-
 	if (is_user && current->state != TASK_RUNNING)
 		return;
 