linux-kernel - [patch] perf_counter: Add enable-on-exec attribute

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090629234601.GA5869@elte.hu>
Date:	Tue, 30 Jun 2009 01:46:01 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Vince Weaver <vince@...ter.net>
Cc:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Paul Mackerras <paulus@...ba.org>,
	linux-kernel@...r.kernel.org, Mike Galbraith <efault@....de>
Subject: [patch] perf_counter: Add enable-on-exec attribute


* Vince Weaver <vince@...ter.net> wrote:

>> If the 5 thousand cycles measurement overhead _still_ matters to 
>> you under such circumstances then by all means please submit the 
>> patches to improve it. Despite your claims this is totally 
>> fixable with the current perfcounters design, Peter outlined the 
>> steps of how to solve it, you can utilize ptrace if you want to.
>
> Is it really "totally" fixible?  I don't just mean getting the 
> overhead from ~3000 down to ~100, I mean down to zero.

Yes, it's truly very easy to get exactly the same output as pfmon, 
for the 'million.s' test app you posted:

  titan:~> perf stat -e 0:1:u ./million

   Performance counter stats for './million':

          1000001  instructions            

      0.000489736  seconds time elapsed

See the small patch below.

( Note that this approach does not use ptrace, hence it can be used
  to measure debuggers too. ptrace attach has the limitation of
  being exclusive - no task can be attached to twice. perfmon used
  ptrace attach, which limited its capabilities unreasonably. )

The question was really not whether we can do it - but whether we 
want to do it. I have no strong feelings either way - because as i 
told you in my first mail, all the other noise sources in the system 
dominate the metrics far more than this very small constant startup 
offset.

And the thing is, as a perfmon contributor i assume you have 
experience in these matters. Had you taken a serious, unbiased look 
at perfcounters, and had this problem truly bothered you personally, 
you could have come up with a similar patch yourself as well, while 
only spending a fraction of the energies you are putting into these 
emails. Instead you ignored our technical arguments, you refused to 
touch the code and you went on rambling against how perfcounters 
supposedly cannot solve this problem. Not very productive IMO.

	Ingo

---------------->
Subject: perf_counter: Add enable-on-exec attribute
From: Ingo Molnar <mingo@...e.hu>
Date: Mon Jun 29 22:05:11 CEST 2009

Add another attribute variant: attr.enable_on_exec.

The purpose is to allow the auto-enabling of such counters
on exec(), to measure exec()-ed workloads precisely, from
the first to the last instruction.

Cc: Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc: Mike Galbraith <efault@....de>
Cc: Paul Mackerras <paulus@...ba.org>
Cc: Arnaldo Carvalho de Melo <acme@...hat.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@...e.hu>
---
 fs/exec.c                    |    3 +--
 include/linux/perf_counter.h |    5 ++++-
 kernel/perf_counter.c        |   39 ++++++++++++++++++++++++++++++++++++---
 tools/perf/builtin-stat.c    |    5 +++--
 4 files changed, 44 insertions(+), 8 deletions(-)

Index: linux/fs/exec.c
===================================================================
--- linux.orig/fs/exec.c
+++ linux/fs/exec.c
@@ -996,8 +996,7 @@ int flush_old_exec(struct linux_binprm *
 	 * Flush performance counters when crossing a
 	 * security domain:
 	 */
-	if (!get_dumpable(current->mm))
-		perf_counter_exit_task(current);
+	perf_counter_exec(current);
 
 	/* An exec changes our domain. We are no longer part of the thread
 	   group */
Index: linux/include/linux/perf_counter.h
===================================================================
--- linux.orig/include/linux/perf_counter.h
+++ linux/include/linux/perf_counter.h
@@ -179,8 +179,9 @@ struct perf_counter_attr {
 				comm	       :  1, /* include comm data     */
 				freq           :  1, /* use freq, not period  */
 				inherit_stat   :  1, /* per task counts       */
+				enable_on_exec :  1, /* enable on exec        */
 
-				__reserved_1   : 52;
+				__reserved_1   : 51;
 
 	__u32			wakeup_events;	/* wakeup every n events */
 	__u32			__reserved_2;
@@ -712,6 +713,7 @@ static inline void perf_counter_mmap(str
 
 extern void perf_counter_comm(struct task_struct *tsk);
 extern void perf_counter_fork(struct task_struct *tsk);
+extern void perf_counter_exec(struct task_struct *tsk);
 
 extern struct perf_callchain_entry *perf_callchain(struct pt_regs *regs);
 
@@ -752,6 +754,7 @@ perf_swcounter_event(u32 event, u64 nr, 
 static inline void perf_counter_mmap(struct vm_area_struct *vma)	{ }
 static inline void perf_counter_comm(struct task_struct *tsk)		{ }
 static inline void perf_counter_fork(struct task_struct *tsk)		{ }
+static inline void perf_counter_exec(struct task_struct *tsk)		{ }
 static inline void perf_counter_init(void)				{ }
 #endif
 
Index: linux/kernel/perf_counter.c
===================================================================
--- linux.orig/kernel/perf_counter.c
+++ linux/kernel/perf_counter.c
@@ -903,6 +903,9 @@ static void perf_counter_enable(struct p
 	struct perf_counter_context *ctx = counter->ctx;
 	struct task_struct *task = ctx->task;
 
+	if (counter->attr.enable_on_exec)
+		return;
+
 	if (!task) {
 		/*
 		 * Enable the counter on the cpu that it's on
@@ -2856,6 +2859,32 @@ void perf_counter_fork(struct task_struc
 	perf_counter_fork_event(&fork_event);
 }
 
+void perf_counter_exec(struct task_struct *task)
+{
+	struct perf_counter_context *ctx;
+	struct perf_counter *counter;
+
+	if (!get_dumpable(task->mm)) {
+		perf_counter_exit_task(task);
+		return;
+	}
+
+	if (!task->perf_counter_ctxp)
+		return;
+
+	rcu_read_lock();
+	ctx = task->perf_counter_ctxp;
+	if (ctx) {
+		list_for_each_entry(counter, &ctx->counter_list, list_entry) {
+			if (counter->attr.enable_on_exec) {
+				counter->attr.enable_on_exec = 0;
+				__perf_counter_enable(counter);
+			}
+		}
+	}
+	rcu_read_unlock();
+}
+
 /*
  * comm tracking
  */
@@ -4064,10 +4093,14 @@ inherit_counter(struct perf_counter *par
 	 * not its attr.disabled bit.  We hold the parent's mutex,
 	 * so we won't race with perf_counter_{en, dis}able_family.
 	 */
-	if (parent_counter->state >= PERF_COUNTER_STATE_INACTIVE)
-		child_counter->state = PERF_COUNTER_STATE_INACTIVE;
-	else
+	if (parent_counter->state >= PERF_COUNTER_STATE_INACTIVE) {
+		if (child_counter->attr.enable_on_exec)
+			child_counter->state = PERF_COUNTER_STATE_OFF;
+		else
+			child_counter->state = PERF_COUNTER_STATE_INACTIVE;
+	} else {
 		child_counter->state = PERF_COUNTER_STATE_OFF;
+	}
 
 	if (parent_counter->attr.freq)
 		child_counter->hw.sample_period = parent_counter->hw.sample_period;
Index: linux/tools/perf/builtin-stat.c
===================================================================
--- linux.orig/tools/perf/builtin-stat.c
+++ linux/tools/perf/builtin-stat.c
@@ -116,8 +116,9 @@ static void create_perf_stat_counter(int
 					fd[cpu][counter], strerror(errno));
 		}
 	} else {
-		attr->inherit	= inherit;
-		attr->disabled	= 1;
+		attr->inherit		= inherit;
+		attr->disabled		= 1;
+		attr->enable_on_exec	= 1;
 
 		fd[0][counter] = sys_perf_counter_open(attr, pid, -1, -1, 0);
 		if (fd[0][counter] < 0 && verbose)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/