Message-ID: <CACV3sbKS2gJ4TytqZjx1LP-ZF0wYqS=QoaTFnq+X72EWWpucEg@mail.gmail.com>
Date:	Fri, 6 Jul 2012 10:26:01 +0800
From:	Jovi Zhang <bookjovi@...il.com>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	Ingo Molnar <mingo@...hat.com>,
	Arnaldo Carvalho de Melo <acme@...stprotocols.net>,
	LKML <linux-kernel@...r.kernel.org>,
	Frederic Weisbecker <fweisbec@...hat.com>
Subject: Re: [PATCH] perf: add /proc/perf_events file to dump perf event info

Hi Peter,

Many thanks for your comments!

>> Sorry, I will try to explain more.
>>
>> One problem I faced is with hw_breakpoint.
>> As you know, hw_breakpoint uses a limited number of debug registers on
>> most architectures. In a multi-user environment, a user sometimes cannot
>> set a hw_breakpoint because other users already occupy the hw_breakpoint
>> slots. Currently there is no way to know how many hw_breakpoint perf
>> events are already in use in the system, which is why I think we might
>> need a way to see perf events system-wide, with visible output.
>
> Hrmm, so this seems pretty specific to the horror of hw_breakpoint. And
> yes, those are unfortunate and weird.
>
> But how would you use this proc file? Would you read it
> programmatically, or just look at it as a user to figure out why stuff
> doesn't work?

Currently I just look at /proc/perf_events by eye to figure out what is
happening in the system, but the file can also be parsed programmatically
when needed.
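
As a rough illustration (not part of the patch; it assumes the field layout
shown in the sample output in the changelog below), a small user-space
program could count the events in use per PMU like this:

#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/perf_events", "r");
	char line[256];
	int total = 0, breakpoints = 0;

	if (!f) {
		perror("/proc/perf_events");
		return 1;
	}

	/* Each event record starts with a "pmu:" line naming its PMU. */
	while (fgets(line, sizeof(line), f)) {
		if (strncmp(line, "pmu:", 4) == 0) {
			total++;
			if (strstr(line, "breakpoint"))
				breakpoints++;
		}
	}
	fclose(f);

	printf("%d perf events in use, %d on the breakpoint pmu\n",
	       total, breakpoints);
	return 0;
}

Keying on the "pmu:" lines keeps such a parser independent of whatever extra
attr fields a future version of the file might print.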

>
>> Also, this method is not only useful for hw_breakpoint; other perf events
>> might have a similar problem. Even perf events without a limited number
>> of slots can make use of this /proc/perf_events.
>
> They have a limit alright, but we can round-robin them to hide this fact
> (unless you tell it not to).
>
>> Active perf events are CPU consumers most of the time; from that point
>> of view, a system administrator can also use this /proc/perf_events to
>> detect whether any perf events are consuming CPU.
>
> I doubt you can see which is consuming cycles, but you can see if
> there's any in use.
>
>> A way to detect perf event leaks? Of course our perf subsystem is very
>> stable right now, so ignore this :)
>
> There's always bugs ;-)
>
> The problem I have with the patch is its global nature.. but if
> something like this is required I guess I can live with it. But it might
> be that the current proposal is exposing too much information; I would
> certainly not mark it readable for the entire world either.
>
In version 1 of this patch I output some fields that are not really
needed, so in the next version I removed oncpu (it overlaps with the cpu
field) and the attr flags. The output now looks cleaner than before and
only shows the key fields of each perf event.

How about the version 2 patch below (attached again)?


>From 8fd37b246dcd4f50cb32e5250db5a0aaccc398cc Mon Sep 17 00:00:00 2001
From: Jovi Zhang <bookjovi@...il.com>
Date: Fri, 6 Jul 2012 18:01:03 +0800
Subject: [PATCH] perf: add /proc/perf_events file to dump perf event info

The kernel should provide some information that lets users know how many
perf events are in use, especially for perf events that occupy a limited
resource (like hw_breakpoint).

This patch adds a /proc/perf_events file to dump system-wide perf event
information with the key fields of each perf_event, including pmu name,
state, attach_state, cpu, count, id, and some fields of attr.

See demo:

[root@...i perf]# cat /proc/kallsyms |grep linux_proc_banner
c09b7020 R linux_proc_banner
[root@...i perf]# ./perf record -e mem:0xc09b7020 -g -a -d
...
[root@...i proc]# cat /proc/version
...

[root@...i proc]# cat /proc/perf_events
1:
pmu:                breakpoint
state:              ACTIVE
attach_state:       ATTACH_CONTEXT ATTACH_GROUP
cpu:                0
count:              0
id:                 13
attr.type:          BREAKPOINT
attr.config:        0
attr.sample_type:   IP TID TIME ADDR CALLCHAIN CPU PERIOD
attr.bp_type:       RW
attr.bp_addr:       0xc09b7020
attr.bp_len:        4

2:
pmu:                breakpoint
state:              ACTIVE
attach_state:       ATTACH_CONTEXT ATTACH_GROUP
cpu:                1
count:              0
id:                 14
attr.type:          BREAKPOINT
attr.config:        0
attr.sample_type:   IP TID TIME ADDR CALLCHAIN CPU PERIOD
attr.bp_type:       RW
attr.bp_addr:       0xc09b7020
attr.bp_len:        4

3:
...

Signed-off-by: Jovi Zhang <bookjovi@...il.com>
---
 include/linux/perf_event.h       |    1 +
 kernel/events/Makefile           |    1 +
 kernel/events/core.c             |   14 +++
 kernel/events/proc_perf_events.c |  188 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 204 insertions(+)
 create mode 100644 kernel/events/proc_perf_events.c

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 45db49f..67d9e7d 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -871,6 +871,7 @@ struct perf_event {
 	struct list_head		group_entry;
 	struct list_head		event_entry;
 	struct list_head		sibling_list;
+	struct list_head		perf_entry;
 	struct hlist_node		hlist_entry;
 	int				nr_siblings;
 	int				group_flags;
diff --git a/kernel/events/Makefile b/kernel/events/Makefile
index 103f5d1..8b34070 100644
--- a/kernel/events/Makefile
+++ b/kernel/events/Makefile
@@ -6,4 +6,5 @@ obj-y := core.o ring_buffer.o callchain.o

 obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
 obj-$(CONFIG_UPROBES) += uprobes.o
+obj-$(CONFIG_PROC_FS) += proc_perf_events.o

diff --git a/kernel/events/core.c b/kernel/events/core.c
index d7d71d6..55766d0 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -147,6 +147,10 @@ static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
 static struct srcu_struct pmus_srcu;

+LIST_HEAD(perf_events_list);
+DEFINE_MUTEX(perf_events_lock);
+
+
 /*
  * perf event paranoia level:
  *  -1 - not paranoid at all
@@ -2897,6 +2901,10 @@ static void free_event(struct perf_event *event)
 	if (event->ctx)
 		put_ctx(event->ctx);

+	mutex_lock(&perf_events_lock);
+	list_del_rcu(&event->perf_entry);
+	mutex_unlock(&perf_events_lock);
+
 	call_rcu(&event->rcu_head, free_event_rcu);
 }

@@ -5916,6 +5924,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	INIT_LIST_HEAD(&event->event_entry);
 	INIT_LIST_HEAD(&event->sibling_list);
 	INIT_LIST_HEAD(&event->rb_entry);
+	INIT_LIST_HEAD(&event->perf_entry);

 	init_waitqueue_head(&event->waitq);
 	init_irq_work(&event->pending, perf_pending_event);
@@ -6013,6 +6022,10 @@ done:
 		}
 	}

+	mutex_lock(&perf_events_lock);
+	list_add_tail_rcu(&event->perf_entry, &perf_events_list);
+	mutex_unlock(&perf_events_lock);
+
 	return event;
 }

@@ -7220,3 +7233,4 @@ struct cgroup_subsys perf_subsys = {
 	.attach		= perf_cgroup_attach,
 };
 #endif /* CONFIG_CGROUP_PERF */
+
diff --git a/kernel/events/proc_perf_events.c b/kernel/events/proc_perf_events.c
new file mode 100644
index 0000000..7079701
--- /dev/null
+++ b/kernel/events/proc_perf_events.c
@@ -0,0 +1,188 @@
+/*
+ *	linux/kernel/events/proc_perf_events.c
+ *
+ *	Dump information for all perf_events in the system
+ *
+ *	Created by: Jovi Zhang (bookjovi@...il.com)
+ *
+ */
+
+#include <linux/perf_event.h>
+#include <linux/hw_breakpoint.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+
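+/*
+ * Defined in kernel/events/core.c: every perf_event is added to this list
+ * in perf_event_alloc() and removed in free_event(), under perf_events_lock.
+ */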
+extern struct list_head perf_events_list;
+extern struct mutex perf_events_lock;
+
+static const char *perf_state_name(enum perf_event_active_state state)
+{
+	const char *name;
+
+	switch (state) {
+	case PERF_EVENT_STATE_ERROR:
+		name = "ERROR";
+		break;
+	case PERF_EVENT_STATE_OFF:
+		name = "OFF";
+		break;
+	case PERF_EVENT_STATE_INACTIVE:
+		name = "INACTIVE";
+		break;
+	case PERF_EVENT_STATE_ACTIVE:
+		name = "ACTIVE";
+		break;
+	default:
+		name = "NULL";
+	}
+
+	return name;
+}
+
+static void perf_attach_state_show(struct seq_file *m,
+				   unsigned int attach_state)
+{
+	seq_printf(m, "attach_state:\t\t");
+
+	if (attach_state & PERF_ATTACH_CONTEXT)
+		seq_printf(m, "ATTACH_CONTEXT ");
+	if (attach_state & PERF_ATTACH_GROUP)
+		seq_printf(m, "ATTACH_GROUP ");
+	if (attach_state & PERF_ATTACH_TASK)
+		seq_printf(m, "ATTACH_TASK ");
+
+	seq_putc(m, '\n');
+}
+
+static void perf_attr_sample_type_show(struct seq_file *m, __u64 sample_type)
+{
+	int i, valid = 0;
+
+	static const char * const sample_type_name[] = {
+		"IP",
+		"TID",
+		"TIME",
+		"ADDR",
+		"READ",
+		"CALLCHAIN",
+		"ID",
+		"CPU",
+		"PERIOD",
+		"STREAM_ID",
+		"RAW",
+		"BRANCH_STACK"
+	};
+
+	seq_printf(m, "attr.sample_type:\t");
+
+	for (i = 0; i < ARRAY_SIZE(sample_type_name); i++) {
+		if (sample_type & (1UL << i)) {
+			seq_printf(m, "%s ", sample_type_name[i]);
+			valid = 1;
+		}
+	}
+
+	if (!valid)
+		seq_printf(m, "NULL");
+
+	seq_putc(m, '\n');
+}
+
+static void perf_event_bp_show(struct seq_file *m,
+			       __u32 bp_type, __u64 bp_addr, __u64 bp_len)
+{
+	char *name;
+
+	seq_printf(m, "attr.bp_type:\t\t");
+	switch (bp_type) {
+	case HW_BREAKPOINT_EMPTY:
+		name = "EMPTY";
+		break;
+	case HW_BREAKPOINT_R:
+		name = "R";
+		break;
+	case HW_BREAKPOINT_W:
+		name = "W";
+		break;
+	case HW_BREAKPOINT_RW:
+		name = "RW";
+		break;
+	case HW_BREAKPOINT_X:
+		name = "X";
+		break;
+	case HW_BREAKPOINT_INVALID:
+		name = "INVALID";
+		break;
+	default:
+		name = "NULL";
+	}
+	seq_printf(m, "%s\n", name);
+
+	seq_printf(m, "attr.bp_addr:\t\t0x%llx\n", bp_addr);
+	seq_printf(m, "attr.bp_len:\t\t%llu\n", bp_len);
+}
+
+static void perf_event_attr_show(struct seq_file *m,
+				 struct perf_event_attr *attr)
+{
+	static const char * const type_name[] = {
+		"HARDWARE",
+		"SOFTWARE",
+		"TRACEPOINT",
+		"HW_CACHE",
+		"RAW",
+		"BREAKPOINT"
+	};
+
+	seq_printf(m, "attr.type:\t\t%s\n",
+		   attr->type < ARRAY_SIZE(type_name) ?
+		   type_name[attr->type] : "UNKNOWN");
+	seq_printf(m, "attr.config:\t\t%llu\n", attr->config);
+	perf_attr_sample_type_show(m, attr->sample_type);
+	perf_event_bp_show(m, attr->bp_type, attr->bp_addr, attr->bp_len);
+}
+
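+/*
+ * seq_file "show" callback: walk the global perf_events_list under
+ * perf_events_lock and print one numbered record per event, in the
+ * format shown in the changelog.
+ */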
+static int perf_events_proc_show(struct seq_file *m, void *v)
+{
+	struct perf_event *event;
+	int i = 0;
+
+	mutex_lock(&perf_events_lock);
+	list_for_each_entry(event, &perf_events_list, perf_entry) {
+		i++;
+		seq_printf(m, "%d:\n", i);
+		seq_printf(m, "pmu:\t\t\t%s\n",
+				event->pmu ? event->pmu->name : "NULL");
+		seq_printf(m, "state:\t\t\t%s\n",
+				perf_state_name(event->state));
+		perf_attach_state_show(m, event->attach_state);
+		seq_printf(m, "cpu:\t\t\t%d\n", event->cpu);
+		seq_printf(m, "count:\t\t\t%llu\n",
+				(unsigned long long)local64_read(&event->count));
+		seq_printf(m, "id:\t\t\t%llu\n", event->id);
+		perf_event_attr_show(m, &event->attr);
+
+		seq_putc(m, '\n');
+	}
+	mutex_unlock(&perf_events_lock);
+
+	return 0;
+}
+
+static int perf_events_proc_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, perf_events_proc_show, NULL);
+}
+
+static const struct file_operations perf_events_proc_fops = {
+	.open           = perf_events_proc_open,
+	.read           = seq_read,
+	.llseek         = seq_lseek,
+	.release        = single_release,
+};
+
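+/* Create the read-only (0444) /proc/perf_events entry at boot. */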
+static int __init proc_perf_events_init(void)
+{
+	proc_create("perf_events", 0444, NULL, &perf_events_proc_fops);
+	return 0;
+}
+
+device_initcall(proc_perf_events_init);
-- 
1.7.9.7

