Message-ID: <20170530172555.5ya3ilfw3sowokjz@hirez.programming.kicks-ass.net>
Date: Tue, 30 May 2017 19:25:56 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Vince Weaver <vincent.weaver@...ne.edu>
Cc: Andi Kleen <ak@...ux.intel.com>,
Stephane Eranian <eranian@...gle.com>,
"Liang, Kan" <kan.liang@...el.com>,
"mingo@...hat.com" <mingo@...hat.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"alexander.shishkin@...ux.intel.com"
<alexander.shishkin@...ux.intel.com>,
"acme@...hat.com" <acme@...hat.com>,
"jolsa@...hat.com" <jolsa@...hat.com>,
"torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>,
"tglx@...utronix.de" <tglx@...utronix.de>
Subject: Re: [PATCH 1/2] perf/x86/intel: enable CPU ref_cycles for GP counter

On Wed, May 24, 2017 at 12:01:50PM -0400, Vince Weaver wrote:
> I already have people really grumpy that you have to have one mmap() page
> per event, meaning you sacrifice one TLB entry for each event you are
> measuring.

So there is space in that page. We could maybe look at having an array
of stuff (max 32 entries?) which would cover a whole event group.

Then you only need to mmap() the page for the leading event.

Looking at the layout it would be slightly awkward to do (or we need to
rev the layout, which will undoubtedly be painful too). But for groups
the time fields at least are all shared.

At the very least we need index and offset, ideally pmc_width would be
the same for all counters (can we assume that?).

Something like the below for the uapi changes I suppose. I've not
tried to actually implement it yet, but would something like that be
usable?
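
For illustration only, here is a minimal sketch of how userspace might walk
the proposed array. It assumes the existing self-monitoring convention carries
over unchanged: the page's lock field is a sequence counter, index is the
hardware counter number plus one (0 meaning "not readable right now"), and one
pmc_width applies to every entry. struct mock_mmap_page, struct group_count,
read_group() and the read_pmc callback are hypothetical names, not kernel or
libperf API; a real reader would use the mmap()ed perf_event_mmap_page and the
RDPMC instruction instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define GROUP_PMC_MAX 32

/* Hypothetical mirror of one entry of the proposed group_pmc[] array. */
struct group_pmc_entry {
	uint32_t index;		/* counter number + 1; 0 = not readable (assumed) */
	uint32_t reserved_hole;
	int64_t  offset;	/* value to add to the raw counter */
	uint64_t id;		/* event id */
};

/* Hypothetical, cut-down stand-in for struct perf_event_mmap_page. */
struct mock_mmap_page {
	uint32_t lock;		/* sequence counter bumped around updates */
	uint16_t pmc_width;	/* assumed identical for all group members */
	struct group_pmc_entry group_pmc[GROUP_PMC_MAX];
};

struct group_count {
	uint64_t id;
	int64_t  value;
	int      valid;		/* 0 if the entry had index == 0 */
};

/* Stand-in for RDPMC so the sketch can run anywhere. */
typedef uint64_t (*read_pmc_fn)(uint32_t counter);

/* Sign-extend a raw counter value that is only 'width' bits wide. */
static int64_t sign_extend(uint64_t raw, unsigned int width)
{
	uint64_t mask;

	if (width == 0 || width >= 64)
		return (int64_t)raw;
	mask = (1ULL << width) - 1;
	raw &= mask;
	if (raw >> (width - 1))
		raw |= ~mask;
	return (int64_t)raw;
}

/* Read up to 'nr' group members; returns how many had a usable index. */
static int read_group(const struct mock_mmap_page *pg, unsigned int nr,
		      read_pmc_fn read_pmc, struct group_count *out)
{
	uint32_t seq;
	int readable;

	if (nr > GROUP_PMC_MAX)
		nr = GROUP_PMC_MAX;

	do {
		seq = pg->lock;
		atomic_signal_fence(memory_order_seq_cst); /* compiler barrier */

		readable = 0;
		for (unsigned int i = 0; i < nr; i++) {
			const struct group_pmc_entry *e = &pg->group_pmc[i];

			out[i].id = e->id;
			out[i].value = e->offset;
			out[i].valid = e->index != 0;
			if (out[i].valid) {
				out[i].value += sign_extend(read_pmc(e->index - 1),
							    pg->pmc_width);
				readable++;
			}
		}

		atomic_signal_fence(memory_order_seq_cst);
	} while (pg->lock != seq);	/* retry if the page changed under us */

	return readable;
}

/* Fake counter source for experimenting with the sketch. */
static uint64_t fake_pmc(uint32_t counter)
{
	return 1000u * (counter + 1);
}
```

Entries that come back with valid == 0 would still need a read() on the
corresponding event fd, as with the single-event page today.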
---
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index b1c0b187acfe..40ff77e52b9d 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -345,7 +345,8 @@ struct perf_event_attr {
context_switch : 1, /* context switch data */
write_backward : 1, /* Write ring buffer from end to beginning */
namespaces : 1, /* include namespaces data */
- __reserved_1 : 35;
+ group_pmc : 1, /* use group_pmc in perf_event_mmap_page */
+ __reserved_1 : 34;
union {
__u32 wakeup_events; /* wakeup every n events */
@@ -469,7 +470,8 @@ struct perf_event_mmap_page {
cap_user_rdpmc : 1, /* The RDPMC instruction can be used to read counts */
cap_user_time : 1, /* The time_* fields are used */
cap_user_time_zero : 1, /* The time_zero field is used */
- cap_____res : 59;
+ cap_group_pmc : 1, /* The group_pmc field is used, ignore @index and @offset */
+ cap_____res : 58;
};
};
@@ -530,11 +532,29 @@ struct perf_event_mmap_page {
__u64 time_zero;
__u32 size; /* Header size up to __reserved[] fields. */
+ __u32 __reserved_4byte_hole;
+
+ /*
+ * If cap_group_pmc this array replaces @index and @offset. The array
+ * will contain an entry for each group member led by the event belonging
+ * to this mmap().
+ *
+ * The @id field can be used to identify which sibling event the respective
+ * @index and @offset values belong to. Assuming an immutable group, the
+ * array index will stay constant for each event.
+ */
+ struct {
+ __u32 index;
+ __u32 __reserved_hole;
+ __s64 offset;
+ __u64 id; /* event id */
+ } group_pmc[32];
+
/*
* Hole for extension of the self monitor capabilities
*/
- __u8 __reserved[118*8+4]; /* align to 1k. */
+ __u8 __reserved[22*8]; /* align to 1k. */
/*
* Control data for the mmap() data buffer.
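
As a sanity check on the sizes in the hunk above, the arithmetic can be
written out as compile-time assertions: the old hole of 118*8+4 bytes should
equal the new 4-byte hole plus 32 array entries plus the remaining 22*8 bytes.
struct proposed_group_pmc and reserved_bytes_after_patch() are hypothetical
names that only mirror the field types from the patch; the 24-byte entry size
assumes no padding beyond the explicit __reserved_hole.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of one proposed group_pmc[] entry (same field types). */
struct proposed_group_pmc {
	uint32_t index;
	uint32_t reserved_hole;
	int64_t  offset;
	uint64_t id;
};

#define OLD_RESERVED	(118 * 8 + 4)	/* bytes reserved before the patch */
#define NEW_HOLE	4		/* __reserved_4byte_hole */
#define GROUP_ENTRIES	32
#define GROUP_ARRAY	(GROUP_ENTRIES * sizeof(struct proposed_group_pmc))

_Static_assert(sizeof(struct proposed_group_pmc) == 24,
	       "each entry is 4 + 4 + 8 + 8 bytes");
_Static_assert(OLD_RESERVED - NEW_HOLE - GROUP_ARRAY == 22 * 8,
	       "remaining hole matches __reserved[22*8]");

/* Bytes left in the extension hole once the array has been carved out. */
static size_t reserved_bytes_after_patch(void)
{
	return OLD_RESERVED - NEW_HOLE - GROUP_ARRAY;
}
```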