Message-ID: <CAJvTdK=qKzLSKYNjZ+0Ay7F5CzxKov3db3p2KNk59O1iNK+bLw@mail.gmail.com>
Date: Tue, 24 Mar 2015 17:20:13 -0400
From: Len Brown <lenb@...nel.org>
To: Andi Kleen <andi@...stfloor.org>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Andi Kleen <ak@...ux.intel.com>,
Linux PM list <linux-pm@...r.kernel.org>
Subject: Re: [PATCH 1/2] Support PCU power metrics in turbostat
<cc: linux-pm list>
On Thu, Nov 13, 2014 at 6:19 PM, Andi Kleen <andi@...stfloor.org> wrote:
> From: Andi Kleen <ak@...ux.intel.com>
>
> Add support for reading PCU power metrics on Sandy Bridge / Ivy Bridge EP
> and Haswell Server in turbostat. This is done using the perf ABI,
> using the perf uncore driver. This requires the kernel to
> have uncore perf driver support.
What happens if the kernel doesn't include that support?
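One way the tool could answer this at startup is to probe sysfs for the PMU before trying to open any events. A minimal sketch, assuming the PMU shows up as /sys/devices/uncore_pcu (the same path the patch's resolve.c reads for the event type); the helper name pmu_available() is hypothetical and not part of the patch:

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not in the patch): return 1 if the kernel exposes
 * the named perf PMU in sysfs, 0 otherwise. */
static int pmu_available(const char *pmu)
{
	char path[256];
	int n;

	n = snprintf(path, sizeof(path), "/sys/devices/%s/type", pmu);
	if (n < 0 || (size_t)n >= sizeof(path))
		return 0;	/* name does not fit; treat as absent */
	return access(path, R_OK) == 0;
}
```

With something like this, -x could check pmu_available("uncore_pcu") once and print a single clear message when the uncore driver is missing, rather than failing event by event.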
> The PCU has a large number of events, but only allows to monitor
> four of them at the same time. We always need the PCU cycles
> event, so this leaves three events per event group.
>
> The user has to specify the event group using a new -x option. All
> more sensible option characters were already taken. When -x is
> not specified no behavior changes.
I'm concerned that the turbostat command line is getting too complicated,
and this makes it more so.
This would need EXAMPLES in turbostat.8 to really be useful.
> perf in principle supports time based multi-plexing, which
> allows monitoring multiple groups at the same time, but support for
> that is not implemented in the tool so far. It would
> also require enabling a potentially idle-disturbing timer.
> So right now we don't multiplex.
>
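If multiplexing were added later, each count would need to be scaled by how long its event was actually scheduled on the hardware. perf reports those times when read_format includes PERF_FORMAT_TOTAL_TIME_ENABLED and PERF_FORMAT_TOTAL_TIME_RUNNING. A minimal sketch of the usual estimate follows; scale_count() is a hypothetical name and is not in the patch:

```c
#include <stdint.h>

/* Hypothetical helper (not in the patch): extrapolate a multiplexed
 * counter value to the full time the event was enabled. */
static uint64_t scale_count(uint64_t raw, uint64_t enabled, uint64_t running)
{
	if (running == 0)
		return 0;	/* never scheduled; nothing to extrapolate from */
	if (running >= enabled)
		return raw;	/* counted the whole time; no scaling needed */
	/* raw * enabled / running, rounded to the nearest integer */
	return (uint64_t)((long double)raw * enabled / running + 0.5L);
}
```

The result is only an estimate: it assumes the event rate while the counter was scheduled is representative of the whole interval.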
> I modeled the event groups after the proven ones in
> pcm-power in the PCM (Intel Performance Counter Monitor) tool.
> That is where the numbers come from.
>
> However unlike PCM this uses the perf interfaces, instead
> of directly accessing the hardware.
>
> The current groups are:
>
> 0: This should monitor 3 frequency bands. It could give more
> accurate information than the average frequency from turbostat,
> as it can keep multiple buckets.
>
> However this currently runs into a problem with the uncore
> driver that only makes us able to monitor a single band.
> Disabled until this is fixed.
>
> 1: C-state residencies. Already covered in turbostat and not
> implemented.
>
> 2/3/4: Various reasons for frequency limits.
>
> 5: Number of frequency transitions and time of PCU transition duration.
> (note this is not the full time of the transition)
>
> Other power metrics could be added later using the same framework.
> For example it would be possible to implement the memory power saving
> metrics from pcm-power, which would output power state statistics per
> channel. This would definitely need a new output format,
> as it won't fit into any terminal with the current one.
More columns are no longer an issue.
The latest turbostat has two modes: the default shows just topology and frequency.
The --debug option adds all metrics, and its output is generally
redirected to a file.
>
> Custom user metrics would be also possible.
>
> The event resolution code is derived from the jevents library
> (parts of pmu-tools, http://github.com/andikleen/pmu-tools)
> and is BSD licensed.
Can we put BSD-licensed code into utilities that are in the Linux
kernel git tree?
I sort of like a single source file, but if the code really is
unrelated, I guess a second file is okay.
thanks,
-Len
> Signed-off-by: Andi Kleen <ak@...ux.intel.com>
> ---
> tools/power/x86/turbostat/Makefile | 11 +-
> tools/power/x86/turbostat/resolve.c | 233 ++++++++++++++++++++++++++++++++++
> tools/power/x86/turbostat/resolve.h | 2 +
> tools/power/x86/turbostat/turbostat.8 | 5 +
> tools/power/x86/turbostat/turbostat.c | 206 +++++++++++++++++++++++++++++-
> 5 files changed, 449 insertions(+), 8 deletions(-)
> create mode 100644 tools/power/x86/turbostat/resolve.c
> create mode 100644 tools/power/x86/turbostat/resolve.h
>
> diff --git a/tools/power/x86/turbostat/Makefile b/tools/power/x86/turbostat/Makefile
> index d1b3a36..745af06 100644
> --- a/tools/power/x86/turbostat/Makefile
> +++ b/tools/power/x86/turbostat/Makefile
> @@ -3,17 +3,20 @@ BUILD_OUTPUT := $(PWD)
> PREFIX := /usr
> DESTDIR :=
>
> -turbostat : turbostat.c
> +turbostat : turbostat.o resolve.o
> + @mkdir -p $(BUILD_OUTPUT)
> + $(CC) $(LDFLAGS) $^ -o $(BUILD_OUTPUT)/$@
> +
> CFLAGS += -Wall
> CFLAGS += -DMSRHEADER='"../../../../arch/x86/include/uapi/asm/msr-index.h"'
>
> -%: %.c
> +%.o: %.c
> @mkdir -p $(BUILD_OUTPUT)
> - $(CC) $(CFLAGS) $< -o $(BUILD_OUTPUT)/$@
> + $(CC) $(CFLAGS) $< -c -o $(BUILD_OUTPUT)/$@
>
> .PHONY : clean
> clean :
> - @rm -f $(BUILD_OUTPUT)/turbostat
> + @rm -f $(BUILD_OUTPUT)/turbostat turbostat.o resolve.o
>
> install : turbostat
> install -d $(DESTDIR)$(PREFIX)/bin
> diff --git a/tools/power/x86/turbostat/resolve.c b/tools/power/x86/turbostat/resolve.c
> new file mode 100644
> index 0000000..ad53159
> --- /dev/null
> +++ b/tools/power/x86/turbostat/resolve.c
> @@ -0,0 +1,233 @@
> +/* Resolve perf style event descriptions to attr */
> +/*
> + * Copyright (c) 2014, Intel Corporation
> + * Author: Andi Kleen
> + * All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions are met:
> + *
> + * 1. Redistributions of source code must retain the above copyright notice,
> + * this list of conditions and the following disclaimer.
> + *
> + * 2. Redistributions in binary form must reproduce the above copyright
> + * notice, this list of conditions and the following disclaimer in the
> + * documentation and/or other materials provided with the distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
> + * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
> + * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> + * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
> + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
> + * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
> + * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
> + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
> + * OF THE POSSIBILITY OF SUCH DAMAGE.
> +*/
> +
> +#define _GNU_SOURCE 1
> +#include "resolve.h"
> +#include <linux/perf_event.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <stdarg.h>
> +#include <stdlib.h>
> +#include <stdbool.h>
> +#include <unistd.h>
> +#include <sys/fcntl.h>
> +#include <err.h>
> +
> +#define MAXFILE 4096
> +
> +static int read_file(char **val, const char *fmt, ...)
> +{
> + char *fn;
> + va_list ap;
> + int fd;
> + int ret = -1;
> + int len;
> +
> + *val = malloc(MAXFILE);
> + if (!*val)
> + err(1, "out of memory");
> + va_start(ap, fmt);
> + vasprintf(&fn, fmt, ap);
> + va_end(ap);
> + fd = open(fn, O_RDONLY);
> + free(fn);
> + if (fd >= 0) {
> + if ((len = read(fd, *val, MAXFILE - 1)) > 0) {
> + ret = 0;
> + (*val)[len] = 0;
> + }
> + close(fd);
> + }
> + if (ret < 0) {
> + free(*val);
> + *val = NULL;
> + }
> + return ret;
> +}
> +
> +#define BITS(x) ((1U << (x)) - 1)
> +
> +static bool try_parse(char *format, char *fmt, __u64 val, __u64 *config)
> +{
> + int start, end;
> + int n = sscanf(format, fmt, &start, &end);
> + if (n == 1)
> + end = start + 1;
> + if (n == 0)
> + return false;
> + *config |= (val & BITS(end - start + 1)) << start;
> + return true;
> +}
> +
> +static int read_qual(char *qual, struct perf_event_attr *attr)
> +{
> + while (*qual) {
> + switch (*qual) {
> + case 'p':
> + attr->precise_ip++;
> + break;
> + case 'k':
> + attr->exclude_user = 1;
> + break;
> + case 'u':
> + attr->exclude_kernel = 1;
> + break;
> + case 'h':
> + attr->exclude_guest = 1;
> + break;
> + /* XXX more */
> + default:
> + fprintf(stderr, "Unknown modifier %c at end\n", *qual);
> + return -1;
> + }
> + qual++;
> + }
> + return 0;
> +}
> +
> +static bool special_attr(char *name, int val, struct perf_event_attr *attr)
> +{
> + if (!strcmp(name, "period")) {
> + attr->sample_period = val;
> + return true;
> + }
> + if (!strcmp(name, "freq")) {
> + attr->sample_freq = val;
> + attr->freq = 1;
> + return true;
> + }
> + return false;
> +}
> +
> +static int parse_terms(char *pmu, char *config, struct perf_event_attr *attr, int recur)
> +{
> + char *format = NULL;
> + char *term;
> +
> + char *newl = strchr(config, '\n');
> + if (newl)
> + *newl = 0;
> +
> + while ((term = strsep(&config, ",")) != NULL) {
> + char name[30];
> + int n, val = 1;
> +
> + n = sscanf(term, "%30[^=]=%i", name, &val);
> + if (n < 1)
> + break;
> + if (special_attr(name, val, attr))
> + continue;
> + free(format);
> + if (read_file(&format, "/sys/devices/%s/format/%s", pmu, name) < 0) {
> + char *alias = NULL;
> +
> + if (recur == 0 &&
> + read_file(&alias, "/sys/devices/%s/events/%s", pmu, name) == 0) {
> + if (parse_terms(pmu, alias, attr, 1) < 0) {
> + free(alias);
> + fprintf(stderr, "Cannot parse kernel event alias %s\n", name);
> + break;
> + }
> + free(alias);
> + continue;
> + }
> + fprintf(stderr, "Cannot parse qualifier %s\n", name);
> + break;
> + }
> + bool ok = try_parse(format, "config:%d-%d", val, &attr->config) ||
> + try_parse(format, "config:%d", val, &attr->config) ||
> + try_parse(format, "config1:%d-%d", val, &attr->config1) ||
> + try_parse(format, "config1:%d", val, &attr->config1) ||
> + try_parse(format, "config2:%d-%d", val, &attr->config2) ||
> + try_parse(format, "config2:%d", val, &attr->config2);
> + if (!ok) {
> + fprintf(stderr, "Cannot parse kernel format %s: %s\n",
> + name, format);
> + break;
> + }
> + }
> + free(format);
> + if (term)
> + return -1;
> + return 0;
> +}
> +
> +
> +/* Resolve perf new style event descriptor to perf ATTR. User must initialize
> + * attr->sample_type and attr->read_format as needed after this call,
> + * and possibly other fields.
> + */
> +int tjevent_name_to_attr(char *str, struct perf_event_attr *attr)
> +{
> + char pmu[30], config[200];
> + int qual_off;
> +
> + memset(attr, 0, sizeof(struct perf_event_attr));
> + attr->size = PERF_ATTR_SIZE_VER1;
> +
> + if (sscanf(str, "%30[^/]/%200[^/]/%n", pmu, config, &qual_off) < 2)
> + return -1;
> + char *type = NULL;
> + if (read_file(&type, "/sys/devices/%s/type", pmu) < 0)
> + return -1;
> + attr->type = atoi(type);
> + free(type);
> + if (parse_terms(pmu, config, attr, 0) < 0)
> + return -1;
> + if (read_qual(str + qual_off, attr) < 0)
> + return -1;
> + return 0;
> +}
> +
> +#ifdef TEST
> +#include <asm/unistd.h>
> +int main(int ac, char **av)
> +{
> + struct perf_event_attr attr = { 0 };
> + int ret = 1;
> +
> + if (!av[1]) {
> + printf("Usage: ... perf-event-to-parse\n");
> + exit(1);
> + }
> + while (*++av) {
> + if (jevent_name_to_attr(*av, &attr) < 0)
> + printf("cannot parse %s\n", *av);
> + printf("config %llx config1 %llx\n", attr.config, attr.config1);
> + int fd;
> + if ((fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0)) < 0)
> + perror("perf_event_open");
> + else
> + ret = 0;
> + close(fd);
> + }
> + return ret;
> +}
> +#endif
> diff --git a/tools/power/x86/turbostat/resolve.h b/tools/power/x86/turbostat/resolve.h
> new file mode 100644
> index 0000000..286e798
> --- /dev/null
> +++ b/tools/power/x86/turbostat/resolve.h
> @@ -0,0 +1,2 @@
> +struct perf_event_attr;
> +int tjevent_name_to_attr(char *str, struct perf_event_attr *attr);
> diff --git a/tools/power/x86/turbostat/turbostat.8 b/tools/power/x86/turbostat/turbostat.8
> index 56bfb52..e91d837 100644
> --- a/tools/power/x86/turbostat/turbostat.8
> +++ b/tools/power/x86/turbostat/turbostat.8
> @@ -42,6 +42,11 @@ The \fB-M MSR#\fP option includes the the specified 64-bit MSR value.
> The \fB-i interval_sec\fP option prints statistics every \fiinterval_sec\fP seconds.
> The default is 5 seconds.
> .PP
> +The \fB-x GROUP\fP option enables PCU event group monitoring. Valid groups are
> +2 for thermal frequency limits, 3 and 4 for other frequency limits,
> +5 for frequency transitions statistics. Requires the kernel
> +to support the perf uncore driver for this platform.
> +.PP
> The \fBcommand\fP parameter forks \fBcommand\fP and upon its exit,
> displays the statistics gathered since it was forked.
> .PP
> diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c
> index 5b1b807..5c23f00 100644
> --- a/tools/power/x86/turbostat/turbostat.c
> +++ b/tools/power/x86/turbostat/turbostat.c
> @@ -38,6 +38,9 @@
> #include <ctype.h>
> #include <sched.h>
> #include <cpuid.h>
> +#include <sys/syscall.h>
> +#include <linux/perf_event.h>
> +#include "resolve.h"
>
> char *proc_stat = "/proc/stat";
> unsigned int interval_sec = 5; /* set with -i interval_sec */
> @@ -81,6 +84,11 @@ unsigned int tcc_activation_temp;
> unsigned int tcc_activation_temp_override;
> double rapl_power_units, rapl_energy_units, rapl_time_units;
> double rapl_joule_counter_range;
> +int do_pcu_group = -1;
> +int is_jkt;
> +int is_ivt;
> +int is_hsx;
> +int max_package_id;
>
> #define RAPL_PKG (1 << 0)
> /* 0x610 MSR_PKG_POWER_LIMIT */
> @@ -159,7 +167,7 @@ struct pkg_data {
> unsigned int rapl_pkg_perf_status; /* MSR_PKG_PERF_STATUS */
> unsigned int rapl_dram_perf_status; /* MSR_DRAM_PERF_STATUS */
> unsigned int pkg_temp_c;
> -
> + unsigned long long pcu0, pcu1, pcu2, pcu3;
> } *package_even, *package_odd;
>
> #define ODD_COUNTERS thread_odd, core_odd, package_odd
> @@ -264,6 +272,8 @@ int get_msr(int cpu, off_t offset, unsigned long long *msr)
> return 0;
> }
>
> +static char *pcu_group_titles[][3];
> +
> /*
> * Example Format w/ field column widths:
> *
> @@ -353,6 +363,12 @@ void print_header(void)
> outp += sprintf(outp, " time");
>
> }
> + if (do_pcu_group >= 0) {
> + outp += sprintf(outp, " %-12s", pcu_group_titles[do_pcu_group][0]);
> + outp += sprintf(outp, " %-12s", pcu_group_titles[do_pcu_group][1]);
> + if (pcu_group_titles[do_pcu_group][2][0])
> + outp += sprintf(outp, " %-8s", pcu_group_titles[do_pcu_group][2]);
> + }
> outp += sprintf(outp, "\n");
> }
>
> @@ -406,6 +422,11 @@ int dump_counters(struct thread_data *t, struct core_data *c,
> outp += sprintf(outp, "Throttle RAM: %0X\n",
> p->rapl_dram_perf_status);
> outp += sprintf(outp, "PTM: %dC\n", p->pkg_temp_c);
> +
> + outp += sprintf(outp, "PCU0: %0llX\n", p->pcu0);
> + outp += sprintf(outp, "PCU1: %0llX\n", p->pcu1);
> + outp += sprintf(outp, "PCU2: %0llX\n", p->pcu2);
> + outp += sprintf(outp, "PCU3: %0llX\n", p->pcu3);
> }
>
> outp += sprintf(outp, "\n");
> @@ -413,6 +434,13 @@ int dump_counters(struct thread_data *t, struct core_data *c,
> return 0;
> }
>
> +static int add_percent(char *outp, unsigned long long val, double cycles)
> +{
> + if (val == 0)
> + return sprintf(outp, "%12s", "");
> + return sprintf(outp, " %12.2f", 100.0 * (val / cycles));
> +}
> +
> /*
> * column formatting convention & formats
> */
> @@ -581,6 +609,17 @@ int format_counters(struct thread_data *t, struct core_data *c,
> outp += sprintf(outp, fmt8, interval_float);
>
> }
> +
> + if (do_pcu_group >= 0) {
> + double pcu_cycles = p->pcu0;
> +
> + if (do_pcu_group == 5)
> + outp += sprintf(outp, " %12llu", p->pcu1);
> + else
> + outp += add_percent(outp, p->pcu2, pcu_cycles);
> + outp += add_percent(outp, p->pcu2, pcu_cycles);
> + outp += add_percent(outp, p->pcu3, pcu_cycles);
> + }
> done:
> outp += sprintf(outp, "\n");
>
> @@ -635,6 +674,10 @@ delta_package(struct pkg_data *new, struct pkg_data *old)
> old->pc9 = new->pc9 - old->pc9;
> old->pc10 = new->pc10 - old->pc10;
> old->pkg_temp_c = new->pkg_temp_c;
> + old->pcu0 = new->pcu0;
> + old->pcu1 = new->pcu1;
> + old->pcu2 = new->pcu2;
> + old->pcu3 = new->pcu3;
>
> DELTA_WRAP32(new->energy_pkg, old->energy_pkg);
> DELTA_WRAP32(new->energy_cores, old->energy_cores);
> @@ -783,6 +826,10 @@ void clear_counters(struct thread_data *t, struct core_data *c, struct pkg_data
> p->rapl_pkg_perf_status = 0;
> p->rapl_dram_perf_status = 0;
> p->pkg_temp_c = 0;
> + p->pcu0 = 0;
> + p->pcu1 = 0;
> + p->pcu2 = 0;
> + p->pcu3 = 0;
> }
> int sum_counters(struct thread_data *t, struct core_data *c,
> struct pkg_data *p)
> @@ -826,6 +873,11 @@ int sum_counters(struct thread_data *t, struct core_data *c,
>
> average.packages.rapl_pkg_perf_status += p->rapl_pkg_perf_status;
> average.packages.rapl_dram_perf_status += p->rapl_dram_perf_status;
> +
> + average.packages.pcu0 += p->pcu0;
> + average.packages.pcu1 += p->pcu1;
> + average.packages.pcu2 += p->pcu2;
> + average.packages.pcu3 += p->pcu3;
> return 0;
> }
> /*
> @@ -872,6 +924,137 @@ static unsigned long long rdtsc(void)
> return low | ((unsigned long long)high) << 32;
> }
>
> +/* Get PCU statistics:
> +
> + We can only measure three events at a time
> + (4 counters in the PCU, and one for the clock ticks event)
> +
> + To generate new events use the ucevent tool in pmu-tools
> + FORCECPU=cpu ucevent.py --resolve EVENT-NAME */
> +
> +static char *pcu_groups[][5] = {
> + /* Runs into problems with the uncore driver with the filters for now. */
> + [0] = { NULL },
> + /* group 1 is covered already */
> + [1] = { NULL },
> + [2] = { "uncore_pcu/event=0x0/", /* PCU.CLOCKTICKS */
> + "uncore_pcu/event=0x9/", /* PCU.PROCHOT_EXTERNAL_CYCLES */
> + "uncore_pcu/event=0xa/", /* PCU.PROCHOT_INTERNAL_CYCLES */
> + "uncore_pcu/event=0x4/", /* PCU.FREQ_MAX_LIMIT_THERMAL_CYCLES */
> + NULL },
> + [3] = { "uncore_pcu/event=0x0/", /* PCU.CLOCKTICKS */
> + "uncore_pcu/event=0x4/", /* PCU.FREQ_MAX_LIMIT_THERMAL_CYCLES */
> + "uncore_pcu/event=0x5/", /* PCU.FREQ_MAX_POWER_CYCLES */
> + "uncore_pcu/event=0x7/", /* PCU.FREQ_MAX_CURRENT_CYCLES */
> + NULL },
> + [4] = { "uncore_pcu/event=0x0/", /* PCU.CLOCKTICKS */
> + "uncore_pcu/event=0x6/", /* PCU.FREQ_MAX_OS_CYCLES */
> + "uncore_pcu/event=0x5/", /* PCU.FREQ_MAX_POWER_CYCLES */
> + "uncore_pcu/event=0x7/", /* PCU.FREQ_MAX_CURRENT_CYCLES */
> + NULL },
> + [5] = { "uncore_pcu/event=0x0/", /* PCU.CLOCKTICKS */
> + "uncore_pcu/event=0x60,edge=1/", /* PCU.FREQ_TRANS_CYCLES,edge */
> + "uncore_pcu/event=0x60/", /* PCU.FREQ_TRANS_CYCLES */
> + NULL },
> +};
> +
> +static char *pcu_group_titles[][3] = {
> + [0] = { "", "", "" },
> + [1] = { "", "", "" },
> + [2] = { "ProcHot-ext%", "ProcHot-int%", "Therm-Lim%" },
> + [3] = { "Therm-Lim%", "Power-Lim%", "Current-Lim%" },
> + [4] = { "OS-Limit%", "Power-Limit%", "Current-Lim%" },
> + [5] = { "Num-Freq-Trans", "Freq-Trans%", "" }
> +};
> +
> +#define NUM_COUNTER 4
> +
> +static void pcu_fixup_tables(void)
> +{
> + /* Table is for IVT. Fix up deltas to other CPUs */
> + if (is_hsx) {
> + pcu_groups[4][3] = NULL; /* No PCU.FREQ_MAX_CURRENT_CYCLES */
> + pcu_groups[3][3] = NULL;
> + pcu_group_titles[4][3] = NULL;
> + pcu_group_titles[3][3] = NULL;
> + } else if (is_jkt) {
> + pcu_groups[5][1] = "uncore_pcu/event=0x200000,edge=1/"; /* PCU.FREQ_TRANS_CYCLES */
> + pcu_groups[5][2] = "uncore_pcu/event=0x200000/";
> + }
> +}
> +
> +static int pcu_perf_init(int group, int cpu, int *pcu_fd)
> +{
> + int i;
> + char **evnames = pcu_groups[group];
> + pcu_fd[0] = -1;
> +
> + for (i = 0; evnames[i]; i++) {
> + struct perf_event_attr attr;
> + char *ev = evnames[i];
> +
> + if (tjevent_name_to_attr(ev, &attr) < 0) {
> + fprintf(stderr, "Cannot resolve %s\n", ev);
> + goto fallback;
> + }
> + attr.read_format = PERF_FORMAT_GROUP;
> + pcu_fd[i] = syscall(__NR_perf_event_open,
> + &attr,
> + -1,
> + cpu,
> + pcu_fd[0],
> + i == 0 ? PERF_FLAG_FD_OUTPUT : 0);
> + if (pcu_fd[i] < 0) {
> + fprintf(stderr, "cannot open perf event %s\n", ev);
> + goto fallback;
> + }
> + if (ev != evnames[i])
> + free(ev);
> + }
> + return 0;
> +fallback:
> + /* Don't error out */
> + do_pcu_group = -1;
> + while (--i >= 0) {
> + close(pcu_fd[i]);
> + pcu_fd[i] = -1;
> + }
> + return -1;
> +}
> +
> +int get_pcu_data(struct pkg_data *p)
> +{
> + static int **pcu_fds;
> + int *pcu_fd;
> + unsigned long long val[NUM_COUNTER + 3];
> +
> + if (!pcu_fds) {
> + pcu_fds = calloc(max_package_id + 1, sizeof(void *));
> + if (!pcu_fds)
> + err(1, "no memory");
> + pcu_fds[p->package_id] = calloc(NUM_COUNTER, sizeof(void *));
> + if (!pcu_fds[p->package_id])
> + err(1, "no memory");
> + }
> + pcu_fd = pcu_fds[p->package_id];
> + if (!pcu_fd[0]) {
> + if (pcu_perf_init(do_pcu_group, p->package_id, pcu_fd) < 0)
> + return 0;
> + if (!pcu_fd[0])
> + return 0;
> + }
> +
> + memset(val, 0, sizeof(val));
> + read(pcu_fd[0], val, sizeof(val));
> +
> + /* XXX scale by run time for multiplexing */
> + p->pcu0 = val[1 + 0];
> + p->pcu1 = val[1 + 1];
> + p->pcu2 = val[1 + 2];
> + p->pcu3 = val[1 + 3];
> +
> + return 0;
> +}
>
> /*
> * get_counters(...)
> @@ -1011,6 +1194,10 @@ int get_counters(struct thread_data *t, struct core_data *c, struct pkg_data *p)
> return -17;
> p->pkg_temp_c = tcc_activation_temp - ((msr >> 16) & 0x7F);
> }
> + if (do_pcu_group >= 0) {
> + if (get_pcu_data(p))
> + return -18;
> + }
> return 0;
> }
>
> @@ -2070,6 +2257,9 @@ void check_cpuid()
> do_c8_c9_c10 = has_c8_c9_c10(family, model);
> do_slm_cstates = is_slm(family, model);
> bclk = discover_bclk(family, model);
> + is_jkt = genuine_intel && model == 45;
> + is_ivt = genuine_intel && model == 62;
> + is_hsx = genuine_intel && model == 63;
>
> do_nehalem_turbo_ratio_limit = has_nehalem_turbo_ratio_limit(family, model);
> do_ivt_turbo_ratio_limit = has_ivt_turbo_ratio_limit(family, model);
> @@ -2081,7 +2271,7 @@ void check_cpuid()
>
> void usage()
> {
> - errx(1, "%s: [-v][-R][-T][-p|-P|-S][-c MSR#][-C MSR#][-m MSR#][-M MSR#][-i interval_sec | command ...]\n",
> + errx(1, "%s: [-v][-R][-T][-p|-P|-S][-c MSR#][-C MSR#][-m MSR#][-M MSR#] [-xPCUGROUP] [-i interval_sec | command ...]\n",
> progname);
> }
>
> @@ -2107,7 +2297,6 @@ void topology_probe()
> {
> int i;
> int max_core_id = 0;
> - int max_package_id = 0;
> int max_siblings = 0;
> struct cpu_topology {
> int core_id;
> @@ -2319,6 +2508,10 @@ void turbostat_init()
>
> if (verbose)
> for_all_cpus(print_thermal, ODD_COUNTERS);
> +
> + pcu_fixup_tables();
> + if (!is_ivt && !is_jkt && is_hsx)
> + do_pcu_group = -1;
> }
>
> int fork_it(char **argv)
> @@ -2388,7 +2581,7 @@ void cmdline(int argc, char **argv)
>
> progname = argv[0];
>
> - while ((opt = getopt(argc, argv, "+pPsSvi:c:C:m:M:RJT:")) != -1) {
> + while ((opt = getopt(argc, argv, "+pPsSvi:c:C:m:M:RJT:x:")) != -1) {
> switch (opt) {
> case 'p':
> show_core_only++;
> @@ -2429,6 +2622,11 @@ void cmdline(int argc, char **argv)
> case 'J':
> rapl_joules++;
> break;
> + case 'x':
> + sscanf(optarg, "%d", &do_pcu_group);
> + if (do_pcu_group < 0 || do_pcu_group > 5)
> + usage();
> + break;
>
> default:
> usage();
> --
> 1.9.3
>
--
Len Brown, Intel Open Source Technology Center