Message-ID: <20160128152848.GT6356@twins.programming.kicks-ass.net>
Date: Thu, 28 Jan 2016 16:28:48 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Borislav Petkov <bp@...en8.de>
Cc: Huang Rui <ray.huang@....com>, Borislav Petkov <bp@...e.de>,
Ingo Molnar <mingo@...nel.org>,
Andy Lutomirski <luto@...capital.net>,
Thomas Gleixner <tglx@...utronix.de>,
Robert Richter <rric@...nel.org>,
Jacob Shin <jacob.w.shin@...il.com>,
John Stultz <john.stultz@...aro.org>,
Frédéric Weisbecker <fweisbec@...il.com>,
linux-kernel@...r.kernel.org, spg_linux_kernel@....com,
x86@...nel.org, Guenter Roeck <linux@...ck-us.net>,
Andreas Herrmann <herrmann.der.user@...glemail.com>,
Suravee Suthikulpanit <suravee.suthikulpanit@....com>,
Aravind Gopalakrishnan <Aravind.Gopalakrishnan@....com>,
Fengguang Wu <fengguang.wu@...el.com>,
Aaron Lu <aaron.lu@...el.com>
Subject: Re: [PATCH v4] perf/x86/amd/power: Add AMD accumulated power
reporting mechanism
On Thu, Jan 28, 2016 at 10:03:15AM +0100, Borislav Petkov wrote:
> +
> +struct power_pmu {
> + raw_spinlock_t lock;
Now that the list is gone, what does this thing protect?
> + struct pmu *pmu;
This member seems superfluous, there's only the one possible value.
> + local64_t cpu_sw_pwr_ptsc;
> +
> + /*
> + * These two cpumasks are used for avoiding the allocations on the
> + * CPU_STARTING phase because power_cpu_prepare() will be called with
> + * IRQs disabled.
> + */
> + cpumask_var_t mask;
> + cpumask_var_t tmp_mask;
> +};
> +
> +static struct pmu pmu_class;
> +
> +/*
> + * Accumulated power represents the sum of each compute unit's (CU) power
> + * consumption. On any core of each CU we read the total accumulated power from
> + * MSR_F15H_CU_PWR_ACCUMULATOR. cpu_mask represents CPU bit map of all cores
> + * which are picked to measure the power for the CUs they belong to.
> + */
> +static cpumask_t cpu_mask;
> +
> +static DEFINE_PER_CPU(struct power_pmu *, amd_power_pmu);
> +
> +static u64 event_update(struct perf_event *event, struct power_pmu *pmu)
> +{
Is there ever a case where @pmu != __this_cpu_read(power_pmu) ?
> + struct hw_perf_event *hwc = &event->hw;
> + u64 prev_raw_count, new_raw_count, prev_ptsc, new_ptsc;
> + u64 delta, tdelta;
> +
> +again:
> + prev_raw_count = local64_read(&hwc->prev_count);
> + prev_ptsc = local64_read(&pmu->cpu_sw_pwr_ptsc);
> + rdmsrl(event->hw.event_base, new_raw_count);
Is hw.event_base != MSR_F15H_CU_PWR_ACCUMULATOR possible?
> + rdmsrl(MSR_F15H_PTSC, new_ptsc);
Also, I suspect this doesn't do what you expect it to do: we measure
per-event PWR_ACC deltas, but per-CPU PTSC values. These do not match
when there's more than one event on the CPU.
I would suggest adding a new struct to the hw_perf_event union with the
two u64 deltas like:
	struct { /* amd_power */
		u64	pwr_acc;
		u64	ptsc;
	};
And track these values per-event.
> +
> + if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
> + new_raw_count) != prev_raw_count) {
> + cpu_relax();
> + goto again;
> + }
> +
> + /*
> + * Calculate the CU power consumption over a time period, the unit of
> + * final value (delta) is micro-Watts. Then add it to the event count.
> + */
> + if (new_raw_count < prev_raw_count) {
> + delta = max_cu_acc_power + new_raw_count;
> + delta -= prev_raw_count;
> + } else
> + delta = new_raw_count - prev_raw_count;
> +
> + delta *= cpu_pwr_sample_ratio * 1000;
> + tdelta = new_ptsc - prev_ptsc;
> +
> + do_div(delta, tdelta);
> + local64_add(delta, &event->count);
Then this division can be redone on the total values, which loses less
precision overall.
> +
> + return new_raw_count;
> +}