Message-ID: <f98c662d-9270-4974-a12b-60ea995d0aa6@amd.com>
Date: Thu, 13 Jun 2024 12:09:47 +0530
From: Dhananjay Ugwekar <Dhananjay.Ugwekar@....com>
To: "Zhang, Rui" <rui.zhang@...el.com>,
"alexander.shishkin@...ux.intel.com" <alexander.shishkin@...ux.intel.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"Hunter, Adrian" <adrian.hunter@...el.com>,
"mingo@...hat.com" <mingo@...hat.com>,
"irogers@...gle.com" <irogers@...gle.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"gustavoars@...nel.org" <gustavoars@...nel.org>,
"kan.liang@...ux.intel.com" <kan.liang@...ux.intel.com>,
"kees@...nel.org" <kees@...nel.org>,
"mark.rutland@....com" <mark.rutland@....com>,
"peterz@...radead.org" <peterz@...radead.org>, "bp@...en8.de"
<bp@...en8.de>, "acme@...nel.org" <acme@...nel.org>,
"jolsa@...nel.org" <jolsa@...nel.org>, "x86@...nel.org" <x86@...nel.org>,
"namhyung@...nel.org" <namhyung@...nel.org>
Cc: "ravi.bangoria@....com" <ravi.bangoria@....com>,
"kprateek.nayak@....com" <kprateek.nayak@....com>,
"gautham.shenoy@....com" <gautham.shenoy@....com>,
"linux-perf-users@...r.kernel.org" <linux-perf-users@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-hardening@...r.kernel.org" <linux-hardening@...r.kernel.org>,
"sandipan.das@....com" <sandipan.das@....com>,
"ananth.narayan@....com" <ananth.narayan@....com>,
"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>
Subject: Re: [PATCH 6/6] perf/x86/rapl: Add per-core energy counter support
for AMD CPUs
Hi Rui,
On 6/11/2024 2:00 PM, Zhang, Rui wrote:
>> @@ -345,9 +353,14 @@ static int rapl_pmu_event_init(struct perf_event *event)
>>  	u64 cfg = event->attr.config & RAPL_EVENT_MASK;
>>  	int bit, ret = 0;
>>  	struct rapl_pmu *rapl_pmu;
>> +	struct rapl_pmus *curr_rapl_pmus;
>>
>>  	/* only look at RAPL events */
>> -	if (event->attr.type != rapl_pmus->pmu.type)
>> +	if (event->attr.type == rapl_pmus->pmu.type)
>> +		curr_rapl_pmus = rapl_pmus;
>> +	else if (rapl_pmus_per_core && event->attr.type == rapl_pmus_per_core->pmu.type)
>> +		curr_rapl_pmus = rapl_pmus_per_core;
>> +	else
>>  		return -ENOENT;
>
> can we use container_of(event->pmu, struct rapl_pmus, pmu)?
Yes! That would be cleaner; I will add it in the next version.
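Something along these lines (untested sketch, assuming both PMUs embed
their struct pmu inside a struct rapl_pmus as in this patch):

	/* only look at RAPL events */
	if (event->attr.type != event->pmu->type)
		return -ENOENT;

	/*
	 * event->pmu is the embedded struct pmu, so this recovers the
	 * enclosing rapl_pmus for both the package and per-core PMU.
	 */
	curr_rapl_pmus = container_of(event->pmu, struct rapl_pmus, pmu);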
>
>>
>> /* check only supported bits are set */
>> @@ -374,9 +387,14 @@ static int rapl_pmu_event_init(struct perf_event *event)
>>  		return -EINVAL;
>>
>>  	/* must be done before validate_group */
>> -	rapl_pmu = cpu_to_rapl_pmu(event->cpu);
>> +	if (curr_rapl_pmus == rapl_pmus_per_core)
>> +		rapl_pmu = curr_rapl_pmus->rapl_pmu[topology_core_id(event->cpu)];
>> +	else
>> +		rapl_pmu = curr_rapl_pmus->rapl_pmu[get_rapl_pmu_idx(event->cpu)];
>> +
>>  	if (!rapl_pmu)
>>  		return -EINVAL;
>
> Current code has PERF_EV_CAP_READ_ACTIVE_PKG flag set.
> Can you help me understand why it does not affect the new per-core pmu?
Good question. I went back and looked through the code, and it turns out
that we never go through the code path that checks this flag and decides
whether to run on the local CPU (the CPU perf is running on) or on
event->cpu. So having or not having this flag makes no difference here.
I ran a small experiment to confirm this.
On a single-package system, any core should be able to read the energy-pkg
RAPL MSR and return the value, so there should be no need for an SMP call
to event->cpu. But in the ftrace output below we can see that only core 0
executes the PMU event, even though perf stat was launched for core 1.
--------------------------------------------------------------------------
root@...tadru:/sys/kernel/tracing# perf stat -C 1 -e power/energy-pkg/ -- dd if=/dev/zero of=/dev/null bs=1M count=100000
100000+0 records in
100000+0 records out
104857600000 bytes (105 GB, 98 GiB) copied, 2.03295 s, 51.6 GB/s
Performance counter stats for 'CPU(s) 1':
231.59 Joules power/energy-pkg/
2.033916467 seconds time elapsed
root@...tadru:/sys/kernel/tracing# echo 0 > tracing_on
root@...tadru:/sys/kernel/tracing# cat trace
# tracer: function
#
# entries-in-buffer/entries-written: 12/12 #P:192
#
# _-----=> irqs-off/BH-disabled
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# |||| / delay
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
perf-3309 [096] ...1. 3422.558183: rapl_get_attr_cpumask <-dev_attr_show
perf-3309 [001] ...1. 3422.559436: rapl_pmu_event_init <-perf_try_init_event
perf-3309 [001] ...1. 3422.559441: rapl_pmu_event_init <-perf_try_init_event
perf-3309 [001] ...1. 3422.559449: rapl_pmu_event_init <-perf_try_init_event
perf-3309 [001] ...1. 3422.559537: smp_call_function_single <-event_function_call <-- smp call to the event owner cpu(i.e. CPU0)
<idle>-0 [000] d.h3. 3422.559544: rapl_pmu_event_add <-event_sched_in <-- CPU# column changed to 0
<idle>-0 [000] d.h4. 3422.559545: __rapl_pmu_event_start <-rapl_pmu_event_add
perf-3309 [001] ...1. 3424.593398: smp_call_function_single <-event_function_call <-- smp call to the event owner cpu(i.e. CPU0)
<idle>-0 [000] d.h3. 3424.593403: rapl_pmu_event_del <-event_sched_out <-- CPU# column changed to 0
<idle>-0 [000] d.h3. 3424.593403: rapl_pmu_event_stop <-rapl_pmu_event_del
<idle>-0 [000] d.h4. 3424.593404: rapl_event_update.isra.0 <-rapl_pmu_event_stop
perf-3309 [001] ...1. 3424.593514: smp_call_function_single <-event_function_call
--------------------------------------------------------------------------
So, since we always use event->cpu to run the event, the per-core PMU is
not affected by this flag. In any case, in the next version I will set
this flag only for package-scope events. But we will need to look into
fixing this ineffective flag.
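Something like this (sketch) in rapl_pmu_event_init(), so that only the
package-scope PMU keeps the capability:

	/*
	 * A package-scope counter can be read from any CPU in the
	 * package; per-core counters must be read on event->cpu.
	 */
	if (curr_rapl_pmus == rapl_pmus)
		event->event_caps |= PERF_EV_CAP_READ_ACTIVE_PKG;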
>
>> +
>> event->cpu = rapl_pmu->cpu;
>> event->pmu_private = rapl_pmu;
>> event->hw.event_base = rapl_msrs[bit].msr;
>> @@ -408,17 +426,38 @@ static struct attribute_group rapl_pmu_attr_group = {
>>  	.attrs = rapl_pmu_attrs,
>>  };
>>
>> +static ssize_t rapl_get_attr_per_core_cpumask(struct device *dev,
>> +					      struct device_attribute *attr,
>> +					      char *buf)
>> +{
>> +	return cpumap_print_to_pagebuf(true, buf, &rapl_pmus_per_core->cpumask);
>> +}
>> +
>> +static struct device_attribute dev_attr_per_core_cpumask =
>> +	__ATTR(cpumask, 0444, rapl_get_attr_per_core_cpumask, NULL);
>
> DEVICE_ATTR
I was not able to use DEVICE_ATTR() here, because DEVICE_ATTR(cpumask, ...)
would define a "struct device_attribute dev_attr_cpumask", and that variable
already exists for the package PMU cpumask. So I had to declare
"dev_attr_per_core_cpumask" manually with __ATTR() to avoid the name clash.
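For reference, DEVICE_ATTR(cpumask, 0444, rapl_get_attr_per_core_cpumask,
NULL) would expand to:

	struct device_attribute dev_attr_cpumask =
		__ATTR(cpumask, 0444, rapl_get_attr_per_core_cpumask, NULL);

i.e. the variable name is derived from the sysfs name, which has to be
"cpumask" for both PMUs.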
>
>> +
>> +static struct attribute *rapl_pmu_per_core_attrs[] = {
>> + &dev_attr_per_core_cpumask.attr,
>> + NULL,
>> +};
>> +
>> +static struct attribute_group rapl_pmu_per_core_attr_group = {
>> + .attrs = rapl_pmu_per_core_attrs,
>> +};
>> +
>> RAPL_EVENT_ATTR_STR(energy-cores, rapl_cores, "event=0x01");
>> RAPL_EVENT_ATTR_STR(energy-pkg , rapl_pkg, "event=0x02");
>> RAPL_EVENT_ATTR_STR(energy-ram , rapl_ram, "event=0x03");
>> RAPL_EVENT_ATTR_STR(energy-gpu , rapl_gpu, "event=0x04");
>> RAPL_EVENT_ATTR_STR(energy-psys, rapl_psys, "event=0x05");
>> +RAPL_EVENT_ATTR_STR(energy-per-core, rapl_per_core, "event=0x06");
>
> energy-per-core is for a separate pmu, so the event id does not need to
> be 6. The same applies to PERF_RAPL_PERCORE.
Correct, will fix in next version.
>
>>
>> static struct rapl_model model_amd_hygon = {
>> - .events = BIT(PERF_RAPL_PKG),
>> + .events = BIT(PERF_RAPL_PKG) |
>> + BIT(PERF_RAPL_PERCORE),
>> .msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
>> .rapl_msrs = amd_rapl_msrs,
>> + .per_core = true,
>> };
>
> can we use bit PERF_RAPL_PERCORE to check per_core pmu support?
Makes sense, will modify.
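e.g. (untested sketch; the helper name is hypothetical):

	/*
	 * Derive per-core support from the event bitmap instead of a
	 * separate per_core flag in struct rapl_model.
	 */
	if (rapl_model->events & BIT(PERF_RAPL_PERCORE))
		ret = init_rapl_per_core_pmus();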
>
> Just FYI, arch/x86/events/intel/cstate.c handles package/module/core
> scope cstate pmus. It uses a different approach in the probing part,
> which IMO is clearer.
Yes, I went through it. I see that separate variables are used to mark the
valid events for package and core scope, and that a wrapper function around
perf_msr_probe() is created. I will see whether that makes sense here as well.
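Roughly like this (sketch; the per-core table and mask names are
hypothetical, perf_msr_probe() as in arch/x86/events/probe.h):

	/*
	 * Probe package-scope and per-core events against separate MSR
	 * tables, similar to how cstate.c probes each scope.
	 */
	rapl_cntr_mask = perf_msr_probe(rapl_msrs, PERF_RAPL_MAX, false,
					(void *) &rapl_model->events);
	rapl_per_core_cntr_mask = perf_msr_probe(rapl_per_core_msrs,
						 PERF_RAPL_PERCORE_MAX, false,
						 (void *) &rapl_model->events);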
Thanks for the review,
Dhananjay
>
> thanks,
> rui
>