Message-ID: <dff31583-adaf-4da8-954e-f35f7ef5a5d3@amd.com>
Date: Wed, 26 Jun 2024 22:07:32 +0530
From: Dhananjay Ugwekar <Dhananjay.Ugwekar@....com>
To: "Zhang, Rui" <rui.zhang@...el.com>,
"alexander.shishkin@...ux.intel.com" <alexander.shishkin@...ux.intel.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"Hunter, Adrian" <adrian.hunter@...el.com>,
"mingo@...hat.com" <mingo@...hat.com>,
"irogers@...gle.com" <irogers@...gle.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"gustavoars@...nel.org" <gustavoars@...nel.org>,
"kan.liang@...ux.intel.com" <kan.liang@...ux.intel.com>,
"kees@...nel.org" <kees@...nel.org>,
"mark.rutland@....com" <mark.rutland@....com>,
"peterz@...radead.org" <peterz@...radead.org>, "bp@...en8.de"
<bp@...en8.de>, "acme@...nel.org" <acme@...nel.org>,
"oleksandr@...alenko.name" <oleksandr@...alenko.name>,
"jolsa@...nel.org" <jolsa@...nel.org>, "x86@...nel.org" <x86@...nel.org>,
"namhyung@...nel.org" <namhyung@...nel.org>
Cc: "ravi.bangoria@....com" <ravi.bangoria@....com>,
"kprateek.nayak@....com" <kprateek.nayak@....com>,
"gautham.shenoy@....com" <gautham.shenoy@....com>,
"linux-perf-users@...r.kernel.org" <linux-perf-users@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-hardening@...r.kernel.org" <linux-hardening@...r.kernel.org>,
"sandipan.das@....com" <sandipan.das@....com>,
"ananth.narayan@....com" <ananth.narayan@....com>,
"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>
Subject: Re: [PATCH v3 10/10] perf/x86/rapl: Add per-core energy counter
support for AMD CPUs
Hello Rui,
On 6/26/2024 8:48 PM, Zhang, Rui wrote:
>
>> @@ -131,8 +146,10 @@ enum rapl_unit_quirk {
>> };
>>
>> struct rapl_model {
>> - struct perf_msr *rapl_msrs;
>> + struct perf_msr *rapl_pkg_msrs;
>
> IMO, this should be part of patch 8/10.
Makes sense, it is better to move all the renaming code to the 8th patch.
>
> [...]
>
>> @@ -685,6 +774,13 @@ static void __init rapl_advertise(void)
>> 				rapl_pkg_domain_names[i], rapl_hw_unit[i]);
>> }
>> }
>> +
>> + for (i = 0; i < NR_RAPL_CORE_DOMAINS; i++) {
>> + if (rapl_core_cntr_mask & (1 << i)) {
>> +			pr_info("hw unit of domain %s 2^-%d Joules\n",
>> +				rapl_core_domain_names[i], rapl_hw_unit[i]);
>
> rapl_hw_unit[] is for package pmu only and
> rapl_hw_unit[0] is rapl_hw_unit[PERF_RAPL_PP0] rather than
> rapl_hw_unit[PERF_RAPL_PER_CORE]
>
> you cannot use rapl_hw_unit[i] to represent per-core rapl domain unit.
Right, I saw that all the elements in the rapl_hw_unit[] array actually use the value
from the same register, "MSR_RAPL_POWER_UNIT" or "MSR_AMD_RAPL_POWER_UNIT", except for
the two quirks:

	case RAPL_UNIT_QUIRK_INTEL_HSW:
		rapl_hw_unit[PERF_RAPL_RAM] = 16;
		break;
	/* SPR uses a fixed energy unit for Psys domain. */
	case RAPL_UNIT_QUIRK_INTEL_SPR:
		rapl_hw_unit[PERF_RAPL_PSYS] = 0;
		break;

So, since on AMD systems the rapl_hw_unit[] elements will always have the same value, I
ended up reusing rapl_hw_unit[PERF_RAPL_PP0] for the PERF_RAPL_PER_CORE domain, but I do
realize it is quite hacky. It is better to do it cleanly and add a separate array/variable
for the core events, roughly along the lines of the sketch below.
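Just a rough sketch (the "rapl_core_hw_unit" name is only illustrative, not final):

	/* energy unit for the per-core domain, read from the same power-unit MSR */
	static int rapl_core_hw_unit __read_mostly;

	/* in rapl_advertise() */
	for (i = 0; i < NR_RAPL_CORE_DOMAINS; i++) {
		if (rapl_core_cntr_mask & (1 << i))
			pr_info("hw unit of domain %s 2^-%d Joules\n",
				rapl_core_domain_names[i], rapl_core_hw_unit);
	}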
>
>> + }
>> + }
>> }
>>
>> static void cleanup_rapl_pmus(struct rapl_pmus *rapl_pmus)
>> @@ -705,15 +801,16 @@ static const struct attribute_group *rapl_attr_update[] = {
>> NULL,
>> };
>>
>> -static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr)
>> +static const struct attribute_group *rapl_per_core_attr_update[] = {
>> + &rapl_events_per_core_group,
>> +};
>> +
>> +static int __init init_rapl_pmus(struct rapl_pmus **rapl_pmus_ptr, int nr_rapl_pmu,
>> +				 const struct attribute_group **rapl_attr_groups,
>> +				 const struct attribute_group **rapl_attr_update)
>> {
>> struct rapl_pmus *rapl_pmus;
>>
>> -	int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
>> -
>> - if (rapl_pmu_is_pkg_scope())
>> - nr_rapl_pmu = topology_max_packages();
>> -
>> rapl_pmus = kzalloc(struct_size(rapl_pmus, rapl_pmu,
>> nr_rapl_pmu), GFP_KERNEL);
>> if (!rapl_pmus)
>> return -ENOMEM;
>> @@ -741,7 +838,7 @@ static struct rapl_model model_snb = {
>> BIT(PERF_RAPL_PKG) |
>> BIT(PERF_RAPL_PP1),
>> .msr_power_unit = MSR_RAPL_POWER_UNIT,
>> - .rapl_msrs = intel_rapl_msrs,
>> + .rapl_pkg_msrs = intel_rapl_msrs,
>> };
>>
>> static struct rapl_model model_snbep = {
>> @@ -749,7 +846,7 @@ static struct rapl_model model_snbep = {
>> BIT(PERF_RAPL_PKG) |
>> BIT(PERF_RAPL_RAM),
>> .msr_power_unit = MSR_RAPL_POWER_UNIT,
>> - .rapl_msrs = intel_rapl_msrs,
>> + .rapl_pkg_msrs = intel_rapl_msrs,
>> };
>>
>> static struct rapl_model model_hsw = {
>> @@ -758,7 +855,7 @@ static struct rapl_model model_hsw = {
>> BIT(PERF_RAPL_RAM) |
>> BIT(PERF_RAPL_PP1),
>> .msr_power_unit = MSR_RAPL_POWER_UNIT,
>> - .rapl_msrs = intel_rapl_msrs,
>> + .rapl_pkg_msrs = intel_rapl_msrs,
>> };
>>
>> static struct rapl_model model_hsx = {
>> @@ -767,7 +864,7 @@ static struct rapl_model model_hsx = {
>> BIT(PERF_RAPL_RAM),
>> .unit_quirk = RAPL_UNIT_QUIRK_INTEL_HSW,
>> .msr_power_unit = MSR_RAPL_POWER_UNIT,
>> - .rapl_msrs = intel_rapl_msrs,
>> + .rapl_pkg_msrs = intel_rapl_msrs,
>> };
>>
>> static struct rapl_model model_knl = {
>> @@ -775,7 +872,7 @@ static struct rapl_model model_knl = {
>> BIT(PERF_RAPL_RAM),
>> .unit_quirk = RAPL_UNIT_QUIRK_INTEL_HSW,
>> .msr_power_unit = MSR_RAPL_POWER_UNIT,
>> - .rapl_msrs = intel_rapl_msrs,
>> + .rapl_pkg_msrs = intel_rapl_msrs,
>> };
>>
>> static struct rapl_model model_skl = {
>> @@ -785,7 +882,7 @@ static struct rapl_model model_skl = {
>> BIT(PERF_RAPL_PP1) |
>> BIT(PERF_RAPL_PSYS),
>> .msr_power_unit = MSR_RAPL_POWER_UNIT,
>> - .rapl_msrs = intel_rapl_msrs,
>> + .rapl_pkg_msrs = intel_rapl_msrs,
>> };
>>
>> static struct rapl_model model_spr = {
>> @@ -795,13 +892,15 @@ static struct rapl_model model_spr = {
>> BIT(PERF_RAPL_PSYS),
>> .unit_quirk = RAPL_UNIT_QUIRK_INTEL_SPR,
>> .msr_power_unit = MSR_RAPL_POWER_UNIT,
>> - .rapl_msrs = intel_rapl_spr_msrs,
>> + .rapl_pkg_msrs = intel_rapl_spr_msrs,
>> };
>
> All the above renaming code should be in patch 8/10.
> Or else it is a distraction for reviewing this patch.
Agreed, will move it in the next version.
>
>>
>> static struct rapl_model model_amd_hygon = {
>> .pkg_events = BIT(PERF_RAPL_PKG),
>> + .core_events = BIT(PERF_RAPL_PER_CORE),
>> .msr_power_unit = MSR_AMD_RAPL_POWER_UNIT,
>> - .rapl_msrs = amd_rapl_pkg_msrs,
>> + .rapl_pkg_msrs = amd_rapl_pkg_msrs,
>> + .rapl_core_msrs = amd_rapl_core_msrs,
>> };
>>
>> static const struct x86_cpu_id rapl_model_match[] __initconst = {
>> @@ -858,6 +957,11 @@ static int __init rapl_pmu_init(void)
>> {
>> const struct x86_cpu_id *id;
>> int ret;
>> +	int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
>> +	int nr_cores = topology_max_packages() * topology_num_cores_per_package();
>
> I'd suggest either using two variables nr_pkgs/nr_cores, or reuse one
> variable nr_rapl_pmu for both pkg pmu and per-core pmu.
I understand your point, but the problem with that is that there are actually three scopes
needed here:
- some Intel systems need a *die* scope for the rapl_pmus_pkg PMU,
- some Intel systems and all AMD systems need a *package* scope for the rapl_pmus_pkg PMU,
- and AMD systems need a *core* scope for the rapl_pmus_per_core PMU.
I think what we can do is use three variables: nr_dies (for Intel systems, as before),
nr_pkgs (for the rapl_pmus_pkg PMU on AMD systems) and nr_cores (for the rapl_pmus_per_core
PMU), roughly as in the sketch below. Sounds good?
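Rough sketch of what I have in mind for rapl_pmu_init() (illustrative only, not the final
code):

	int nr_dies  = topology_max_packages() * topology_max_dies_per_package();
	int nr_pkgs  = topology_max_packages();
	int nr_cores = topology_max_packages() * topology_num_cores_per_package();

	ret = init_rapl_pmus(&rapl_pmus_pkg,
			     rapl_pmu_is_pkg_scope() ? nr_pkgs : nr_dies,
			     rapl_attr_groups, rapl_attr_update);
	...
	if (rapl_model->core_events)
		ret = init_rapl_pmus(&rapl_pmus_core, nr_cores,
				     rapl_per_core_attr_groups, rapl_per_core_attr_update);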
>
>> +
>> + if (rapl_pmu_is_pkg_scope())
>> + nr_rapl_pmu = topology_max_packages();
>>
>> id = x86_match_cpu(rapl_model_match);
>> if (!id)
>> @@ -865,17 +969,34 @@ static int __init rapl_pmu_init(void)
>>
>> rapl_model = (struct rapl_model *) id->driver_data;
>>
>> -	rapl_pkg_cntr_mask = perf_msr_probe(rapl_model->rapl_msrs, PERF_RAPL_PKG_EVENTS_MAX,
>> +	rapl_pkg_cntr_mask = perf_msr_probe(rapl_model->rapl_pkg_msrs, PERF_RAPL_PKG_EVENTS_MAX,
>> 					    false, (void *) &rapl_model->pkg_events);
>>
>> ret = rapl_check_hw_unit();
>> if (ret)
>> return ret;
>>
>> - ret = init_rapl_pmus(&rapl_pmus_pkg);
>> +	ret = init_rapl_pmus(&rapl_pmus_pkg, nr_rapl_pmu, rapl_attr_groups, rapl_attr_update);
>> if (ret)
>> return ret;
>>
>> +	if (rapl_model->core_events) {
>> +		rapl_core_cntr_mask = perf_msr_probe(rapl_model->rapl_core_msrs,
>> +						     PERF_RAPL_CORE_EVENTS_MAX, false,
>> +						     (void *) &rapl_model->core_events);
>> +
>> +		ret = init_rapl_pmus(&rapl_pmus_core, nr_cores,
>> +				     rapl_per_core_attr_groups, rapl_per_core_attr_update);
>> +		if (ret) {
>> +			/*
>> +			 * If initialization of per_core PMU fails, reset per_core
>> +			 * flag, and continue with power PMU initialization.
>> +			 */
>> +			pr_warn("Per-core PMU initialization failed (%d)\n", ret);
>> +			rapl_model->core_events = 0UL;
>> +		}
>> +	}
>> +
>> /*
>> * Install callbacks. Core will call them for each online
>> cpu.
>> */
>> @@ -889,6 +1010,20 @@ static int __init rapl_pmu_init(void)
>> if (ret)
>> goto out1;
>>
>> +	if (rapl_model->core_events) {
>> +		ret = perf_pmu_register(&rapl_pmus_core->pmu, "power_per_core", -1);
>> +		if (ret) {
>> +			/*
>> +			 * If registration of per_core PMU fails, cleanup per_core PMU
>> +			 * variables, reset the per_core flag and keep the
>> +			 * power PMU untouched.
>> +			 */
>> +			pr_warn("Per-core PMU registration failed (%d)\n", ret);
>> +			cleanup_rapl_pmus(rapl_pmus_core);
>> +			rapl_model->core_events = 0UL;
>> +		}
>> +	}
>> +
>> rapl_advertise();
>> return 0;
>>
>> @@ -906,5 +1041,9 @@ static void __exit intel_rapl_exit(void)
>> cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_RAPL_ONLINE);
>> perf_pmu_unregister(&rapl_pmus_pkg->pmu);
>> cleanup_rapl_pmus(rapl_pmus_pkg);
>> + if (rapl_model->core_events) {
>> + perf_pmu_unregister(&rapl_pmus_core->pmu);
>> + cleanup_rapl_pmus(rapl_pmus_core);
>> + }
>
> we do check rapl_pmus_core before accessing it, but we never check
> rapl_pmus_pkg because the previous code assumes it always exists.
>
> so could there be a problem if some one starts the per-core pmu when
> pkg pmu is unregistered and cleaned up?
>
> say, in rapl_pmu_event_init(),
>
> if (event->attr.type == rapl_pmus_pkg->pmu.type ||
> (rapl_pmus_core && event->attr.type == rapl_pmus_core->pmu.type))
>
> this can break because rapl_pmus_pkg is freed, right?
Hmm, I think this situation can't arise, because whenever the power PMU setup fails, we
go directly to the failure path and never set up the per-core PMU (which means no one
will be able to start the per-core PMU); the ordering in rapl_pmu_init() is sketched below.
Please let me know if there is a scenario where this assumption can fail.
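Roughly (a sketch based on the hunks above, not the exact code):

	ret = perf_pmu_register(&rapl_pmus_pkg->pmu, "power", -1);
	if (ret)
		goto out1;	/* pkg PMU failed: the per-core PMU is never registered */

	if (rapl_model->core_events) {
		ret = perf_pmu_register(&rapl_pmus_core->pmu, "power_per_core", -1);
		...
	}

i.e. the per-core PMU only ever gets registered after the power PMU registration has
succeeded.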
Thanks for all the helpful suggestions! I will incorporate them in v4.
Regards,
Dhananjay
>
> thanks,
> rui
>
>
>> }
>> module_exit(intel_rapl_exit);
>