Message-ID: <a33b64e1-51bc-4c71-b342-7f1e4257e2f4@amd.com>
Date: Thu, 27 Jun 2024 16:43:08 +0530
From: Dhananjay Ugwekar <Dhananjay.Ugwekar@....com>
To: "Zhang, Rui" <rui.zhang@...el.com>,
"alexander.shishkin@...ux.intel.com" <alexander.shishkin@...ux.intel.com>,
"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"Hunter, Adrian" <adrian.hunter@...el.com>,
"oleksandr@...alenko.name" <oleksandr@...alenko.name>,
"mingo@...hat.com" <mingo@...hat.com>,
"irogers@...gle.com" <irogers@...gle.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"gustavoars@...nel.org" <gustavoars@...nel.org>,
"kan.liang@...ux.intel.com" <kan.liang@...ux.intel.com>,
"kees@...nel.org" <kees@...nel.org>,
"mark.rutland@....com" <mark.rutland@....com>,
"peterz@...radead.org" <peterz@...radead.org>, "bp@...en8.de"
<bp@...en8.de>, "acme@...nel.org" <acme@...nel.org>,
"jolsa@...nel.org" <jolsa@...nel.org>, "x86@...nel.org" <x86@...nel.org>,
"namhyung@...nel.org" <namhyung@...nel.org>
Cc: "ravi.bangoria@....com" <ravi.bangoria@....com>,
"gautham.shenoy@....com" <gautham.shenoy@....com>,
"kprateek.nayak@....com" <kprateek.nayak@....com>,
"linux-perf-users@...r.kernel.org" <linux-perf-users@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-hardening@...r.kernel.org" <linux-hardening@...r.kernel.org>,
"ananth.narayan@....com" <ananth.narayan@....com>,
"sandipan.das@....com" <sandipan.das@....com>,
"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>
Subject: Re: [PATCH v3 10/10] perf/x86/rapl: Add per-core energy counter
support for AMD CPUs
Hello Rui,
On 6/27/2024 12:19 PM, Zhang, Rui wrote:
> Hi, Dhananjay
>>
>>>
>>> [...]
>>>
>>>> @@ -685,6 +774,13 @@ static void __init rapl_advertise(void)
>>>> 				rapl_pkg_domain_names[i], rapl_hw_unit[i]);
>>>> 		}
>>>> 	}
>>>> +
>>>> +	for (i = 0; i < NR_RAPL_CORE_DOMAINS; i++) {
>>>> +		if (rapl_core_cntr_mask & (1 << i)) {
>>>> +			pr_info("hw unit of domain %s 2^-%d Joules\n",
>>>> +				rapl_core_domain_names[i], rapl_hw_unit[i]);
>>>
>>> rapl_hw_unit[] is for the package pmu only, and rapl_hw_unit[0] is
>>> rapl_hw_unit[PERF_RAPL_PP0] rather than
>>> rapl_hw_unit[PERF_RAPL_PER_CORE].
>>>
>>> You cannot use rapl_hw_unit[i] to represent the per-core rapl domain
>>> unit.
>>
>> Yes, right. I saw that all the elements of the rapl_hw_unit[] array
>> actually use the value from the same register ("MSR_RAPL_POWER_UNIT" or
>> "MSR_AMD_RAPL_POWER_UNIT"), except for the two quirks:
>>
>> 	case RAPL_UNIT_QUIRK_INTEL_HSW:
>> 		rapl_hw_unit[PERF_RAPL_RAM] = 16;
>> 		break;
>> 	/* SPR uses a fixed energy unit for Psys domain. */
>> 	case RAPL_UNIT_QUIRK_INTEL_SPR:
>> 		rapl_hw_unit[PERF_RAPL_PSYS] = 0;
>> 		break;
>>
>> So, since on AMD systems the rapl_hw_unit[] elements always have the
>> same value, I ended up using rapl_hw_unit[PERF_RAPL_PP0] for
>> rapl_hw_unit[PERF_RAPL_PER_CORE], but I realize that is quite hacky.
>> So it is better to do it cleanly and add a separate array/variable for
>> the core events.
>>
> yeah, that is much better.
>>
>
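For concreteness, here is a rough user-space sketch of what a separate unit variable for the core domain could look like. Everything here is mocked for illustration (the `mock_rdmsrl()` helper, the `rapl_core_hw_unit` name, and the sample MSR value are assumptions, not the actual kernel patch):

```c
/* User-space sketch of a separate hw unit for the per-core domain,
 * instead of reusing rapl_hw_unit[PERF_RAPL_PP0]. The MSR read is
 * mocked; names here are illustrative, not the kernel code. */
#include <assert.h>
#include <stdint.h>

enum rapl_pkg_domains {
	PERF_RAPL_PP0, PERF_RAPL_PKG, PERF_RAPL_RAM,
	PERF_RAPL_PP1, PERF_RAPL_PSYS, NR_RAPL_PKG_DOMAINS,
};

static int rapl_hw_unit[NR_RAPL_PKG_DOMAINS];
static int rapl_core_hw_unit;	/* per-core domain gets its own unit */

/* Mocked power-unit MSR: bits 12:8 hold the energy status unit (ESU). */
static uint64_t mock_rdmsrl(void)
{
	return 0x000a1003;	/* ESU field = 16 -> energy unit 2^-16 J */
}

static void rapl_check_hw_unit_mock(void)
{
	int unit = (int)((mock_rdmsrl() >> 8) & 0x1f);
	int i;

	for (i = 0; i < NR_RAPL_PKG_DOMAINS; i++)
		rapl_hw_unit[i] = unit;
	/* Same register on AMD, but tracked separately so package-only
	 * quirks (HSW RAM, SPR Psys) can never leak into the core unit. */
	rapl_core_hw_unit = unit;
}
```

This way the HSW/SPR quirk paths only ever touch rapl_hw_unit[], and the core unit stays untouched by construction.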
>>>
>>>>
>>>> static struct rapl_model model_amd_hygon = {
>>>> 	.pkg_events		= BIT(PERF_RAPL_PKG),
>>>> +	.core_events		= BIT(PERF_RAPL_PER_CORE),
>>>> 	.msr_power_unit		= MSR_AMD_RAPL_POWER_UNIT,
>>>> -	.rapl_msrs		= amd_rapl_pkg_msrs,
>>>> +	.rapl_pkg_msrs		= amd_rapl_pkg_msrs,
>>>> +	.rapl_core_msrs		= amd_rapl_core_msrs,
>>>> };
>>>>
>>>> static const struct x86_cpu_id rapl_model_match[] __initconst = {
>>>> @@ -858,6 +957,11 @@ static int __init rapl_pmu_init(void)
>>>> {
>>>> 	const struct x86_cpu_id *id;
>>>> 	int ret;
>>>> +	int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
>>>> +	int nr_cores = topology_max_packages() * topology_num_cores_per_package();
>>>
>>> I'd suggest either using two variables, nr_pkgs/nr_cores, or reusing
>>> one variable nr_rapl_pmu for both the pkg pmu and the per-core pmu.
>>
>> I understand your point, but the problem with that is that there are
>> actually three scopes needed here:
>>
>> Some Intel systems need a *die* scope for the rapl_pmus_pkg PMU.
>> Some Intel systems and all AMD systems need a *package* scope for the
>> rapl_pmus_pkg PMU.
>> And AMD systems need a *core* scope for the rapl_pmus_per_core PMU.
>>
>> I think what we can do is three variables: nr_dies (for all Intel
>> systems, as before), nr_pkgs (for the AMD rapl_pmus_pkg PMU),
>
> Not necessarily, we already use rapl_pmus_pkg for Intel systems,
> right?
Yes, but the scope of rapl_pmus_pkg is actually *die* for some Intel
systems (Xeon Cascade Lake-AP, to be specific), right?
Whereas all AMD systems have package scope for the power PMU.
>
>> and nr_cores (for the rapl_pmus_per_core PMU).
>>
>> Sounds good?
>
> What about just one variable, "count", reused in all cases?
Sure, that would be cleaner, albeit not as explicit; I will make the
change in the next version.
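To sketch the single-"count" idea outside the kernel: compute the instance count per PMU when it is needed, instead of carrying nr_dies/nr_pkgs/nr_cores side by side. The topology helpers and the pkg-scope check below are mocked with fixed values, and the helper names are made up for illustration:

```c
/* User-space sketch of the "one count variable" suggestion. The
 * topology helpers and rapl_pmu_is_pkg_scope() are mocks returning
 * fixed values (2 packages, 1 die/pkg, 8 cores/pkg, AMD-style
 * pkg scope); they are not the kernel implementations. */
#include <assert.h>
#include <stdbool.h>

static int topology_max_packages(void)          { return 2; }
static int topology_max_dies_per_package(void)  { return 1; }
static int topology_num_cores_per_package(void) { return 8; }
static bool rapl_pmu_is_pkg_scope(void)         { return true; /* AMD */ }

/* pkg PMU: die-scoped on some Intel parts (Cascade Lake-AP),
 * package-scoped elsewhere */
static int rapl_pkg_pmu_count(void)
{
	int count = topology_max_packages();

	if (!rapl_pmu_is_pkg_scope())
		count *= topology_max_dies_per_package();
	return count;
}

/* per-core PMU: one instance per core */
static int rapl_core_pmu_count(void)
{
	return topology_max_packages() * topology_num_cores_per_package();
}
```

With this shape, each init_rapl_pmus() call just takes the count it needs and no scope-specific variable outlives the call site.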
>
>>
>>>
>>>> +
>>>> +	if (rapl_pmu_is_pkg_scope())
>>>> +		nr_rapl_pmu = topology_max_packages();
>>>>
>>>> 	id = x86_match_cpu(rapl_model_match);
>>>> 	if (!id)
>>>> @@ -865,17 +969,34 @@ static int __init rapl_pmu_init(void)
>>>>
>>>> 	rapl_model = (struct rapl_model *) id->driver_data;
>>>>
>>>> -	rapl_pkg_cntr_mask = perf_msr_probe(rapl_model->rapl_msrs, PERF_RAPL_PKG_EVENTS_MAX,
>>>> +	rapl_pkg_cntr_mask = perf_msr_probe(rapl_model->rapl_pkg_msrs, PERF_RAPL_PKG_EVENTS_MAX,
>>>> 					    false, (void *) &rapl_model->pkg_events);
>>>>
>>>> 	ret = rapl_check_hw_unit();
>>>> 	if (ret)
>>>> 		return ret;
>>>>
>>>> -	ret = init_rapl_pmus(&rapl_pmus_pkg);
>>>> +	ret = init_rapl_pmus(&rapl_pmus_pkg, nr_rapl_pmu, rapl_attr_groups, rapl_attr_update);
>>>> 	if (ret)
>>>> 		return ret;
>>>>
>>>> +	if (rapl_model->core_events) {
>>>> +		rapl_core_cntr_mask = perf_msr_probe(rapl_model->rapl_core_msrs,
>>>> +						     PERF_RAPL_CORE_EVENTS_MAX, false,
>>>> +						     (void *) &rapl_model->core_events);
>>>> +
>>>> +		ret = init_rapl_pmus(&rapl_pmus_core, nr_cores,
>>>> +				     rapl_per_core_attr_groups, rapl_per_core_attr_update);
>>>> +		if (ret) {
>>>> +			/*
>>>> +			 * If initialization of per_core PMU fails, reset per_core
>>>> +			 * flag, and continue with power PMU initialization.
>>>> +			 */
>>>> +			pr_warn("Per-core PMU initialization failed (%d)\n", ret);
>>>> +			rapl_model->core_events = 0UL;
>>>> +		}
>>>> +	}
>>>> +
>>>> 	/*
>>>> 	 * Install callbacks. Core will call them for each online cpu.
>>>> 	 */
>>>> @@ -889,6 +1010,20 @@ static int __init rapl_pmu_init(void)
>>>> 	if (ret)
>>>> 		goto out1;
>>>>
>>>> +	if (rapl_model->core_events) {
>>>> +		ret = perf_pmu_register(&rapl_pmus_core->pmu, "power_per_core", -1);
>>>> +		if (ret) {
>>>> +			/*
>>>> +			 * If registration of per_core PMU fails, cleanup per_core PMU
>>>> +			 * variables, reset the per_core flag and keep the
>>>> +			 * power PMU untouched.
>>>> +			 */
>>>> +			pr_warn("Per-core PMU registration failed (%d)\n", ret);
>>>> +			cleanup_rapl_pmus(rapl_pmus_core);
>>>> +			rapl_model->core_events = 0UL;
>>>> +		}
>>>> +	}
>>>> +
>>>> 	rapl_advertise();
>>>> 	return 0;
>>>>
>>>> @@ -906,5 +1041,9 @@ static void __exit intel_rapl_exit(void)
>>>> 	cpuhp_remove_state_nocalls(CPUHP_AP_PERF_X86_RAPL_ONLINE);
>>>> 	perf_pmu_unregister(&rapl_pmus_pkg->pmu);
>>>> 	cleanup_rapl_pmus(rapl_pmus_pkg);
>>>> +	if (rapl_model->core_events) {
>>>> +		perf_pmu_unregister(&rapl_pmus_core->pmu);
>>>> +		cleanup_rapl_pmus(rapl_pmus_core);
>>>> +	}
>>>
>>> we do check rapl_pmus_core before accessing it, but we never check
>>> rapl_pmus_pkg because the previous code assumes it always exists.
>>>
>>> so could there be a problem if someone starts the per-core pmu when
>>> the pkg pmu is unregistered and cleaned up?
>>>
>>> say, in rapl_pmu_event_init(),
>>>
>>> if (event->attr.type == rapl_pmus_pkg->pmu.type ||
>>>     (rapl_pmus_core && event->attr.type == rapl_pmus_core->pmu.type))
>>>
>>> this can break because rapl_pmus_pkg is freed, right?
>>
>> Hmm, I think this situation can't arise, as whenever the power PMU
>> fails, we go directly to the failure path and don't set up the
>> per-core PMU (which means no one will be able to start the per-core
>> PMU).
>> Please let me know if there is a scenario where this assumption can
>> fail.
>
> I mean, if we do module unload and access the power_per_core pmu at
> the same time, could there be a race?
>
> why not just unregister and cleanup the per-core pmu before the pkg
> pmu?
Yes, better to be safe, will reorder the cleanup.
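A tiny user-space mock of the reordered teardown, just to illustrate the ordering. The log strings and helpers below are stand-ins for perf_pmu_unregister()/cleanup_rapl_pmus(); none of this is the actual kernel code:

```c
/* Mock of the reordered exit path: tear down the per-core PMU *before*
 * the pkg PMU, so a concurrent event lookup can never see
 * rapl_pmus_core while rapl_pmus_pkg is already freed. Teardown steps
 * are recorded as log entries instead of real PMU calls. */
#include <assert.h>
#include <string.h>

static const char *teardown_log[8];
static int teardown_n;
static int core_events = 1;	/* per-core PMU was registered */

static void log_step(const char *step)
{
	teardown_log[teardown_n++] = step;
}

static void rapl_exit_mock(void)
{
	log_step("remove_cpuhp_state");
	if (core_events) {		/* per-core pmu goes away first */
		log_step("unregister_core");
		log_step("cleanup_core");
	}
	log_step("unregister_pkg");	/* pkg pmu torn down last */
	log_step("cleanup_pkg");
}
```

With this ordering, by the time rapl_pmus_pkg is freed the per-core pmu type is already unregistered, so the `rapl_pmus_pkg->pmu.type` dereference in rapl_pmu_event_init() can no longer be reached through a live per-core event.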
Thanks,
Dhananjay
>>
>
> thanks,
> rui
>