lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f9248ff6-6138-46b6-9cb7-a40442882195@linux.intel.com>
Date: Fri, 1 Mar 2024 12:26:55 -0500
From: "Liang, Kan" <kan.liang@...ux.intel.com>
To: Ian Rogers <irogers@...gle.com>
Cc: Perry Taylor <perry.taylor@...el.com>,
 Samantha Alt <samantha.alt@...el.com>,
 Caleb Biggers <caleb.biggers@...el.com>, Weilin Wang
 <weilin.wang@...el.com>, Edward Baker <edward.baker@...el.com>,
 Andi Kleen <ak@...ux.intel.com>, Peter Zijlstra <peterz@...radead.org>,
 Ingo Molnar <mingo@...hat.com>, Arnaldo Carvalho de Melo <acme@...nel.org>,
 Namhyung Kim <namhyung@...nel.org>, Mark Rutland <mark.rutland@....com>,
 Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
 Jiri Olsa <jolsa@...nel.org>, Adrian Hunter <adrian.hunter@...el.com>,
 John Garry <john.g.garry@...cle.com>, Jing Zhang
 <renyu.zj@...ux.alibaba.com>, Thomas Richter <tmricht@...ux.ibm.com>,
 James Clark <james.clark@....com>, linux-kernel@...r.kernel.org,
 linux-perf-users@...r.kernel.org, Stephane Eranian <eranian@...gle.com>
Subject: Re: [PATCH v1 04/20] perf jevents: Add tsx metric group for Intel
 models



On 2024-03-01 11:37 a.m., Ian Rogers wrote:
> On Fri, Mar 1, 2024 at 6:52 AM Liang, Kan <kan.liang@...ux.intel.com> wrote:
>>
>>
>>
>> On 2024-02-29 8:01 p.m., Ian Rogers wrote:
>>> On Thu, Feb 29, 2024 at 1:15 PM Liang, Kan <kan.liang@...ux.intel.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2024-02-28 7:17 p.m., Ian Rogers wrote:
>>>>> Allow duplicated metric to be dropped from json files.
>>>>>
>>>>> Signed-off-by: Ian Rogers <irogers@...gle.com>
>>>>> ---
>>>>>  tools/perf/pmu-events/intel_metrics.py | 51 ++++++++++++++++++++++++++
>>>>>  1 file changed, 51 insertions(+)
>>>>>
>>>>> diff --git a/tools/perf/pmu-events/intel_metrics.py b/tools/perf/pmu-events/intel_metrics.py
>>>>> index 20c25d142f24..1096accea2aa 100755
>>>>> --- a/tools/perf/pmu-events/intel_metrics.py
>>>>> +++ b/tools/perf/pmu-events/intel_metrics.py
>>>>> @@ -7,6 +7,7 @@ import argparse
>>>>>  import json
>>>>>  import math
>>>>>  import os
>>>>> +from typing import Optional
>>>>>
>>>>>  parser = argparse.ArgumentParser(description="Intel perf json generator")
>>>>>  parser.add_argument("-metricgroups", help="Generate metricgroups data", action='store_true')
>>>>> @@ -77,10 +78,60 @@ def Smi() -> MetricGroup:
>>>>>      ])
>>>>>
>>>>>
>>>>> +def Tsx() -> Optional[MetricGroup]:
>>>>> +    if args.model not in [
>>>>> +        'alderlake',
>>>>> +        'cascadelakex',
>>>>> +        'icelake',
>>>>> +        'icelakex',
>>>>> +        'rocketlake',
>>>>> +        'sapphirerapids',
>>>>> +        'skylake',
>>>>> +        'skylakex',
>>>>> +        'tigerlake',> +    ]:
>>>>
>>>> Can we get ride of the model list? Otherwise, we have to keep updating
>>>> the list.
>>>
>>> Do we expect the list to update? :-)
>>
>> Yes, at least for the meteorlake and graniterapids. They should be the
>> same as alderlake and sapphirerapids. I'm not sure about the future
>> platforms.
>>
>> Maybe we can have a if args.model in list here to include all the
>> non-hybrid models which doesn't support TSX. I think the list should not
>> be changed shortly.
>>
>>> The issue is the events are in
>>> sysfs and not the json. If we added the tsx events to json then this
>>> list wouldn't be necessary, but it also would mean the events would be
>>> present in "perf list" even when TSX is disabled.
>>
>> I think there may an alternative way, to check the RTM events, e.g.,
>> RTM_RETIRED.START event. We only need to generate the metrics for the
>> platform which supports the RTM_RETIRED.START event.
>>
>>
>>>
>>>>> +        return None
>>>>> +> +    pmu = "cpu_core" if args.model == "alderlake" else "cpu"
>>>>
>>>> Is it possible to change the check to the existence of the "cpu" PMU
>>>> here? has_pmu("cpu") ? "cpu" : "cpu_core"
>>>
>>> The "Unit" on "cpu" events in json always just blank. On hybrid it is
>>> either "cpu_core" or "cpu_atom", so I can make this something like:
>>>
>>> pmu = "cpu_core" if metrics.HasPmu("cpu_core") else "cpu"
>>>
>>> which would be a build time test.
>>
>> Yes, I think using the "Unit" is good enough.
>>
>>>
>>>
>>>>> +    cycles = Event('cycles')
>>>>> +    cycles_in_tx = Event(f'{pmu}/cycles\-t/')
>>>>> +    transaction_start = Event(f'{pmu}/tx\-start/')
>>>>> +    cycles_in_tx_cp = Event(f'{pmu}/cycles\-ct/')
>>>>> +    metrics = [
>>>>> +        Metric('tsx_transactional_cycles',
>>>>> +                      'Percentage of cycles within a transaction region.',
>>>>> +                      Select(cycles_in_tx / cycles, has_event(cycles_in_tx), 0),
>>>>> +                      '100%'),
>>>>> +        Metric('tsx_aborted_cycles', 'Percentage of cycles in aborted transactions.',
>>>>> +                      Select(max(cycles_in_tx - cycles_in_tx_cp, 0) / cycles,
>>>>> +                                    has_event(cycles_in_tx),
>>>>> +                                    0),
>>>>> +                      '100%'),
>>>>> +        Metric('tsx_cycles_per_transaction',
>>>>> +                      'Number of cycles within a transaction divided by the number of transactions.',
>>>>> +                      Select(cycles_in_tx / transaction_start,
>>>>> +                                    has_event(cycles_in_tx),
>>>>> +                                    0),
>>>>> +                      "cycles / transaction"),
>>>>> +    ]
>>>>> +    if args.model != 'sapphirerapids':
>>>>
>>>> Add the "tsx_cycles_per_elision" metric only if
>>>> has_event(f'{pmu}/el\-start/')?
>>>
>>> It's a sysfs event, so this wouldn't work :-(
>>
>> The below is the definition of el-start in the kernel.
>> EVENT_ATTR_STR(el-start,        el_start,       "event=0xc8,umask=0x1");
>>
>> The corresponding event in the event list should be HLE_RETIRED.START
>>       "EventCode": "0xC8",
>>       "UMask": "0x01",
>>       "EventName": "HLE_RETIRED.START",
>>
>> I think we may check the HLE_RETIRED.START instead. If the
>> HLE_RETIRED.START doesn't exist, I don't see a reason why the
>> tsx_cycles_per_elision should be supported.
>>
>> Again, in the virtualization world, it's possible that the
>> HLE_RETIRED.START exists in the event list but el_start isn't available
>> in the sysfs. I think it has to be specially handle in the test as well.
> 
> So we keep the has_event test on the sysfs event to handle the
> virtualization and disabled case. We use  HLE_RETIRED.START to detect
> whether the model supports TSX.

Yes. I think the JSON event always keeps the latest status of an event.
If an event is deprecated someday, I don't think there is a reason to
keep any metrics including the event. So we should use it to check
whether to generate a metrics.

The sysfs event tells if the current kernel support the event. It should
be used to check whether a metrics should be used/enabled.

> Should the event be the sysfs or json
> version? i.e.
> 
>         "MetricExpr": "(cycles\\-t / el\\-start if
> has_event(el\\-start) else 0)",
> 
> or
> 
>         "MetricExpr": "(cycles\\-t / HLE_RETIRED.START if
> has_event(el\\-start) else 0)",
> 
> I think I favor the former for some consistency with the has_event.
>

Agree, the former looks good to me too.


> Using HLE_RETIRED.START means the set of TSX models goes from:
>         'alderlake',
>         'cascadelakex',
>         'icelake',
>         'icelakex',
>         'rocketlake',
>         'sapphirerapids',
>         'skylake',
>         'skylakex',
>         'tigerlake',
> 
> To:
>    broadwell
>    broadwellde
>    broadwellx
>    cascadelakex
>    haswell
>    haswellx
>    icelake
>    rocketlake
>    skylake
>    skylakex
> 
> Using RTM_RETIRED.START it goes to:
>    broadwell
>    broadwellde
>    broadwellx
>    cascadelakex
>    emeraldrapids
>    graniterapids
>    haswell
>    haswellx
>    icelake
>    icelakex
>    rocketlake
>    sapphirerapids
>    skylake
>    skylakex
>    tigerlake
> 
> So I'm not sure it is working equivalently to what we have today,
> which may be good or bad. Here is what I think the code should look
> like:

Yes, there should be some changes. But I think the changes should be good.

For icelakex, the HLE_RETIRED.START has been deprecated. I don't see a
reason why should perf keep the tsx_cycles_per_elision metric.

For alderlake, TSX is deprecated. The perf should drop the related
metrics as well.
https://edc.intel.com/content/www/us/en/design/ipla/software-development-platforms/client/platforms/alder-lake-desktop/12th-generation-intel-core-processors-datasheet-volume-1-of-2/001/deprecated-technologies/

> 
> def Tsx() -> Optional[MetricGroup]:
>   pmu = "cpu_core" if CheckPmu("cpu_core") else "cpu"
>   cycles = Event('cycles')
>   cycles_in_tx = Event(f'{pmu}/cycles\-t/')
>   cycles_in_tx_cp = Event(f'{pmu}/cycles\-ct/')
>   try:
>     # Test if the tsx event is present in the json, prefer the
>     # sysfs version so that we can detect its presence at runtime.
>     transaction_start = Event("RTM_RETIRED.START")
>     transaction_start = Event(f'{pmu}/tx\-start/')
>   except:
>     return None
> 
>   elision_start = None
>   try:
>     # Elision start isn't supported by all models, but we'll not
>     # generate the tsx_cycles_per_elision metric in that
>     # case. Again, prefer the sysfs encoding of the event.
>     elision_start = Event("HLE_RETIRED.START")
>     elision_start = Event(f'{pmu}/el\-start/')
>   except:
>     pass
> 
>   return MetricGroup('transaction', [
>       Metric('tsx_transactional_cycles',
>              'Percentage of cycles within a transaction region.',
>              Select(cycles_in_tx / cycles, has_event(cycles_in_tx), 0),
>              '100%'),
>       Metric('tsx_aborted_cycles', 'Percentage of cycles in aborted
> transactions.',
>              Select(max(cycles_in_tx - cycles_in_tx_cp, 0) / cycles,
>                     has_event(cycles_in_tx),
>                     0),
>              '100%'),
>       Metric('tsx_cycles_per_transaction',
>              'Number of cycles within a transaction divided by the
> number of transactions.',
>              Select(cycles_in_tx / transaction_start,
>                     has_event(cycles_in_tx),
>                     0),
>              "cycles / transaction"),
>       Metric('tsx_cycles_per_elision',
>              'Number of cycles within a transaction divided by the
> number of elisions.',
>              Select(cycles_in_tx / elision_start,
>                     has_event(elision_start),
>                     0),
>              "cycles / elision") if elision_start else None,
>   ], description="Breakdown of transactional memory statistics")
> 
> Wdyt?

Looks good to me.

Thanks,
Kan
> 
> Thanks,
> Ian
> 
>> Thanks,
>> Kan
>>
>>>
>>> Thanks,
>>> Ian
>>>
>>>> Thanks,
>>>> Kan
>>>>
>>>>> +        elision_start = Event(f'{pmu}/el\-start/')
>>>>> +        metrics += [
>>>>> +            Metric('tsx_cycles_per_elision',
>>>>> +                          'Number of cycles within a transaction divided by the number of elisions.',
>>>>> +                          Select(cycles_in_tx / elision_start,
>>>>> +                                        has_event(elision_start),
>>>>> +                                        0),
>>>>> +                          "cycles / elision"),
>>>>> +        ]
>>>>> +    return MetricGroup('transaction', metrics)
>>>>> +
>>>>> +
>>>>>  all_metrics = MetricGroup("", [
>>>>>      Idle(),
>>>>>      Rapl(),
>>>>>      Smi(),
>>>>> +    Tsx(),
>>>>>  ])
>>>>>
>>>>>  if args.metricgroups:
>>>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ