lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e937591b-542d-c3ac-bc64-d5223c27f70d@linux.ibm.com>
Date:   Wed, 8 Dec 2021 12:03:01 +0530
From:   kajoljain <kjain@...ux.ibm.com>
To:     Kim Phillips <kim.phillips@....com>, Jiri Olsa <jolsa@...hat.com>
Cc:     Arnaldo Carvalho de Melo <acme@...hat.com>,
        linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        Ian Rogers <irogers@...gle.com>,
        Ingo Molnar <mingo@...hat.com>,
        Joao Martins <joao.m.martins@...cle.com>,
        Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
        Mark Rutland <mark.rutland@....com>,
        Michael Petlan <mpetlan@...hat.com>,
        Namhyung Kim <namhyung@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Robert Richter <robert.richter@....com>,
        Stephane Eranian <eranian@...gle.com>
Subject: Re: [PATCH 2/2] perf tools: Improve IBS error handling



On 11/30/21 3:39 AM, Kim Phillips wrote:
> On 11/24/21 2:00 AM, kajoljain wrote:
>> On 11/23/21 8:55 PM, Kim Phillips wrote:
>>> On 11/23/21 2:40 AM, kajoljain wrote:
>>>> On 10/8/21 12:47 AM, Kim Phillips wrote:
>>>>> On 10/7/21 12:28 PM, Jiri Olsa wrote:
>>>>>> On Mon, Oct 04, 2021 at 04:41:14PM -0500, Kim Phillips wrote:
>>>>>>> ---
>>>>>>>     tools/perf/util/evsel.c | 24 ++++++++++++++++++++++++
>>>>>>>     1 file changed, 24 insertions(+)
>>>>>>>
>>>>>>> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
>>>>>>> index b915840690d4..f8a9cbd99314 100644
>>>>>>> --- a/tools/perf/util/evsel.c
>>>>>>> +++ b/tools/perf/util/evsel.c
>>>>>>> @@ -2743,9 +2743,22 @@ static bool find_process(const char *name)
>>>>>>>         return ret ? false : true;
>>>>>>>     }
>>>>>>>     +static bool is_amd(const char *arch, const char *cpuid)
>>>>>>> +{
>>>>>>> +    return arch && !strcmp("x86", arch) && cpuid &&
>>>>>>> strstarts(cpuid,
>>>>>>> "AuthenticAMD");
>>>>>>> +}
>>>>>>> +
>>>>>>> +static bool is_amd_ibs(struct evsel *evsel)
>>>>>>> +{
>>>>>>> +    return evsel->core.attr.precise_ip || !strncmp(evsel->pmu_name,
>>>>>>> "ibs", 3);
>>>>>>> +}
>>>>>>> +
>>>>>>>     int evsel__open_strerror(struct evsel *evsel, struct target
>>>>>>> *target,
>>>>>>>                  int err, char *msg, size_t size)
>>>>>>>     {
>>>>>>> +    struct perf_env *env = evsel__env(evsel);
>>>>>>> +    const char *arch = perf_env__arch(env);
>>>>>>> +    const char *cpuid = perf_env__cpuid(env);
>>>>>>>         char sbuf[STRERR_BUFSIZE];
>>>>>>>         int printed = 0, enforced = 0;
>>>>>>>     @@ -2841,6 +2854,17 @@ int evsel__open_strerror(struct evsel
>>>>>>> *evsel, struct target *target,
>>>>>>>                 return scnprintf(msg, size, "wrong clockid (%d).",
>>>>>>> clockid);
>>>>>>>             if (perf_missing_features.aux_output)
>>>>>>>                 return scnprintf(msg, size, "The 'aux_output'
>>>>>>> feature
>>>>>>> is not supported, update the kernel.");
>>>>>>> +        if (is_amd(arch, cpuid)) {
>>>>>>> +            if (is_amd_ibs(evsel)) {
>>>>>>
>>>>>> would single 'is_amd_ibs' call be better? checking on both amd and
>>>>>> ibs
>>>>>
>>>>> Good suggestion. If you look at the later patch in the
>>>>> BRS series, I have rewritten it to add the new
>>>>> AMD PMU like so:
>>>>>
>>>>>    if (is_amd()) {
>>>>>        if (is_amd_ibs()) {
>>>>>            if (evsel->this)
>>>>>                return
>>>>>            if (evsel->that)
>>>>>                return
>>>>>        }
>>>>> +    if (is_amd_brs()) {
>>>>> +        if (evsel->this)
>>>>> +            return
>>>>> +        if (evsel->that)
>>>>> +            return
>>>>> +    }
>>>>>    }
>>>>
>>>> Hi Kim,
>>>>        From my point of view, it won't be a good idea of adding so many
>>>> checks in common function definition itself.
>>>> Can you just create a check to see if its amd machine and then add a
>>>> function call which will handle all four conditions together?
>>>>
>>>> which is basically for:
>>>>
>>>> +        if (is_amd(arch, cpuid)) {
>>>> +            if (is_amd_ibs(evsel)) {
>>>> +                if (evsel->core.attr.exclude_kernel)
>>>> +                    return scnprintf(msg, size,
>>>> +    "AMD IBS can't exclude kernel events.  Try running at a higher
>>>> privilege level.");
>>>> +                if (!evsel->core.system_wide)
>>>> +                    return scnprintf(msg, size,
>>>> +    "AMD IBS may only be available in system-wide/per-cpu mode.  Try
>>>> using
>>>> -a, or -C and workload affinity");
>>>> +            }
>>>>
>>>> and this:
>>>>
>>>> +            if (is_amd_brs(evsel)) {
>>>> +                if (evsel->core.attr.freq)
>>>> +                    return scnprintf(msg, size,
>>>> +    "AMD Branch Sampling does not support frequency mode sampling,
>>>> must
>>>> pass a fixed sampling period via -c option or
>>>> cpu/branch-brs,period=xxxx/.");
>>>> +                /* another reason is that the period is too small */
>>>> +                return scnprintf(msg, size,
>>>> +    "AMD Branch Sampling does not support sampling period smaller than
>>>> what is reported in /sys/devices/cpu/caps/branches.");
>>>> +            }
>>>
>>> IIRC, I tried something like that but carrying the
>>>
>>>
>>> struct target *target, int err, char *msg, size_t size
>>>
>>> parameters made things worse.
>>>
>>>> So, incase we are in amd machine,  common function evsel__open_strerror
>>>> will call function may be something like amd_evesel_open_strerror_check
>>>> which will look for both ibs and brs conditions and return
>>>> corresponding
>>>> error statement.
>>>
>>> The vast majority of decisions made by evsel__open_strerror are
>>> going to be common across most arch/uarches.  AMD has only these
>>> two pesky exceptions to the rule and therefore IMO it's ok
>>> to have them inline with the common function, since the decisions
>>> are so deeply intertwined.  A new amd_evsel_open_strerror_check
>>> sounds like it'd duplicate too much of the common function code
>>> in order to handle the common error cases.
>>
>> Hi Kim,
>>     Sorry for the confusion, what I meant by adding new function is just
>> to handle these corner error cases and not duplicating whole
>> evsel__open_strerror code.
>>
>> Maybe something like below code, Its just prototype of code to show you
>> the flow, you can refine it and check for any build or indentation
>> issues using checkpatch.pl script.
>>
>> So basically, in common function we can just have 2 calls, first to
>> check if we are in amd system and second to return corresponding error
>> message, rather then adding whole chunk of if's which are specific to
>> amd.
>>
>> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
>> index ac0127be0459..adefb162ae08 100644
>> --- a/tools/perf/util/evsel.c
>> +++ b/tools/perf/util/evsel.c
>> @@ -2852,9 +2852,40 @@ static bool find_process(const char *name)
>>          return ret ? false : true;
>>   }
>>
>> +static bool is_amd(const char *arch, const char *cpuid)
>> +{
>> +       return arch && !strcmp("x86", arch) && cpuid && strstarts(cpuid,
>> "AuthenticAMD");
>> +}
>> +
>> +static int error_amd_ibs_brs(struct evsel *evsel, char *msg, size_t
>> size)
>> +{
>> +       if (evsel->core.attr.precise_ip || !strncmp(evsel->pmu_name,
>> "ibs", 3)) {
>> +               if (evsel->core.attr.exclude_kernel)
>> +                       return scnprintf(msg, size,
>> +       "AMD IBS can't exclude kernel events.  Try running at a higher
>> privilege level.");
>> +               if (!evsel->core.system_wide)
>> +                       return scnprintf(msg, size,
>> +       "AMD IBS may only be available in system-wide/per-cpu mode.  Try
>> using -a, or -C and workload affinity");
>> +       }
>> +
>> +       if (((evsel->core.attr.config & 0xff) == 0xc4) &&
>> (evsel->core.attr.sample_type & PERF_SAMPLE_BRANCH_STACK)) {
>> +               if (evsel->core.attr.freq) {
>> +                       return scnprintf(msg, size,
>> +       "AMD Branch Sampling does not support frequency mode sampling,
>> must pass a fixed sampling
>> +          period via -c option or cpu/branch-brs,period=xxxx/.");
>> +                /* another reason is that the period is too small */
>> +               return scnprintf(msg, size,
>> +       "AMD Branch Sampling does not support sampling period smaller
>> than what is reported in /sys/devices/cpu/caps/branches.");
>> +               }
>> +       }
>> +}
>> +
>>   int evsel__open_strerror(struct evsel *evsel, struct target *target,
>>                           int err, char *msg, size_t size)
>>   {
>> +       struct perf_env *env = evsel__env(evsel);
>> +       const char *arch = perf_env__arch(env);
>> +       const char *cpuid = perf_env__cpuid(env);
>>          char sbuf[STRERR_BUFSIZE];
>>          int printed = 0, enforced = 0;
>>
>> @@ -2950,6 +2981,8 @@ int evsel__open_strerror(struct evsel *evsel,
>> struct target *target,
>>                          return scnprintf(msg, size, "wrong clockid
>> (%d).", clockid);
>>                  if (perf_missing_features.aux_output)
>>                          return scnprintf(msg, size, "The 'aux_output'
>> feature is not supported, update the kernel.");
>> +               if (is_amd(arch, cpuid))
>> +                       return error_amd_ibs_brs(evsel, msg, size);
>>                  break;
>>          case ENODATA:
>>                  return scnprintf(msg, size, "Cannot collect data source
>> with the load latency event alone. "
> 
> That change will makes AMD machines fail to fall back to the default
> "The sys_perf_event_open() syscall returned with..." error string
> in case it's not those AMD IBS and BRS sub-conditions.

Yes right, as I mentioned before, the code I pointed was just a
prototype to show you the flow, these corner cases can be handled on top
of it.

> 
> Is having the AMD error code checking in the main evsel__open_strerror()
> so bad?  Other arches and their PMU implementations may find error
> conditions that they have in common with AMD's, therefore
> opening up the code for opposite types of refactoring and
> reuse than what is being requested here.  E.g., I've seen
> other hardware configurations - not specific to one architecture -
> that could also use this message:
> 

>From my understanding, adding too many checks in common function
for a specific arch is not a good practice. Since you already adding
multiple functions to get information like ,if current system is
amd/ibs/brs. Can't we rather just add a single function and handled all
these checks there?
That's just my thoughts, if maintainers are ok with it, then its fine
for me too.

Thanks,
Kajol Jain

> {"AMD IBS"->"%s",pmu_name} may only be available in system-wide/per-cpu
> mode.  Try using -a, or -C and workload affinity");
> 
> Thanks,
> 
> Kim

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ