[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e937591b-542d-c3ac-bc64-d5223c27f70d@linux.ibm.com>
Date: Wed, 8 Dec 2021 12:03:01 +0530
From: kajoljain <kjain@...ux.ibm.com>
To: Kim Phillips <kim.phillips@....com>, Jiri Olsa <jolsa@...hat.com>
Cc: Arnaldo Carvalho de Melo <acme@...hat.com>,
linux-kernel@...r.kernel.org, linux-perf-users@...r.kernel.org,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Boris Ostrovsky <boris.ostrovsky@...cle.com>,
Ian Rogers <irogers@...gle.com>,
Ingo Molnar <mingo@...hat.com>,
Joao Martins <joao.m.martins@...cle.com>,
Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
Mark Rutland <mark.rutland@....com>,
Michael Petlan <mpetlan@...hat.com>,
Namhyung Kim <namhyung@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Robert Richter <robert.richter@....com>,
Stephane Eranian <eranian@...gle.com>
Subject: Re: [PATCH 2/2] perf tools: Improve IBS error handling
On 11/30/21 3:39 AM, Kim Phillips wrote:
> On 11/24/21 2:00 AM, kajoljain wrote:
>> On 11/23/21 8:55 PM, Kim Phillips wrote:
>>> On 11/23/21 2:40 AM, kajoljain wrote:
>>>> On 10/8/21 12:47 AM, Kim Phillips wrote:
>>>>> On 10/7/21 12:28 PM, Jiri Olsa wrote:
>>>>>> On Mon, Oct 04, 2021 at 04:41:14PM -0500, Kim Phillips wrote:
>>>>>>> ---
>>>>>>> tools/perf/util/evsel.c | 24 ++++++++++++++++++++++++
>>>>>>> 1 file changed, 24 insertions(+)
>>>>>>>
>>>>>>> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
>>>>>>> index b915840690d4..f8a9cbd99314 100644
>>>>>>> --- a/tools/perf/util/evsel.c
>>>>>>> +++ b/tools/perf/util/evsel.c
>>>>>>> @@ -2743,9 +2743,22 @@ static bool find_process(const char *name)
>>>>>>> return ret ? false : true;
>>>>>>> }
>>>>>>> +static bool is_amd(const char *arch, const char *cpuid)
>>>>>>> +{
>>>>>>> + return arch && !strcmp("x86", arch) && cpuid &&
>>>>>>> strstarts(cpuid,
>>>>>>> "AuthenticAMD");
>>>>>>> +}
>>>>>>> +
>>>>>>> +static bool is_amd_ibs(struct evsel *evsel)
>>>>>>> +{
>>>>>>> + return evsel->core.attr.precise_ip || !strncmp(evsel->pmu_name,
>>>>>>> "ibs", 3);
>>>>>>> +}
>>>>>>> +
>>>>>>> int evsel__open_strerror(struct evsel *evsel, struct target
>>>>>>> *target,
>>>>>>> int err, char *msg, size_t size)
>>>>>>> {
>>>>>>> + struct perf_env *env = evsel__env(evsel);
>>>>>>> + const char *arch = perf_env__arch(env);
>>>>>>> + const char *cpuid = perf_env__cpuid(env);
>>>>>>> char sbuf[STRERR_BUFSIZE];
>>>>>>> int printed = 0, enforced = 0;
>>>>>>> @@ -2841,6 +2854,17 @@ int evsel__open_strerror(struct evsel
>>>>>>> *evsel, struct target *target,
>>>>>>> return scnprintf(msg, size, "wrong clockid (%d).",
>>>>>>> clockid);
>>>>>>> if (perf_missing_features.aux_output)
>>>>>>> return scnprintf(msg, size, "The 'aux_output'
>>>>>>> feature
>>>>>>> is not supported, update the kernel.");
>>>>>>> + if (is_amd(arch, cpuid)) {
>>>>>>> + if (is_amd_ibs(evsel)) {
>>>>>>
>>>>>> would single 'is_amd_ibs' call be better? checking on both amd and
>>>>>> ibs
>>>>>
>>>>> Good suggestion. If you look at the later patch in the
>>>>> BRS series, I have rewritten it to add the new
>>>>> AMD PMU like so:
>>>>>
>>>>> if (is_amd()) {
>>>>> if (is_amd_ibs()) {
>>>>> if (evsel->this)
>>>>> return
>>>>> if (evsel->that)
>>>>> return
>>>>> }
>>>>> + if (is_amd_brs()) {
>>>>> + if (evsel->this)
>>>>> + return
>>>>> + if (evsel->that)
>>>>> + return
>>>>> + }
>>>>> }
>>>>
>>>> Hi Kim,
>>>> From my point of view, it won't be a good idea of adding so many
>>>> checks in common function definition itself.
>>>> Can you just create a check to see if its amd machine and then add a
>>>> function call which will handle all four conditions together?
>>>>
>>>> which is basically for:
>>>>
>>>> + if (is_amd(arch, cpuid)) {
>>>> + if (is_amd_ibs(evsel)) {
>>>> + if (evsel->core.attr.exclude_kernel)
>>>> + return scnprintf(msg, size,
>>>> + "AMD IBS can't exclude kernel events. Try running at a higher
>>>> privilege level.");
>>>> + if (!evsel->core.system_wide)
>>>> + return scnprintf(msg, size,
>>>> + "AMD IBS may only be available in system-wide/per-cpu mode. Try
>>>> using
>>>> -a, or -C and workload affinity");
>>>> + }
>>>>
>>>> and this:
>>>>
>>>> + if (is_amd_brs(evsel)) {
>>>> + if (evsel->core.attr.freq)
>>>> + return scnprintf(msg, size,
>>>> + "AMD Branch Sampling does not support frequency mode sampling,
>>>> must
>>>> pass a fixed sampling period via -c option or
>>>> cpu/branch-brs,period=xxxx/.");
>>>> + /* another reason is that the period is too small */
>>>> + return scnprintf(msg, size,
>>>> + "AMD Branch Sampling does not support sampling period smaller than
>>>> what is reported in /sys/devices/cpu/caps/branches.");
>>>> + }
>>>
>>> IIRC, I tried something like that but carrying the
>>>
>>>
>>> struct target *target, int err, char *msg, size_t size
>>>
>>> parameters made things worse.
>>>
>>>> So, incase we are in amd machine, common function evsel__open_strerror
>>>> will call function may be something like amd_evesel_open_strerror_check
>>>> which will look for both ibs and brs conditions and return
>>>> corresponding
>>>> error statement.
>>>
>>> The vast majority of decisions made by evsel__open_strerror are
>>> going to be common across most arch/uarches. AMD has only these
>>> two pesky exceptions to the rule and therefore IMO it's ok
>>> to have them inline with the common function, since the decisions
>>> are so deeply intertwined. A new amd_evsel_open_strerror_check
>>> sounds like it'd duplicate too much of the common function code
>>> in order to handle the common error cases.
>>
>> Hi Kim,
>> Sorry for the confusion, what I meant by adding new function is just
>> to handle these corner error cases and not duplicating whole
>> evsel__open_strerror code.
>>
>> Maybe something like below code, Its just prototype of code to show you
>> the flow, you can refine it and check for any build or indentation
>> issues using checkpatch.pl script.
>>
>> So basically, in common function we can just have 2 calls, first to
>> check if we are in amd system and second to return corresponding error
>> message, rather then adding whole chunk of if's which are specific to
>> amd.
>>
>> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
>> index ac0127be0459..adefb162ae08 100644
>> --- a/tools/perf/util/evsel.c
>> +++ b/tools/perf/util/evsel.c
>> @@ -2852,9 +2852,40 @@ static bool find_process(const char *name)
>> return ret ? false : true;
>> }
>>
>> +static bool is_amd(const char *arch, const char *cpuid)
>> +{
>> + return arch && !strcmp("x86", arch) && cpuid && strstarts(cpuid,
>> "AuthenticAMD");
>> +}
>> +
>> +static int error_amd_ibs_brs(struct evsel *evsel, char *msg, size_t
>> size)
>> +{
>> + if (evsel->core.attr.precise_ip || !strncmp(evsel->pmu_name,
>> "ibs", 3)) {
>> + if (evsel->core.attr.exclude_kernel)
>> + return scnprintf(msg, size,
>> + "AMD IBS can't exclude kernel events. Try running at a higher
>> privilege level.");
>> + if (!evsel->core.system_wide)
>> + return scnprintf(msg, size,
>> + "AMD IBS may only be available in system-wide/per-cpu mode. Try
>> using -a, or -C and workload affinity");
>> + }
>> +
>> + if (((evsel->core.attr.config & 0xff) == 0xc4) &&
>> (evsel->core.attr.sample_type & PERF_SAMPLE_BRANCH_STACK)) {
>> + if (evsel->core.attr.freq) {
>> + return scnprintf(msg, size,
>> + "AMD Branch Sampling does not support frequency mode sampling,
>> must pass a fixed sampling
>> + period via -c option or cpu/branch-brs,period=xxxx/.");
>> + /* another reason is that the period is too small */
>> + return scnprintf(msg, size,
>> + "AMD Branch Sampling does not support sampling period smaller
>> than what is reported in /sys/devices/cpu/caps/branches.");
>> + }
>> + }
>> +}
>> +
>> int evsel__open_strerror(struct evsel *evsel, struct target *target,
>> int err, char *msg, size_t size)
>> {
>> + struct perf_env *env = evsel__env(evsel);
>> + const char *arch = perf_env__arch(env);
>> + const char *cpuid = perf_env__cpuid(env);
>> char sbuf[STRERR_BUFSIZE];
>> int printed = 0, enforced = 0;
>>
>> @@ -2950,6 +2981,8 @@ int evsel__open_strerror(struct evsel *evsel,
>> struct target *target,
>> return scnprintf(msg, size, "wrong clockid
>> (%d).", clockid);
>> if (perf_missing_features.aux_output)
>> return scnprintf(msg, size, "The 'aux_output'
>> feature is not supported, update the kernel.");
>> + if (is_amd(arch, cpuid))
>> + return error_amd_ibs_brs(evsel, msg, size);
>> break;
>> case ENODATA:
>> return scnprintf(msg, size, "Cannot collect data source
>> with the load latency event alone. "
>
> That change will makes AMD machines fail to fall back to the default
> "The sys_perf_event_open() syscall returned with..." error string
> in case it's not those AMD IBS and BRS sub-conditions.
Yes right, as I mentioned before, the code I pointed was just a
prototype to show you the flow, these corner cases can be handled on top
of it.
>
> Is having the AMD error code checking in the main evsel__open_strerror()
> so bad? Other arches and their PMU implementations may find error
> conditions that they have in common with AMD's, therefore
> opening up the code for opposite types of refactoring and
> reuse than what is being requested here. E.g., I've seen
> other hardware configurations - not specific to one architecture -
> that could also use this message:
>
>From my understanding, adding too many checks in common function
for a specific arch is not a good practice. Since you already adding
multiple functions to get information like ,if current system is
amd/ibs/brs. Can't we rather just add a single function and handled all
these checks there?
That's just my thoughts, if maintainers are ok with it, then its fine
for me too.
Thanks,
Kajol Jain
> {"AMD IBS"->"%s",pmu_name} may only be available in system-wide/per-cpu
> mode. Try using -a, or -C and workload affinity");
>
> Thanks,
>
> Kim
Powered by blists - more mailing lists