Message-ID: <82da23c8-94dd-4096-a987-a17087e2642f@linux.intel.com>
Date: Wed, 20 May 2020 13:36:40 +0800
From: "Jin, Yao" <yao.jin@...ux.intel.com>
To: Jiri Olsa <jolsa@...hat.com>
Cc: acme@...nel.org, jolsa@...nel.org, peterz@...radead.org,
mingo@...hat.com, alexander.shishkin@...ux.intel.com,
Linux-kernel@...r.kernel.org, ak@...ux.intel.com,
kan.liang@...el.com, yao.jin@...el.com
Subject: Re: [PATCH] perf evsel: Get group fd from CPU0 for system wide event
Hi Jiri,
On 5/18/2020 11:28 AM, Jin, Yao wrote:
> Hi Jiri,
>
> On 5/15/2020 4:33 PM, Jiri Olsa wrote:
>> On Fri, May 15, 2020 at 02:04:57PM +0800, Jin, Yao wrote:
>>
>> SNIP
>>
>>> I think I've found the root cause. It's a serious bug in get_group_fd: an out-of-bounds access!
>>>
>>> For a group that mixes a system-wide event with per-core events, where the
>>> group leader is the system-wide event, an access violation will happen.
>>>
>>> perf_evsel__alloc_fd allocates one FD member for system-wide event (only FD(evsel, 0, 0) is valid).
>>>
>>> But for a per-core event, perf_evsel__alloc_fd allocates N FD members (N =
>>> ncpus). For example, when ncpus is 8, FD(evsel, 0, 0) to FD(evsel, 7, 0) are
>>> valid.
>>>
>>> get_group_fd(struct evsel *evsel, int cpu, int thread)
>>> {
>>> struct evsel *leader = evsel->leader;
>>>
>>> fd = FD(leader, cpu, thread); /* access violation may happen here */
>>> }
>>>
>>> If the leader is a system-wide event, only FD(leader, 0, 0) is valid.
>>>
>>> So when get_group_fd accesses FD(leader, 1, 0), an access violation happens.
>>>
>>> My fix is:
>>>
>>> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
>>> index 28683b0eb738..db05b8a1e1a8 100644
>>> --- a/tools/perf/util/evsel.c
>>> +++ b/tools/perf/util/evsel.c
>>> @@ -1440,6 +1440,9 @@ static int get_group_fd(struct evsel *evsel, int cpu, int thread)
>>> if (evsel__is_group_leader(evsel))
>>> return -1;
>>>
>>> + if (leader->core.system_wide && !evsel->core.system_wide)
>>> + return -2;
>>
>> so this effectively stops grouping system_wide events with others,
>> and I think it's correct, how about events that differ in cpumask?
>>
>
> My understanding for events that differ in cpumask is: if the leader's cpumask
> does not fully match the evsel's cpumask, we stop the grouping. Is that correct?
>
> I have done some tests and get some conclusions:
>
> 1. If the group mixes core and uncore events, the system_wide check can distinguish them.
>
> 2. If the group mixes core and uncore events and "-a" is specified, system_wide
> for the core event is also false, so the system_wide check can distinguish them too.
>
> 3. In my test, the issue only occurs when collecting a metric that mixes an uncore
> event with a core event, so checking system_wide may be enough.
>
>> should we perhaps ensure this before we call open? go through all
>> groups and check they are on the same cpus?
>>
>
> The issue doesn't happen most of the time (only for metrics consisting of uncore
> and core events), so falling back to stop grouping when the open call fails looks reasonable.
>
> Thanks
> Jin Yao
>
>> thanks,
>> jirka
>>
>>
>>> +
>>> /*
>>> * Leader must be already processed/open,
>>> * if not it's a bug.
>>> @@ -1665,6 +1668,11 @@ static int evsel__open_cpu(struct evsel *evsel, struct perf_cpu_map *cpus,
>>> pid = perf_thread_map__pid(threads, thread);
>>>
>>> group_fd = get_group_fd(evsel, cpu, thread);
>>> + if (group_fd == -2) {
>>> + errno = EINVAL;
>>> + err = -EINVAL;
>>> + goto out_close;
>>> + }
>>> retry_open:
>>> test_attr__ready();
>>>
>>> This enables perf_evlist__reset_weak_group. In the second pass (in
>>> __run_perf_stat), the events are then opened successfully.
>>>
>>> I have tested this fix successfully on cascadelakex.
>>>
>>> Thanks
>>> Jin Yao
>>>
>>
Is this fix OK?
Another thing: do you think we need to rename "evsel->core.system_wide" to
"evsel->core.has_cpumask"? The name "system_wide" may be misleading.
evsel->core.system_wide = pmu ? pmu->is_uncore : false;
"pmu->is_uncore" is true if the PMU has a "cpumask". But uncore PMUs are not the only
ones with a cpumask; some other PMUs, e.g. cstate_pkg, have one as well. So for this
case, "has_cpumask" would be the better name.
But I'm not sure whether the change is OK for other cases, e.g. PT, which also uses
"evsel->core.system_wide".
Thanks
Jin Yao