Message-ID: <bdf15a29-3d19-dff3-ad2c-506e19aeaa8a@huawei.com>
Date: Thu, 24 Sep 2020 22:14:17 +0800
From: "liwei (GF)" <liwei391@...wei.com>
To: Andi Kleen <ak@...ux.intel.com>
CC: Arnaldo Carvalho de Melo <acme@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...hat.com>,
"Namhyung Kim" <namhyung@...nel.org>,
Alexey Budankov <alexey.budankov@...ux.intel.com>,
Adrian Hunter <adrian.hunter@...el.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, <linux-kernel@...r.kernel.org>,
<linux-arm-kernel@...ts.infradead.org>, <huawei.libin@...wei.com>
Subject: Re: [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events
Hi Andi,
On 2020/9/23 3:50, Andi Kleen wrote:
> On Tue, Sep 22, 2020 at 12:23:21PM -0700, Andi Kleen wrote:
>>> After debugging, i found the root reason is that the xyarray fd is created
>>> by evsel__open_per_thread() ignoring the cpu passed in
>>> create_perf_stat_counter(), while the evsel' cpumap is assigned as the
>>> corresponding PMU's cpumap in __add_event(). Thus, the xyarray fd is created
>>> with ncpus of dummy cpumap and an out of bounds 'cpu' index will be used in
>>> perf_evsel__close_fd_cpu().
>>>
>>> To address this, add a flag to mark this situation and avoid using the
>>> affinity technique when closing/enabling/disabling events.
>>
>> The flag seems like a hack. How about figuring out the correct number of
>> CPUs and using that?
>
> Also would like to understand what's different on ARM64 than other architectures.
> Or could this happen on x86 too?
>
The problem is that when the user requests per-task events, the cpumask is expected
to be NULL (dummy), while armv8_pmu does have a cpumask, which is inherited by the evsel.
The armv8_pmu cpumask was added for heterogeneous systems, so this issue cannot
happen on x86.
In fact, the cpumask itself is correct, but it shouldn't be used when we are requesting
per-task events. As these events should be installed on all cores, I doubt how much we
can benefit from the affinity technique, so I chose to add a flag.
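To make the mismatch concrete, here is a minimal sketch in C. It is not the perf code: `fd_table`, `fd_table_entry()` and `count_bad_lookups()` are hypothetical stand-ins for the xyarray of fds and its users, and the bounds check exists only to make the problem visible. It assumes the table is sized from the dummy cpumap (one entry) while the walk uses the PMU's cpumap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-evsel fd table (ncpus x nthreads). */
struct fd_table {
	int ncpus;
	int nthreads;
	int *fds;
};

static struct fd_table *fd_table_new(int ncpus, int nthreads)
{
	struct fd_table *t = malloc(sizeof(*t));

	assert(t != NULL);
	t->ncpus = ncpus;
	t->nthreads = nthreads;
	t->fds = calloc((size_t)ncpus * (size_t)nthreads, sizeof(int));
	assert(t->fds != NULL);
	return t;
}

static void fd_table_free(struct fd_table *t)
{
	free(t->fds);
	free(t);
}

/* Bounds-checked lookup: returns NULL instead of reading out of bounds. */
static int *fd_table_entry(struct fd_table *t, int cpu, int thread)
{
	if (cpu < 0 || cpu >= t->ncpus || thread < 0 || thread >= t->nthreads)
		return NULL;
	return &t->fds[cpu * t->nthreads + thread];
}

/*
 * Allocate the table for alloc_ncpus (e.g. 1 for the dummy cpumap), then
 * walk it with iter_ncpus (e.g. the PMU's cpumap) and count the lookups
 * that would have landed outside the allocation.
 */
static int count_bad_lookups(int alloc_ncpus, int iter_ncpus, int nthreads)
{
	struct fd_table *t = fd_table_new(alloc_ncpus, nthreads);
	int bad = 0;

	for (int cpu = 0; cpu < iter_ncpus; cpu++)
		for (int thread = 0; thread < nthreads; thread++)
			if (!fd_table_entry(t, cpu, thread))
				bad++;

	fd_table_free(t);
	return bad;
}
```

With a table allocated for one CPU and a walk over 128 CPUs, every index except cpu 0 is out of bounds, which is the kind of access that ends in the segfault.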
I also ran a test on a HiSilicon arm64 D06 board with 2 sockets and 128 cores.
I ran the following command 3 times, with and without the affinity technique:
time tools/perf/perf stat -ddd -C 0-127 --per-core --timeout=5000 2> /dev/null
* (HEAD detached at 7074674e7338) perf cpumap: Maintain cpumaps ordered and without dups

real 0m8.039s
user 0m0.402s
sys 0m2.582s

real 0m7.939s
user 0m0.360s
sys 0m2.560s

real 0m7.997s
user 0m0.358s
sys 0m2.586s

* (HEAD detached at 704e2f5b700d) perf stat: Use affinity for enabling/disabling events

real 0m7.954s
user 0m0.308s
sys 0m2.590s

real 0m12.959s
user 0m0.332s
sys 0m2.582s

real 0m18.009s
user 0m0.346s
sys 0m2.562s
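One way to compare the two sets of runs is the time spent off-CPU. The sketch below estimates it as real - (user + sys); that definition is an assumption made here for illustration, and `time_sample`, `offcpu_seconds()` and `mean_offcpu()` are hypothetical names. The numbers are copied from the `time` output above.

```c
#include <assert.h>

/* One `time` result, in seconds. */
struct time_sample {
	double real_s;
	double user_s;
	double sys_s;
};

/* Assumed estimate: wall-clock time not accounted for by user or sys time. */
static double offcpu_seconds(struct time_sample s)
{
	return s.real_s - (s.user_s + s.sys_s);
}

/* Mean off-CPU estimate over n samples. */
static double mean_offcpu(const struct time_sample *samples, int n)
{
	double sum = 0.0;

	assert(n > 0);
	for (int i = 0; i < n; i++)
		sum += offcpu_seconds(samples[i]);
	return sum / n;
}

/* Runs at HEAD 7074674e7338 (perf cpumap: Maintain cpumaps ordered ...). */
static const struct time_sample at_7074674e7338[3] = {
	{ 8.039, 0.402, 2.582 },
	{ 7.939, 0.360, 2.560 },
	{ 7.997, 0.358, 2.586 },
};

/* Runs at HEAD 704e2f5b700d (perf stat: Use affinity for ...). */
static const struct time_sample at_704e2f5b700d[3] = {
	{ 7.954, 0.308, 2.590 },
	{ 12.959, 0.332, 2.582 },
	{ 18.009, 0.346, 2.562 },
};
```

Under that estimate, the mean is about 5.04s for the runs at 7074674e7338 and about 10.07s for the runs at 704e2f5b700d, while user and sys time are nearly identical in both sets.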
The off-CPU time is much longer when using affinity; I think that is the cost of
migration. Could you please share your test case?
Thanks,
Wei