Message-ID: <bdf15a29-3d19-dff3-ad2c-506e19aeaa8a@huawei.com>
Date:   Thu, 24 Sep 2020 22:14:17 +0800
From:   "liwei (GF)" <liwei391@...wei.com>
To:     Andi Kleen <ak@...ux.intel.com>
CC:     Arnaldo Carvalho de Melo <acme@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...hat.com>,
        "Namhyung Kim" <namhyung@...nel.org>,
        Alexey Budankov <alexey.budankov@...ux.intel.com>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>, <linux-kernel@...r.kernel.org>,
        <linux-arm-kernel@...ts.infradead.org>, <huawei.libin@...wei.com>
Subject: Re: [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu
 events

Hi Andi,

On 2020/9/23 3:50, Andi Kleen wrote:
> On Tue, Sep 22, 2020 at 12:23:21PM -0700, Andi Kleen wrote:
>>> After debugging, i found the root reason is that the xyarray fd is created
>>> by evsel__open_per_thread() ignoring the cpu passed in
>>> create_perf_stat_counter(), while the evsel' cpumap is assigned as the
>>> corresponding PMU's cpumap in __add_event(). Thus, the xyarray fd is created
>>> with ncpus of dummy cpumap and an out of bounds 'cpu' index will be used in
>>> perf_evsel__close_fd_cpu().
>>>
>>> To address this, add a flag to mark this situation and avoid using the
>>> affinity technique when closing/enabling/disabling events.
>>
>> The flag seems like a hack. How about figuring out the correct number of 
>> CPUs and using that?
> 
> Also would like to understand what's different on ARM64 than other architectures.
> Or could this happen on x86 too?
> 

The problem is that when the user requests per-task events, the cpumask is expected
to be NULL (dummy), while armv8_pmu does have a cpumask, which is inherited by the evsel.
The armv8_pmu cpumask was added for heterogeneous systems. So this issue cannot
happen on x86.

In fact, the cpumask itself is correct, but it shouldn't be used when we are requesting
per-task events. As these events should be installed on all cores, I doubt how much we
can benefit from the affinity technique, so I chose to add a flag.

I also ran a test on a HiSilicon arm64 D06 board with 2 sockets and 128 cores,
running the following command 3 times, with and without the affinity technique:

time tools/perf/perf stat -ddd -C 0-127 --per-core --timeout=5000 2> /dev/null

* (HEAD detached at 7074674e7338) perf cpumap: Maintain cpumaps ordered and without dups
real	0m8.039s
user	0m0.402s
sys	0m2.582s

real	0m7.939s
user	0m0.360s
sys	0m2.560s

real	0m7.997s
user	0m0.358s
sys	0m2.586s

* (HEAD detached at 704e2f5b700d) perf stat: Use affinity for enabling/disabling events
real	0m7.954s
user	0m0.308s
sys	0m2.590s

real	0m12.959s
user	0m0.332s
sys	0m2.582s

real	0m18.009s
user	0m0.346s
sys	0m2.562s

The off-CPU time is much longer when using affinity; I think that is the cost of migration.
Could you please share your test case with me?

Thanks,
Wei