Message-ID: <fc89da87-6070-beda-2437-4089257c55aa@amd.com>
Date:   Fri, 2 Oct 2020 16:10:42 -0500
From:   Kim Phillips <kim.phillips@....com>
To:     "Liang, Kan" <kan.liang@...ux.intel.com>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     mingo@...hat.com, linux-kernel@...r.kernel.org, ak@...ux.intel.com
Subject: Re: [PATCH] perf/x86/intel: Fix n_metric for the canceled group

On 10/2/20 8:16 AM, Liang, Kan wrote:
> On 10/2/2020 7:02 AM, Peter Zijlstra wrote:
>> On Wed, Sep 30, 2020 at 07:29:35AM -0700, kan.liang@...ux.intel.com wrote:
>>> From: Kan Liang <kan.liang@...ux.intel.com>
>>>
>>> When a group that has TopDown members fails to be scheduled, any
>>> later TopDown groups will not return valid values.
>>>
>>> Here is an example.
>>>
>>> A background perf that occupies all the GP counters and the fixed
>>> counter 1.
>>>   $perf stat -e "{cycles,cycles,cycles,cycles,cycles,cycles,cycles,
>>>                   cycles,cycles}:D" -a
>>>
>>> A user monitors a TopDown group. It works well, because the fixed
>>> counter 3 and the PERF_METRICS are available.
>>>   $perf stat -x, --topdown -- ./workload
>>>     retiring,bad speculation,frontend bound,backend bound,
>>>     18.0,16.1,40.4,25.5,
>>>
>>> Then the user tries to monitor a group that has TopDown members.
>>> Because of the cycles event, the group fails to be scheduled.
>>>   $perf stat -x, -e '{slots,topdown-retiring,topdown-be-bound,
>>>                       topdown-fe-bound,topdown-bad-spec,cycles}'
>>>                       -- ./workload
>>>      <not counted>,,slots,0,0.00,,
>>>      <not counted>,,topdown-retiring,0,0.00,,
>>>      <not counted>,,topdown-be-bound,0,0.00,,
>>>      <not counted>,,topdown-fe-bound,0,0.00,,
>>>      <not counted>,,topdown-bad-spec,0,0.00,,
>>>      <not counted>,,cycles,0,0.00,,
>>>
>>> The user tries to monitor a TopDown group again. It doesn't work anymore.
>>>   $perf stat -x, --topdown -- ./workload
>>>
>>>      ,,,,,
>>>
>>> Within a transaction, cancel_txn() truncates the event_list for a
>>> canceled group and updates the number of events added in this
>>> transaction. However, the number of TopDown events added in this
>>> transaction is not updated, so the kernel will probably fail to add
>>> new TopDown events.
>>>
>>> Check if the canceled group has TopDown events. If so, subtract the
>>> TopDown events from n_metric accordingly.
>>>
>>> Fixes: 7b2c05a15d29 ("perf/x86/intel: Generic support for hardware TopDown metrics")
>>> Reported-by: Andi Kleen <ak@...ux.intel.com>
>>> Reviewed-by: Andi Kleen <ak@...ux.intel.com>
>>> Signed-off-by: Kan Liang <kan.liang@...ux.intel.com>
>>> ---

>>
>> Urgh, I'd much rather we add n_txn_metric. But also, while looking at
>> this, don't we have the same problem with n_pair ?
>>
>> Something like this perhaps...
>>
> 
> Sure. For the perf metric, the below patch fixes the problem.
> 
> Tested-by: Kan Liang <kan.liang@...ux.intel.com>

Tested-by: Kim Phillips <kim.phillips@....com>

Excerpt from test script:

# background: pin four cycles events system-wide for 10 seconds
sudo perf stat -e "{cycles,cycles,cycles,cycles}:D" -a sleep 10 &

# should succeed:
sudo perf stat -e "{fp_ret_sse_avx_ops.all}:D" -a workload

# should fail:
sudo perf stat -e "{fp_ret_sse_avx_ops.all,fp_ret_sse_avx_ops.all,cycles}:D" -a workload

# previously failed, now succeeds with this patch:
sudo perf stat -e "{fp_ret_sse_avx_ops.all}:D" -a workload
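
For illustration only, the bookkeeping discussed in this thread can be modelled in a few lines of user-space C. This is a toy sketch under stated assumptions, not the kernel patch: the struct and every toy_* helper are hypothetical, and only the counter names n_metric, n_pair and n_txn_metric come from the thread (n_events, n_txn and n_txn_pair are assumed by analogy). The point is that a cancelled transaction has to undo every per-kind count it raised, not just the total.

```c
#include <assert.h>

/*
 * Toy model of per-CPU event bookkeeping. Hypothetical throughout:
 * only n_metric, n_pair and n_txn_metric are names from the thread.
 */
struct toy_cpuc {
	int n_events;     /* events currently on the list */
	int n_metric;     /* ... of which are TopDown metric events */
	int n_pair;       /* ... of which need a pair of counters */
	int n_txn;        /* events added by the open transaction */
	int n_txn_metric; /* metric events added by the open transaction */
	int n_txn_pair;   /* paired events added by the open transaction */
};

enum toy_kind { TOY_PLAIN, TOY_METRIC, TOY_PAIR };

/* Begin a transaction: reset the per-transaction counts. */
static void toy_start_txn(struct toy_cpuc *c)
{
	c->n_txn = 0;
	c->n_txn_metric = 0;
	c->n_txn_pair = 0;
}

/* Add one event, tracking both the running and per-transaction counts. */
static void toy_add(struct toy_cpuc *c, enum toy_kind kind)
{
	c->n_events++;
	c->n_txn++;
	if (kind == TOY_METRIC) {
		c->n_metric++;
		c->n_txn_metric++;
	} else if (kind == TOY_PAIR) {
		c->n_pair++;
		c->n_txn_pair++;
	}
}

/*
 * Cancel the transaction: undo everything it added. Dropping either of
 * the last two subtractions reproduces the stale-count symptom, where a
 * later group is rejected even though nothing is actually scheduled.
 */
static void toy_cancel_txn(struct toy_cpuc *c)
{
	c->n_events -= c->n_txn;
	c->n_metric -= c->n_txn_metric;
	c->n_pair -= c->n_txn_pair;
}

/* Mirrors the "should fail" step above: two paired events plus cycles. */
static int toy_demo(void)
{
	struct toy_cpuc c = {0};

	toy_start_txn(&c);
	toy_add(&c, TOY_PAIR);
	toy_add(&c, TOY_PAIR);
	toy_add(&c, TOY_PLAIN);
	toy_cancel_txn(&c); /* group could not be scheduled */

	assert(c.n_events == 0);
	assert(c.n_pair == 0);
	return c.n_pair; /* stale count left behind; 0 when cancel is complete */
}
```

Calling toy_demo() from a small main() and checking that it returns 0 exercises the cancel path; the model says nothing about how the real scheduler decides that a group does not fit.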

Thanks both,

Kim