Message-ID: <3E65B60E-B120-4E1A-BAF2-2FAEF136A4CD@fb.com>
Date: Fri, 19 Mar 2021 00:22:07 +0000
From: Song Liu <songliubraving@...com>
To: Arnaldo <arnaldo.melo@...il.com>
CC: Jiri Olsa <jolsa@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
Kernel Team <Kernel-team@...com>,
"Arnaldo Carvalho de Melo" <acme@...hat.com>,
Jiri Olsa <jolsa@...nel.org>
Subject: Re: [PATCH v2 0/3] perf-stat: share hardware PMCs with BPF
> On Mar 18, 2021, at 5:09 PM, Arnaldo <arnaldo.melo@...il.com> wrote:
>
>
>
> On March 18, 2021 6:14:34 PM GMT-03:00, Jiri Olsa <jolsa@...hat.com> wrote:
>> On Thu, Mar 18, 2021 at 03:52:51AM +0000, Song Liu wrote:
>>>
>>>
>>>> On Mar 17, 2021, at 6:11 AM, Arnaldo Carvalho de Melo <acme@...nel.org> wrote:
>>>>
>>>> Em Wed, Mar 17, 2021 at 02:29:28PM +0900, Namhyung Kim escreveu:
>>>>> Hi Song,
>>>>>
>>>>> On Wed, Mar 17, 2021 at 6:18 AM Song Liu <songliubraving@...com> wrote:
>>>>>>
>>>>>> perf uses performance monitoring counters (PMCs) to monitor system
>>>>>> performance. The PMCs are limited hardware resources. For example,
>>>>>> Intel CPUs have 3x fixed PMCs and 4x programmable PMCs per cpu.
>>>>>>
>>>>>> Modern data center systems use these PMCs in many different ways:
>>>>>> system level monitoring, (maybe nested) container level monitoring,
>>>>>> per process monitoring, profiling (in sample mode), etc. In some
>>>>>> cases, there are more active perf_events than available hardware
>>>>>> PMCs. To allow all perf_events to have a chance to run, it is
>>>>>> necessary to do expensive time multiplexing of events.
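
Multiplexing is easy to see from the command line; a minimal sketch,
assuming an Intel box where eight "cycles" events cannot all fit in
hardware at once:

  # more hardware events than PMCs forces time multiplexing; perf stat
  # prints each count with the fraction of time it was actually enabled
  perf stat -e cycles,cycles,cycles,cycles,cycles,cycles,cycles,cycles -a -- sleep 1
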
>>>>>>
>>>>>> On the other hand, many monitoring tools count the common metrics
>>>>>> (cycles, instructions). It is a waste to have multiple tools create
>>>>>> multiple perf_events of "cycles" and occupy multiple PMCs.
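
The sharing this series enables can be sketched as two concurrent sessions
counting the same metric; with --bpf-counters (assuming a perf binary built
with BPF skeleton support), both are served by one shared perf_event rather
than occupying two PMCs:

  # two counting sessions, one underlying "cycles" perf_event
  perf stat --bpf-counters -e cycles -a -- sleep 10 &
  perf stat --bpf-counters -e cycles -a -- sleep 10
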
>>>>>
>>>>> Right, it'd be really helpful when the PMCs are frequently or mostly
>>>>> shared. But it'd also increase the overhead for uncontended cases as
>>>>> BPF programs need to run on every context switch. Depending on the
>>>>> workload, it may cause a non-negligible performance impact. So users
>>>>> should be aware of it.
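
A switch-heavy microbenchmark should expose that per-context-switch cost
most clearly; a rough sketch (the exact invocation is an assumption, not
something measured in this thread):

  perf bench sched pipe                                          # baseline
  perf stat --bpf-counters -e cycles -a -- perf bench sched pipe # with BPF counters

Comparing the two usecs/op figures gives a rough bound on the overhead the
BPF programs add per switch.
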
>>>>
>>>> Would be interesting to, humm, measure both cases to have a firm
>>>> number of the impact, how many instructions are added when sharing
>>>> using --bpf-counters?
>>>>
>>>> I.e. compare the "expensive time multiplexing of events" with its
>>>> avoidance by using --bpf-counters.
>>>>
>>>> Song, have you performed such measurements?
>>>
>>> I have got some measurements with perf-bench-sched-messaging:
>>>
>>> The system: x86_64 with 23 cores (46 HT)
>>>
>>> The perf-stat command:
>>> perf stat -e cycles,cycles,instructions,instructions,ref-cycles,ref-cycles <target, etc.>
>>>
>>> The benchmark command and output:
>>> ./perf bench sched messaging -g 40 -l 50000 -t
>>> # Running 'sched/messaging' benchmark:
>>> # 20 sender and receiver threads per group
>>> # 40 groups == 1600 threads run
>>> Total time: 10X.XXX [sec]
>>>
>>>
>>> I use "Total time" as the measurement, so a smaller number is better.
>>>
>>> For each condition, I ran the command 5 times and took the median of
>>> "Total time".
>>>
>>> Baseline (no perf-stat)              104.873 [sec]
>>> # global
>>> perf stat -a                         107.887 [sec]
>>> perf stat -a --bpf-counters          106.071 [sec]
>>> # per task
>>> perf stat                            106.314 [sec]
>>> perf stat --bpf-counters             105.965 [sec]
>>> # per cpu
>>> perf stat -C 1,3,5                   107.063 [sec]
>>> perf stat -C 1,3,5 --bpf-counters    106.406 [sec]
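
For anyone reproducing this, the 5-run median can be scripted along these
lines (the grep pattern and field numbers are assumptions based on the
"Total time: ... [sec]" output quoted above):

  for i in 1 2 3 4 5; do
      ./perf bench sched messaging -g 40 -l 50000 -t | grep 'Total time'
  done | sort -n -k 3 | sed -n 3p   # middle of 5 sorted runs = the median
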
>>
>> I can't see why it's actually faster than normal perf ;-)
>> would be worth finding out
>
> Isn't this all about contended cases?
Yeah, normal perf does time multiplexing of the events, while
--bpf-counters doesn't need it.
Thanks,
Song