Message-ID: <fb969378-3cd1-77dc-bdaf-9b9c972fb084@linux.intel.com>
Date: Tue, 27 Mar 2018 19:27:09 +0300
From: Alexey Budankov <alexey.budankov@...ux.intel.com>
To: Andi Kleen <ak@...ux.intel.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...hat.com>,
Namhyung Kim <namhyung@...nel.org>,
linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v1] perf stat: avoid 10ms limit for printing event counts
On 27.03.2018 14:59, Andi Kleen wrote:
> On Tue, Mar 27, 2018 at 02:40:29PM +0300, Alexey Budankov wrote:
>> On 27.03.2018 12:06, Andi Kleen wrote:
>>>> When running perf stat -I to monitor e.g. PCIe uncore counters
>>>> while simultaneously profiling some I/O workload with perf record
>>>> (e.g. for cpu-cycles and context switches), it becomes possible to
>>>> build and observe a good-enough consolidated CPU/OS/IO (uncore)
>>>> performance picture for that workload.
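For illustration, such a combined session might look like this (the
uncore event names vary by platform and are purely illustrative):

  # uncore counts every 10ms, system-wide
  perf stat -I 10 -a -e uncore_imc/data_reads/,uncore_imc/data_writes/ &
  # meanwhile, a CPU-side profile of the workload
  perf record -e cycles -e context-switches -- ./io_workload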
>>>
>>> I still hope we can eventually make uncore measurements in
>>> perf record work. Kan tried at some point to allow multiple
>>> PMUs in a group, but was not successful. But perhaps we
>>> can sample them from a software event instead.
>>>
>>>>
>>>> The warning about possible runtime overhead is still preserved;
>>>> however, it is only visible when the -v option is specified.
>>>
>>> I would print it unconditionally. Very few people use -v.
I thought it through some more. Printing the warning doesn't make sense
when output goes to the console, because the screen quickly scrolls
past it. If the interval is small you may miss it entirely, regardless
of the -v option.
It turns out that the right place to mention the possible overhead is
the help message generated by perf stat -h.
Thanks,
Alexey
>>>
>>> BTW, better of course would be to occasionally measure perf stat's
>>> own CPU time and only print the warning if it's above some percentage
>>> of a CPU. But that would be much more work.
>>
>> Would you please elaborate more on that?
>
> getrusage() can give you the system+user time of the current process.
> If you compare that to wall time you know the percentage.
>
> You could measure those occasionally (not every interval, but perhaps
> once per second or so). If the overhead reaches a reasonable percentage
> (5%, perhaps?), print the warning once.
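A minimal sketch of that check, for illustration only (the function
name and the 5% threshold are made up, and it assumes CLOCK_MONOTONIC
is available; this is not perf's actual code):

  #include <stdio.h>
  #include <sys/resource.h>
  #include <time.h>

  static void check_tool_overhead(void)
  {
          static struct timespec start;
          static int warned;
          struct timespec now;
          struct rusage ru;
          double wall, cpu;

          clock_gettime(CLOCK_MONOTONIC, &now);
          if (!start.tv_sec && !start.tv_nsec) {
                  start = now;    /* first call: record the baseline */
                  return;
          }
          wall = (now.tv_sec - start.tv_sec) +
                 (now.tv_nsec - start.tv_nsec) / 1e9;

          if (warned || wall <= 0 || getrusage(RUSAGE_SELF, &ru))
                  return;
          /* system + user CPU time this process has consumed so far */
          cpu = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6 +
                ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;

          if (cpu / wall > 0.05) {        /* 5%: made-up threshold */
                  fprintf(stderr,
                          "warning: perf stat overhead above 5%% of one CPU\n");
                  warned = 1;
          }
  }

Such a helper could be called from the interval-printing path,
rate-limited to about once per second as suggested above.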
>
> One problem is that the measurement doesn't include time spent in the
> remote IPIs for reading performance counters on other CPUs. So on a
> very large system it becomes less and less accurate. But maybe it's a
> good enough proxy.
>
> Or in theory could fix the kernel to charge this somehow to the process
> that triggered the IPIs, but that would be another project.
>
> Another problem is that it doesn't account for burstiness. Maybe
> the problem is not the smoothed average of CPU time, but bursts
> competing with the original workload. There's probably no easy
> solution for that.
>
> Also, if the CPU perf stat runs on is idle, it of course doesn't
> matter. Determining that would require reading /proc, which would be
> much more expensive, so probably not a good idea. As a proxy you could
> check the involuntary context switches (also reported by getrusage),
> and if they don't cross some threshold, don't warn.
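As a sketch, the same getrusage() call already reports that counter
(ru_nivcsw); the helper name and threshold below are made up:

  #include <sys/resource.h>

  /* Hypothetical gate: only warn if perf stat itself is being
   * preempted noticeably often between checks. */
  static int preempted_often(void)
  {
          static long last_nivcsw;
          struct rusage ru;
          long delta;

          if (getrusage(RUSAGE_SELF, &ru))
                  return 0;
          delta = ru.ru_nivcsw - last_nivcsw;     /* involuntary switches */
          last_nivcsw = ru.ru_nivcsw;
          return delta > 10;      /* made-up threshold */
  }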
>
> -Andi
>