linux-kernel - Re: [PATCH v1] perf stat: avoid 10ms limit for printing event counts


Message-ID: <20180327115937.GR13724@tassilo.jf.intel.com>
Date:   Tue, 27 Mar 2018 04:59:37 -0700
From:   Andi Kleen <ak@...ux.intel.com>
To:     Alexey Budankov <alexey.budankov@...ux.intel.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...hat.com>,
        Namhyung Kim <namhyung@...nel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v1] perf stat: avoid 10ms limit for printing event counts

On Tue, Mar 27, 2018 at 02:40:29PM +0300, Alexey Budankov wrote:
> On 27.03.2018 12:06, Andi Kleen wrote:
> >> When running perf stat -I for monitoring e.g. PCIe uncore counters and 
> >> at the same time profiling some I/O workload by perf record e.g. for 
> >> cpu-cycles and context switches, it is then possible to build and 
> >> observe good-enough consolidated CPU/OS/IO(Uncore) performance picture 
> >> for that workload.
> > 
> > At some point I still hope we can make uncore measurements in 
> > perf record work. Kan tried at some point to allow multiple
> > PMUs in a group, but was not successful. But perhaps we
> > can sample them from a software event instead.
> > 
> >>
> >> The warning on possible runtime overhead is still preserved, however 
> >> it is only visible when specifying -v option.
> > 
> > I would print it unconditionally. Very few people use -v.
> > 
> > BTW better of course would be to occasionally measure the perf stat 
> > cpu time and only print the warning if it's above some percentage
> > of a CPU. But that would be much more work.
> 
> Would you please elaborate more on that?

getrusage() can give you the system+user time of the current process.
If you compare that to wall time you know the percentage.

You could measure those occasionally (not every interval, but perhaps
once per second or so). If the overhead reaches a meaningful percentage
(5% perhaps?), print the warning once.
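
A minimal sketch of that check, assuming nothing about the actual perf
stat code: all names here (overhead_sample, overhead_percent,
overhead_should_warn) are hypothetical, and the 5% threshold is only the
guess from above. It compares the getrusage() user+system time delta
against the CLOCK_MONOTONIC delta between two samples:

```c
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() under strict -std= */
#include <stdbool.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper type: one snapshot of our own CPU and wall time. */
struct overhead_sample {
	double cpu_sec;  /* user + system time consumed by this process */
	double wall_sec; /* monotonic clock reading at the same moment */
};

double timeval_to_sec(struct timeval tv)
{
	return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Take a snapshot; returns 0 on success, -1 if either call fails. */
int overhead_sample_take(struct overhead_sample *s)
{
	struct rusage ru;
	struct timespec now;

	if (getrusage(RUSAGE_SELF, &ru) < 0)
		return -1;
	if (clock_gettime(CLOCK_MONOTONIC, &now) < 0)
		return -1;

	s->cpu_sec = timeval_to_sec(ru.ru_utime) + timeval_to_sec(ru.ru_stime);
	s->wall_sec = (double)now.tv_sec + (double)now.tv_nsec / 1e9;
	return 0;
}

/* Share of one CPU (in percent) used between two snapshots. */
double overhead_percent(const struct overhead_sample *prev,
			const struct overhead_sample *cur)
{
	double wall = cur->wall_sec - prev->wall_sec;

	if (wall <= 0.0)
		return 0.0;
	return 100.0 * (cur->cpu_sec - prev->cpu_sec) / wall;
}

/* True only the first time the threshold is reached, so we warn once. */
bool overhead_should_warn(double percent, double threshold, bool *warned)
{
	if (*warned || percent < threshold)
		return false;
	*warned = true;
	return true;
}
```

The interval loop would take a snapshot roughly once per second, pass
the previous and current snapshot to overhead_percent(), and print the
warning when overhead_should_warn(pct, 5.0, &warned) returns true.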

One problem is that the measurement doesn't include time spent in the
remote IPIs for reading performance counters on other CPUs.  So on a
very large system it may become less and less accurate. But maybe it's
a good enough proxy.

Or in theory one could fix the kernel to charge this somehow to the
process that triggered the IPIs, but that would be another project.

Another problem is that it doesn't account for burstiness. Maybe
the problem is not the smoothed average of CPU time, but bursts
competing with the original workload. There's probably no easy
solution for that.

Also, if the CPU that perf stat runs on is otherwise idle, the overhead
of course doesn't matter. Detecting that would require reading /proc,
which would be much more expensive, so probably not a good idea. As a
proxy you could check the involuntary context switches (also reported
by getrusage()), and if they don't cross some threshold, don't warn.
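
A rough sketch of that proxy, again with hypothetical names
(nivcsw_read, cpu_looks_contended) and an arbitrary threshold left to
the caller. It reads ru_nivcsw from getrusage() and treats a small
delta between two readings as "nobody is competing for this CPU":

```c
#include <stdbool.h>
#include <sys/resource.h>

/* Involuntary context switches of this process so far, or -1 on error. */
long nivcsw_read(void)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) < 0)
		return -1;
	return ru.ru_nivcsw;
}

/*
 * True if we were preempted at least 'threshold' times between the two
 * readings. A failed reading (-1) is treated as contended, so the
 * warning is not suppressed on bad data (an assumption of this sketch).
 */
bool cpu_looks_contended(long prev_nivcsw, long cur_nivcsw, long threshold)
{
	if (prev_nivcsw < 0 || cur_nivcsw < 0)
		return true;
	return cur_nivcsw - prev_nivcsw >= threshold;
}
```

The warning would then be printed only when the CPU time percentage is
over its limit and cpu_looks_contended() is also true.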

-Andi