linux-kernel - Re: [RFC/PATCH] perf report: Support latency profiling in system-wide mode

Message-ID: <aBmvmmRKpeVd6aT3@google.com>
Date: Mon, 5 May 2025 23:43:38 -0700
From: Namhyung Kim <namhyung@...nel.org>
To: Dmitry Vyukov <dvyukov@...gle.com>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>,
	Ian Rogers <irogers@...gle.com>,
	Kan Liang <kan.liang@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
	Adrian Hunter <adrian.hunter@...el.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
	linux-perf-users@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>
Subject: Re: [RFC/PATCH] perf report: Support latency profiling in
 system-wide mode

On Tue, May 06, 2025 at 07:55:25AM +0200, Dmitry Vyukov wrote:
> On Tue, 6 May 2025 at 07:30, Namhyung Kim <namhyung@...nel.org> wrote:
> >
> > Hello,
> >
> > On Mon, May 05, 2025 at 10:08:17AM +0200, Dmitry Vyukov wrote:
> > > On Sat, 3 May 2025 at 02:36, Namhyung Kim <namhyung@...nel.org> wrote:
> > > >
> > > > When perf profiles a target process (and its children), it's
> > > > straightforward to track parallelism using sched-switch info.  The
> > > > parallelism is kept at the machine level in this case.
> > > >
> > > > But when it profiles multiple processes, as in system-wide mode,
> > > > it might not be clear how to apply the (machine-level) parallelism to
> > > > different tasks.  That's why latency profiling was disabled for
> > > > system-wide mode.
> > > >
> > > > But it should be able to track parallelism in each process, and it'd
> > > > be useful to profile latency issues in multi-threaded programs.  So
> > > > this patch tries to enable it.
> > > >
> > > > However, using sched-switch info can be a problem since it may emit a
> > > > lot more data, with more chances of losing data when perf cannot keep
> > > > up with it.
> > > >
> > > > Instead, it can maintain the current process for each CPU when it sees
> > > > samples.
> > >
> > > Interesting.
> > >
> > > A few questions:
> > > 1. Do we always see a CPU sample when a CPU becomes idle? Otherwise we
> > > will think that the last thread runs on that CPU for arbitrarily long,
> > > when it actually does not.
> >
> > No, it's not guaranteed to have a sample for idle tasks.  So right, it
> > can mis-calculate the parallelism for the last task.  If we can emit
> > sched-switches only when it goes to the idle task, it'd be accurate.
> 
> Then I think the profile can be significantly off if the system wasn't
> ~100% loaded, right?

Yep, it can be.

> 
> > > 2. If yes, can we also lose that "terminating" event when a CPU becomes
> > > idle? If yes, then it looks equivalent to missing a context switch
> > > event.
> >
> > I'm not sure what you are asking.  When it loses some records because the
> > buffer is full, it'll see the task of the last sample on each CPU.
> > Maybe we want to reset the current task after PERF_RECORD_LOST.
> 
> This probably does not matter much if the answer to question 1 is No.
> 
> But what I was asking is the following:
> 
> let's say we have samples:
> Sample 1 for Pid 42 on Cpu 10
> Sample 2 for idle task on Cpu 10
> ... no samples for some time on Cpu 10 ...
> 
> When we process sample 2, we decrement the counter of running tasks
> for Pid 42, right?
> Now if sample 2 is lost, then we don't do the decrement and the accounting
> becomes off.
> In a sense this is equivalent to the problem of losing a context switch event.

Right.  But I think it's hard to be correct once it loses something.
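
To make the scenario above concrete, here is a minimal sketch of the
sample-based accounting.  It is not the code from the patch: struct tracker
and the tracker_*() helpers are hypothetical names, the fixed array sizes are
arbitrary, and it assumes idle-task samples carry pid 0.  tracker_lost() shows
one possible way to reset the current task after PERF_RECORD_LOST.

```c
#include <assert.h>
#include <string.h>

#define MAX_CPUS 64     /* arbitrary limits for the sketch */
#define MAX_PIDS 128
#define IDLE_PID 0      /* assumption: idle task samples have pid 0 */
#define NO_TASK  (-1)   /* nothing known about this CPU yet */

/* Hypothetical state, not a perf data structure. */
struct tracker {
	int cur_pid[MAX_CPUS];      /* process last sampled on each CPU */
	int parallelism[MAX_PIDS];  /* CPUs currently attributed to each pid */
};

void tracker_init(struct tracker *t)
{
	memset(t->parallelism, 0, sizeof(t->parallelism));
	for (int i = 0; i < MAX_CPUS; i++)
		t->cur_pid[i] = NO_TASK;
}

/* Account one sample: move the CPU from its previous process to the new one. */
void tracker_sample(struct tracker *t, int cpu, int pid)
{
	assert(cpu >= 0 && cpu < MAX_CPUS && pid >= 0 && pid < MAX_PIDS);

	int prev = t->cur_pid[cpu];

	if (prev == pid)
		return;
	if (prev > IDLE_PID)
		t->parallelism[prev]--;   /* previous process left this CPU */
	if (pid > IDLE_PID)
		t->parallelism[pid]++;    /* new process now runs here */
	t->cur_pid[cpu] = pid;
}

/* One possible reaction to lost records: forget what ran on the CPU. */
void tracker_lost(struct tracker *t, int cpu)
{
	assert(cpu >= 0 && cpu < MAX_CPUS);

	int prev = t->cur_pid[cpu];

	if (prev > IDLE_PID)
		t->parallelism[prev]--;
	t->cur_pid[cpu] = NO_TASK;
}
```

With "Sample 1 for Pid 42 on Cpu 10" followed by the idle sample, the counter
for pid 42 goes up to 1 and back to 0.  If the idle sample is dropped, it stays
at 1 until something else is sampled on that CPU, unless the lost record is
handled along the lines of tracker_lost().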

> 
> 
> > > 3. Does this mode kick in even for non system-wide profiles (collected
> > > w/o context switch events)? If yes, do we properly understand when a
> > > thread stops running for such profiles? How do we do that? There won't
> > > be samples for idle/other tasks.
> >
> > For non system-wide profiles, the problem is that it cannot know when
> > the current task is scheduled out so that it can decrease the count of
> > parallelism.  So this approach cannot work and sched-switch info is
> > required.
> 
> Where does the patch check that this mode is used only for system-wide profiles?
> Is it that PERF_SAMPLE_CPU is present only for system-wide profiles?

Basically yes, but you can use --sample-cpu to add it.

In util/evsel.c::evsel__config():

	if (target__has_cpu(&opts->target) || opts->sample_cpu)
		evsel__set_sample_bit(evsel, CPU);

Thanks,
Namhyung