Message-ID: <aDormwKnOYm_-Jgs@google.com>
Date: Fri, 30 May 2025 15:05:15 -0700
From: Namhyung Kim <namhyung@...nel.org>
To: Dmitry Vyukov <dvyukov@...gle.com>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>,
	Ian Rogers <irogers@...gle.com>,
	Kan Liang <kan.liang@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
	Adrian Hunter <adrian.hunter@...el.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
	linux-perf-users@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>
Subject: Re: [RFC/PATCH] perf report: Support latency profiling in
 system-wide mode

On Fri, May 30, 2025 at 07:50:45AM +0200, Dmitry Vyukov wrote:
> On Wed, 28 May 2025 at 20:38, Namhyung Kim <namhyung@...nel.org> wrote:
> >
> > Hello,
> >
> > On Tue, May 27, 2025 at 09:14:34AM +0200, Dmitry Vyukov wrote:
> > > On Wed, 21 May 2025 at 09:30, Dmitry Vyukov <dvyukov@...gle.com> wrote:
> > > >
> > > > > Maybe we can use this
> > > > > only for the frequency mode which means user didn't use -c option or
> > > > > similar in the event description.
> > >
> > >
> > > All-in-all I think the best option for now is using CPU IDs to track
> > > parallelism as you suggested, but be more precise with idle detection.
> > > 2 passes over the trace may be fine to detect idle points. I see the
> > > most time now spent in hist_entry__cmp, which accesses other entries
> > > and is like a part of O(N*logN) processing, so a simple O(N) pass
> > > shouldn't slow it down much.
> > > That's what I would try. But I would also try to assess the precision
> > > of this approach by comparing with results of using explicit switch
> > > events.
> >
> > It's not clear to me how you want to maintain the idle info in the 2
> > pass approach.  Please feel free to propose something based on this
> > work.
> 
> 
> What part of it is unclear?
> 
> Basically, in the first pass we only mark events as sched_out/in.
> When we don't see samples on a CPU for 2*period, we mark the previous
> sample on the CPU as sched_out:
> 
>   // Assuming the period is stable across time and CPUs.
>   for_each_cpu(cpu) {
>       if (current[cpu]->last_timestamp + 2*period < sample->timestamp) {
>           if (current[cpu]->thread != idle)
>               current[cpu]->last_sample->sched_out = true;
>       }
>   }
> 
>   leader = machine__findnew_thread(machine, sample->pid);
>   if (current[sample->cpu]->thread != leader) {
>     current[sample->cpu]->last_sample->sched_out = true;
>     sample->sched_in = true;
>   }
>   current[sample->cpu]->thread = leader;
>   current[sample->cpu]->last_sample = sample;
>   current[sample->cpu]->last_timestamp = sample->timestamp;

Oh, you want to save the info in the sample itself.  I'm afraid that
won't work, since the sample is stack-allocated for one-time use in
perf_session__deliver_event().
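
One possible way around the lifetime problem, as a rough sketch only:
keep the sched_in/out marks in a side table indexed by delivery order
instead of in the sample.  Everything below (struct sketch_sample,
struct sketch_flags, mark_sched_points, MAX_CPUS, pid 0 meaning idle) is
hypothetical and simplified for illustration; it is not existing perf
code.  It also differs from the quoted pseudocode in one respect: a CPU
that has been silent for more than 2*period is forgotten, so the next
sample on it is marked sched_in even if it is the same task.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS  8                 /* hypothetical fixed CPU count */
#define NO_SAMPLE ((size_t)-1)      /* "nothing running on this CPU" */
#define IDLE_PID  0                 /* assumption: pid 0 is the idle task */

/* Hypothetical, simplified stand-in for a decoded sample. */
struct sketch_sample {
	uint64_t timestamp;
	int cpu;
	int pid;
};

/* Per-sample marks kept outside the sample, indexed by delivery order. */
struct sketch_flags {
	bool sched_in;
	bool sched_out;
};

/* First pass: fill flags[0..n-1] for samples sorted by timestamp. */
static void mark_sched_points(const struct sketch_sample *s, size_t n,
			      uint64_t period, struct sketch_flags *flags)
{
	size_t last[MAX_CPUS];      /* index of the last sample seen per CPU */
	size_t i, j;
	int cpu;

	for (cpu = 0; cpu < MAX_CPUS; cpu++)
		last[cpu] = NO_SAMPLE;
	for (i = 0; i < n; i++)
		flags[i].sched_in = flags[i].sched_out = false;

	for (i = 0; i < n; i++) {
		if (s[i].cpu < 0 || s[i].cpu >= MAX_CPUS)
			continue;   /* out of range for this sketch */

		/* A CPU silent for more than 2*period is assumed idle. */
		for (cpu = 0; cpu < MAX_CPUS; cpu++) {
			j = last[cpu];
			if (j == NO_SAMPLE)
				continue;
			if (s[j].timestamp + 2 * period < s[i].timestamp) {
				if (s[j].pid != IDLE_PID)
					flags[j].sched_out = true;
				last[cpu] = NO_SAMPLE;
			}
		}

		/* A different task (or no task) was last seen on this CPU. */
		j = last[s[i].cpu];
		if (j == NO_SAMPLE || s[j].pid != s[i].pid) {
			if (j != NO_SAMPLE && s[j].pid != IDLE_PID)
				flags[j].sched_out = true;
			if (s[i].pid != IDLE_PID)
				flags[i].sched_in = true;
		}
		last[s[i].cpu] = i;
	}
}
```

The sketch does not mark the final sample on each CPU as sched_out at
the end of the trace, and it assumes a stable period across time and
CPUs, as the quoted pseudocode does.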

> 
> 
> On the second pass we use the precomputed sched_in/out to calculate parallelism:
> 
>   leader = machine__findnew_thread(machine, sample->pid);
>   if (sample->sched_in)
>     leader->parallelism++;
>   sample->parallelism = leader->parallelism;
>   if (sample->sched_out)
>     leader->parallelism--;
> 
> This is more precise b/c we don't consider a thread running for
> 2*period after it stopped running.

IIUC it can make some samples have lower parallelism right before
they go idle.
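
To make the second pass concrete, here is a small sketch under
simplifying assumptions.  The names (struct sketch_event,
assign_parallelism, MAX_PIDS) are hypothetical, and pids index a small
array directly in place of the machine__findnew_thread() lookup.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PIDS 16   /* hypothetical bound; pids index the array directly */

/* Hypothetical event carrying the marks computed by the first pass. */
struct sketch_event {
	int pid;          /* process (leader) the sample belongs to */
	bool sched_in;
	bool sched_out;
};

/* Second pass: write the per-process parallelism seen by each sample. */
static void assign_parallelism(const struct sketch_event *ev, size_t n,
			       int *parallelism)
{
	int running[MAX_PIDS] = { 0 };  /* running threads per process */
	size_t i;

	for (i = 0; i < n; i++) {
		int pid = ev[i].pid;

		if (pid < 0 || pid >= MAX_PIDS) {
			parallelism[i] = 0;     /* out of range for this sketch */
			continue;
		}
		if (ev[i].sched_in)
			running[pid]++;
		parallelism[i] = running[pid];
		if (ev[i].sched_out)
			running[pid]--;
	}
}
```

With two threads A and B of one process on two CPUs and the
time-ordered events A(in), B(in), A, B(out), A, A(out), the result is
1, 2, 2, 2, 1, 1.  The A sample that follows B's last sample gets
parallelism 1 even if B actually kept running for most of a period
after that sample, which is one way samples near an idle transition can
end up with a lower value.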

 
> A more precise approach would probably be to consider the thread
> running for 0.5*period after the last sample (and similarly for
> 0.5*period before the first sample), but it would require injecting
> sched_in/out events into the trace at these points.

Yep, that would fix the issue.  But then the problem is how to inject
the events.
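
For reference, the arithmetic for where such events would fall is
simple.  The sketch below only computes the assumed running interval for
one run of samples; it says nothing about how to inject events into the
trace.  struct sketch_run and run_interval are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical: interval during which a thread is assumed to run. */
struct sketch_run {
	uint64_t start;   /* where a synthetic sched_in would go */
	uint64_t end;     /* where a synthetic sched_out would go */
};

/*
 * Extend a run of samples by half a period on each side, clamping the
 * start at 0 so the unsigned subtraction cannot wrap around.
 */
static struct sketch_run run_interval(uint64_t first_ts, uint64_t last_ts,
				      uint64_t period)
{
	uint64_t half = period / 2;
	struct sketch_run r;

	r.start = first_ts > half ? first_ts - half : 0;
	r.end = last_ts + half;
	return r;
}
```

For example, a run with samples from timestamp 100 to 140 and a period
of 10 would be treated as running from 95 to 145.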

Thanks,
Namhyung