Message-ID: <aDormwKnOYm_-Jgs@google.com>
Date: Fri, 30 May 2025 15:05:15 -0700
From: Namhyung Kim <namhyung@...nel.org>
To: Dmitry Vyukov <dvyukov@...gle.com>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>,
Ian Rogers <irogers@...gle.com>,
Kan Liang <kan.liang@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
linux-perf-users@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>
Subject: Re: [RFC/PATCH] perf report: Support latency profiling in
system-wide mode

On Fri, May 30, 2025 at 07:50:45AM +0200, Dmitry Vyukov wrote:
> On Wed, 28 May 2025 at 20:38, Namhyung Kim <namhyung@...nel.org> wrote:
> >
> > Hello,
> >
> > On Tue, May 27, 2025 at 09:14:34AM +0200, Dmitry Vyukov wrote:
> > > On Wed, 21 May 2025 at 09:30, Dmitry Vyukov <dvyukov@...gle.com> wrote:
> > > >
> > > > > Maybe we can use this
> > > > > only for the frequency mode which means user didn't use -c option or
> > > > > similar in the event description.
> > >
> > >
> > > All-in-all I think the best option for now is using CPU IDs to track
> > > parallelism as you suggested, but be more precise with idle detection.
> > > 2 passes over the trace may be fine to detect idle points. I see the
> > > most time now spent in hist_entry__cmp, which accesses other entries
> > > and is like a part of O(N*logN) processing, so a simple O(N) pass
> > > shouldn't slow it down much.
> > > That's what I would try. But I would also try to assess the precision
> > > of this approach by comparing with results of using explicit switch
> > > events.
> >
> > It's not clear to me how you want to maintain the idle info in the 2
> > pass approach. Please feel free to propose something based on this
> > work.
>
>
> What part of it is unclear?
>
> Basically, in the first pass we only mark events as sched_out/in.
> When we don't see samples on a CPU for 2*period, we mark the previous
> sample on the CPU as sched_out:
>
> // Assuming the period is stable across time and CPUs.
> for_each_cpu(cpu) {
>     if (current[cpu]->last_timestamp + 2*period < sample->timestamp) {
>         if (current[cpu]->thread != idle)
>             current[cpu]->last_sample->sched_out = true;
>     }
> }
>
> leader = machine__findnew_thread(machine, sample->pid);
> if (current[sample->cpu]->thread != leader) {
>     current[sample->cpu]->last_sample->sched_out = true;
>     sample->sched_in = true;
> }
> current[sample->cpu]->thread = leader;
> current[sample->cpu]->last_sample = sample;
> current[sample->cpu]->last_timestamp = sample->timestamp;

Oh, you wanted to save the info in the sample. But I'm afraid it won't
work since the sample is stack allocated for one-time use in
perf_session__deliver_event().

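One way around the lifetime problem might be to keep the flags in a side
table indexed by the sample's sequence number rather than in the sample
itself. The sketch below is only an illustration under assumptions: the
structures, limits and function names (pass1_init, pass1_sample, struct
sample_flags) are hypothetical and not part of perf, samples are assumed
to arrive in timestamp order, and the period is assumed stable. Unlike
the quoted pseudocode, it also forgets the CPU's last sample once a
timeout fires, so that the same process returning to that CPU gets a
fresh sched_in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_CPUS    8    /* hypothetical limits for this sketch */
#define MAX_SAMPLES 64
#define NO_SAMPLE   (-1)
#define IDLE_PID    0

/* Hypothetical per-sample flags, stored outside the transient sample. */
struct sample_flags {
	bool sched_in;
	bool sched_out;
};

struct cpu_state {
	int      pid;        /* process last seen on this CPU */
	int      last_idx;   /* index of its last sample, or NO_SAMPLE */
	uint64_t last_time;  /* timestamp of that sample */
};

struct pass1 {
	struct cpu_state    cpu[MAX_CPUS];
	struct sample_flags flags[MAX_SAMPLES];
	uint64_t            period;
};

static void pass1_init(struct pass1 *p, uint64_t period)
{
	memset(p, 0, sizeof(*p));
	p->period = period;
	for (int i = 0; i < MAX_CPUS; i++)
		p->cpu[i].last_idx = NO_SAMPLE;
}

/* Feed samples in timestamp order; idx is the sample's sequence number. */
static void pass1_sample(struct pass1 *p, int idx, int cpu, int pid,
			 uint64_t time)
{
	assert(idx >= 0 && idx < MAX_SAMPLES);
	assert(cpu >= 0 && cpu < MAX_CPUS);

	/* A non-idle CPU silent for more than 2*period is treated as idle. */
	for (int i = 0; i < MAX_CPUS; i++) {
		struct cpu_state *c = &p->cpu[i];

		if (c->last_idx != NO_SAMPLE && c->pid != IDLE_PID &&
		    c->last_time + 2 * p->period < time) {
			p->flags[c->last_idx].sched_out = true;
			c->last_idx = NO_SAMPLE;
		}
	}

	struct cpu_state *c = &p->cpu[cpu];

	/* New run on this CPU: close the previous one, open this one. */
	if (c->last_idx == NO_SAMPLE || c->pid != pid) {
		if (c->last_idx != NO_SAMPLE)
			p->flags[c->last_idx].sched_out = true;
		p->flags[idx].sched_in = true;
	}
	c->pid = pid;
	c->last_idx = idx;
	c->last_time = time;
}
```

The second pass would then look up flags[idx] for each sample instead of
reading fields from the sample.
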
>
>
> On the second pass we use the precomputed sched_in/out to calculate parallelism:
>
> leader = machine__findnew_thread(machine, sample->pid);
> if (sample->sched_in)
>     leader->parallelism++;
> sample->parallelism = leader->parallelism;
> if (sample->sched_out)
>     leader->parallelism--;
>
> This is more precise b/c we don't consider a thread running for
> 2*period after it stopped running.

IIUC it can make some samples have less parallelism right before
they go to idle.

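To make one reading of that concern concrete, here is a small sketch of
the second pass over an array of already-flagged samples. The struct and
function names (struct flagged_sample, pass2) and the MAX_PIDS bound are
hypothetical and exist only for this illustration. Once a sample marked
sched_out has been processed, later samples of the same process on other
CPUs see the lower count, even if the first CPU was in fact still running
for part of a period.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PIDS 1024   /* hypothetical bound so a plain array can be used */

/* Hypothetical flattened sample record for this sketch. */
struct flagged_sample {
	int  pid;
	bool sched_in;
	bool sched_out;
	int  parallelism;   /* filled in by pass2() */
};

/* Walk samples in timestamp order and assign per-process parallelism. */
static void pass2(struct flagged_sample *s, size_t n)
{
	static int running[MAX_PIDS];

	for (size_t i = 0; i < MAX_PIDS; i++)
		running[i] = 0;

	for (size_t i = 0; i < n; i++) {
		assert(s[i].pid >= 0 && s[i].pid < MAX_PIDS);

		if (s[i].sched_in)
			running[s[i].pid]++;
		s[i].parallelism = running[s[i].pid];
		if (s[i].sched_out)
			running[s[i].pid]--;
	}
}
```

For example, with two CPUs sampling the same process and the first CPU's
last sample marked sched_out, the next sample from the second CPU is
assigned a parallelism of 1.
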
> A more precise approach would probably be to consider the thread
> running for 0.5*period after the last sample (and similarly for
> 0.5*period before the first sample), but it would require injecting
> sched_in/out events into the trace at these points.

Yep, that would fix the issue. But then the problem is how to inject
the events.
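As a thought experiment only, and not something proposed in this thread,
the 0.5*period widening could perhaps be evaluated without injecting
events into the trace, by recording each uninterrupted run as an interval
and counting how many widened intervals cover a given time. Everything
named below (struct run_interval, parallelism_at) is hypothetical, and
the linear scan per query is only for clarity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one uninterrupted run of a process on a CPU. */
struct run_interval {
	int      pid;
	uint64_t first;   /* timestamp of the first sample in the run */
	uint64_t last;    /* timestamp of the last sample in the run */
};

/*
 * Count runs of pid that cover time t once each run is widened by
 * period/2 before its first sample and after its last sample.
 */
static int parallelism_at(const struct run_interval *runs, size_t n,
			  int pid, uint64_t t, uint64_t period)
{
	uint64_t half = period / 2;
	int count = 0;

	assert(runs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		/* Clamp at zero so the widened start cannot underflow. */
		uint64_t start = runs[i].first > half ?
				 runs[i].first - half : 0;
		uint64_t end = runs[i].last + half;

		if (runs[i].pid == pid && start <= t && t < end)
			count++;
	}
	return count;
}
```

Whether building such intervals is cheaper than injecting events is an
open question; the sketch only shows what the widened accounting would
compute.
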

Thanks,
Namhyung