Message-ID: <CACT4Y+aiU-dHVgTKEpyJtn=RUUyYJp8U5BjyWSOHm6b2ODp9cA@mail.gmail.com>
Date: Tue, 6 May 2025 09:40:52 +0200
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>, Ian Rogers <irogers@...gle.com>,
Kan Liang <kan.liang@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>, Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
linux-perf-users@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>
Subject: Re: [RFC/PATCH] perf report: Support latency profiling in system-wide mode

On Tue, 6 May 2025 at 09:10, Namhyung Kim <namhyung@...nel.org> wrote:
> > > > Where does the patch check that this mode is used only for system-wide profiles?
> > > > Is it that PERF_SAMPLE_CPU present only for system-wide profiles?
> > >
> > > Basically yes, but you can use --sample-cpu to add it.
> >
> > Are you sure? --sample-cpu seems to work for non-system-wide profiles too.
>
> Yep, that's why I said "Basically". So it's not a 100% guarantee.
>
> We may disable latency column by default in this case and show warning
> if it's requested. Or we may add a new attribute to emit sched-switch
> records only for idle tasks and enable the latency report only if the
> data has sched-switch records.
>
> What do you think?
Depends on what problem we are trying to solve:
1. Enabling latency profiling for system-wide mode.
2. Switch events bloating trace too much.
3. Lost switch events lead to imprecise accounting.
The patch mentions all 3 :)
But I think 2 and 3 are not really specific to system-wide mode.
An active single process profile can emit more samples than a
system-wide profile on a lightly loaded system.
Similarly, if we rely on switch events for system-wide mode, then it's
equally subject to the lost events problem.
For problem 1: we can just permit --latency for system-wide mode and
fully rely on switch events.
It's not any worse than what we do now (wrt both profile size and lost events).
For problem 2: yes, we could emit only switches to idle tasks. Or
maybe just a fake CPU sample for an idle task? That's effectively what
we want, and then your current accounting code will work w/o any changes.
This should help wrt trace size only for system-wide mode (provided
that the user already enables CPU accounting for other reasons; otherwise
it's unclear which is better -- attaching a CPU to each sample, or
writing switch events).
For problem 3: switches to idle task won't really help. There can be
lots of them, and missing any will lead to wrong accounting.
A principled approach would be to attach a per-thread scheduler
quantum sequence number to each CPU sample. The sequence number would
be incremented on every context switch. Then any subset of CPU samples
would be enough to understand when a task was scheduled in and out
(scheduled in on the first CPU sample with sequence number N, and
switched out on the last sample with sequence number N).
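A rough sketch of the consumer side of that scheme (the (tid, seq, time)
sample fields are assumptions for illustration -- no such field exists in
perf today). The key property is that losing samples only shrinks a
quantum's observed span; it never attributes time to the wrong quantum:

```python
# Sketch: reconstruct scheduling quanta from CPU samples that carry a
# per-thread scheduler-quantum sequence number (seq), incremented on
# every context switch of that thread. Field names are illustrative.


def quanta(samples):
    """samples: iterable of (tid, seq, time), in any order.

    Returns {(tid, seq): (first_time, last_time)}: the task was observed
    scheduled in by first_time and still running at last_time. Any subset
    of the samples yields a conservative under-approximation of the span.
    """
    spans = {}
    for tid, seq, time in samples:
        first, last = spans.get((tid, seq), (time, time))
        spans[(tid, seq)] = (min(first, time), max(last, time))
    return spans


samples = [
    (42, 7, 100), (42, 7, 105), (42, 7, 110),  # one quantum of task 42
    (42, 8, 250),                              # task 42 rescheduled later
    (43, 3, 104),                              # a quantum of another task
]
print(quanta(samples))
# -> {(42, 7): (100, 110), (42, 8): (250, 250), (43, 3): (104, 104)}
```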