Message-ID: <CACT4Y+aiU-dHVgTKEpyJtn=RUUyYJp8U5BjyWSOHm6b2ODp9cA@mail.gmail.com>
Date: Tue, 6 May 2025 09:40:52 +0200
From: Dmitry Vyukov <dvyukov@...gle.com>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Arnaldo Carvalho de Melo <acme@...nel.org>, Ian Rogers <irogers@...gle.com>,
Kan Liang <kan.liang@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>, Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
linux-perf-users@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>
Subject: Re: [RFC/PATCH] perf report: Support latency profiling in system-wide mode

On Tue, 6 May 2025 at 09:10, Namhyung Kim <namhyung@...nel.org> wrote:
> > > > Where does the patch check that this mode is used only for system-wide profiles?
> > > > Is it that PERF_SAMPLE_CPU present only for system-wide profiles?
> > >
> > > Basically yes, but you can use --sample-cpu to add it.
> >
> > Are you sure? --sample-cpu seems to work for non-system-wide profiles too.
>
> Yep, that's why I said "Basically". So it's not a 100% guarantee.
>
> We may disable latency column by default in this case and show warning
> if it's requested. Or we may add a new attribute to emit sched-switch
> records only for idle tasks and enable the latency report only if the
> data has sched-switch records.
>
> What do you think?
Depends on what problem we are trying to solve:
1. Enabling latency profiling for system-wide mode.
2. Switch events bloating trace too much.
3. Lost switch events lead to imprecise accounting.
The patch mentions all 3 :)
But I think 2 and 3 are not really specific to system-wide mode.
An active single process profile can emit more samples than a
system-wide profile on a lightly loaded system.
Similarly, if we rely on switch events for system-wide mode, then it's
equally subject to the lost events problem.
For problem 1: we can just permit --latency for system-wide mode and
fully rely on switch events.
It's not any worse than what we do now (wrt both profile size and lost events).
For problem 2: yes, we could emit only switches to idle tasks. Or
maybe just a fake CPU sample for an idle task? That's effectively what
we want, and then your current accounting code will work w/o any changes.
This should help wrt trace size only for system-wide mode (provided
that the user already enables CPU accounting for other reasons; otherwise
it's unclear which is better -- attaching a CPU to each sample, or
writing switch events).
For problem 3: switches to idle task won't really help. There can be
lots of them, and missing any will lead to wrong accounting.
A principled approach would be to attach a per-thread scheduler
quantum sequence number to each CPU sample. The sequence number would
be incremented on every context switch. Then any subset of CPU samples
would be enough to understand when a task was scheduled in and out
(scheduled in on the first CPU sample with sequence number N, and
switched out on the last sample with sequence number N).
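A rough sketch of the consumer side of that scheme (the (tid, seq, time)
sample fields are assumptions for illustration -- no such field exists in
perf today). The key property is that losing samples only shrinks a
quantum's observed span; it never attributes time to the wrong quantum:

```python
# Sketch: reconstruct scheduling quanta from CPU samples that carry a
# per-thread scheduler-quantum sequence number (seq), incremented on
# every context switch of that thread. Field names are illustrative.


def quanta(samples):
    """samples: iterable of (tid, seq, time), in any order.

    Returns {(tid, seq): (first_time, last_time)}: the task was observed
    scheduled in by first_time and still running at last_time. Any subset
    of the samples yields a conservative under-approximation of the span.
    """
    spans = {}
    for tid, seq, time in samples:
        first, last = spans.get((tid, seq), (time, time))
        spans[(tid, seq)] = (min(first, time), max(last, time))
    return spans


samples = [
    (42, 7, 100), (42, 7, 105), (42, 7, 110),  # one quantum of task 42
    (42, 8, 250),                              # task 42 rescheduled later
    (43, 3, 104),                              # a quantum of another task
]
print(quanta(samples))
# -> {(42, 7): (100, 110), (42, 8): (250, 250), (43, 3): (104, 104)}
```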