[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAH0uvog_RgX7wmOZ6=-rBY3YtSqJ5mRmUMOE+hehEmJ6cMEPsA@mail.gmail.com>
Date: Thu, 6 Feb 2025 14:08:59 -0800
From: Howard Chu <howardchu95@...il.com>
To: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: namhyung@...nel.org, mark.rutland@....com,
alexander.shishkin@...ux.intel.com, jolsa@...nel.org, irogers@...gle.com,
adrian.hunter@...el.com, kan.liang@...ux.intel.com,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
Ingo Molnar <mingo@...hat.com>, James Clark <james.clark@...aro.org>,
Peter Zijlstra <peterz@...radead.org>, Arnaldo Carvalho de Melo <acme@...hat.com>
Subject: Re: [PATCH v14 05/10] perf evsel: Assemble offcpu samples
Hello Arnaldo,
On Thu, Feb 6, 2025 at 11:27 AM Arnaldo Carvalho de Melo
<acme@...nel.org> wrote:
>
> On Sun, Dec 15, 2024 at 10:12:15AM -0800, Howard Chu wrote:
> > Use the data in bpf-output samples, to assemble offcpu samples. In
> > evsel__is_offcpu_event(), Check if sample_type is PERF_SAMPLE_RAW to
> > support off-cpu sample data created by an older version of perf.
>
> > Testing compatibility on offcpu samples collected by perf before this patch series:
>
> > See below, the sample_type still uses PERF_SAMPLE_CALLCHAIN
>
> Getting back from vacation, so bear with me, please :-)
Good to have you back :), no worries at all.
>
> So, while at this point:
>
> ⬢ [acme@...lbox perf-tools-next]$ git log --oneline -5
> bfd7461c25a5e667 (HEAD) perf record --off-cpu: Dump off-cpu samples in BPF
> e75f8ce63bfa6cb9 perf record --off-cpu: Preparation of off-cpu BPF program
> 0ffab9d26971c91c perf record --off-cpu: Parse off-cpu event
> efc3fe2070853b7d perf evsel: Expose evsel__is_offcpu_event() for future use
> 357b965deba9fb71 (perf-tools-next/perf-tools-next) perf stat: Changes to event name uniquification
> ⬢ [acme@...lbox perf-tools-next]$
>
> And trying to test this series I do:
>
> root@...ber:~# perf -v
> perf version 6.13.rc2.gbfd7461c25a5
> root@...ber:~#
>
> root@...ber:~# perf record --off-cpu sleep 5s
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.037 MB perf.data (7 samples) ]
> root@...ber:~# perf evlist
> cpu_atom/cycles/P
> cpu_core/cycles/P
> offcpu-time
> dummy:u
> root@...ber:~#
>
> And:
>
> root@...ber:~# perf evlist -v
> cpu_atom/cycles/P: type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0xa00000000, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1
> cpu_core/cycles/P: type: 0 (PERF_TYPE_HARDWARE), size: 136, config: 0x400000000, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, inherit: 1, freq: 1, enable_on_exec: 1, precise_ip: 3, sample_id_all: 1
> offcpu-time: type: 1 (software), size: 136, config: 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|CPU|PERIOD|IDENTIFIER, read_format: ID|LOST, disabled: 1, enable_on_exec: 1, sample_id_all: 1
> dummy:u: type: 1 (software), size: 136, config: 0x9 (PERF_COUNT_SW_DUMMY), { sample_period, sample_freq }: 1, sample_type: IP|TID|TIME|IDENTIFIER, read_format: ID|LOST, inherit: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, task: 1, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
> root@...ber:~#
>
> Above the "offcpu-time" event doesn't have the PERF_SAMPLE_CALLCHAIN in
> sample_type, so I should be using some other 'perf record --off-cpu'
> specific option?
>
> The HEAD mentions:
>
> "Collect tid, period, callchain, and cgroup id and dump them when off-cpu
> time threshold is reached."
These are collected in BPF, not from perf_event. I can change the
commit message for clarity if you’d like.
>
> But that doesn't match the "offcpu-time" sample_type, right?
Nope, I'm sorry. If offcpu-time has PERF_SAMPLE_CALLCHAIN, it will
collect the wrong call chain.
Say I have task1 scheduling in and task2 scheduling out. For off-CPU
analysis, I want to save task2's call chain and check if task1 has
been scheduled out for longer than a threshold. If it has, a sample
for task1 should be emitted.
When a sched_switch happens, the call chain belongs to task2, and perf
should only emit task1's call chain if there is one. Therefore,
PERF_SAMPLE_CALLCHAIN, which collects an instant stack call chain at
sched_switch, cannot be used.
>
> > $ perf script --header -i ./perf.data.ptn | grep "event : name = offcpu-time"
> > # event : name = offcpu-time, , id = { 237917, 237918, 237919, 237920 }, type = 1 (software), size = 136, config = 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq } = 1, sample_type = IP|TID|TIME|CALLCHAIN|CPU|PERIOD|IDENTIFIER, read_format = ID|LOST, disabled = 1, freq = 1, sample_id_all = 1
> >
> > The output is correct.
>
> You mean because it has PERF_SAMPLE_CALLCHAIN in the sample_type?
The motivation for this patch is to add backward compatibility, i.e.,
to ensure that perf script processes the previously collected off-cpu
perf.data correctly before the change to offcpu-time's sample_type.
>
> > $ perf script -i ./perf.data.ptn | grep offcpu-time
> > gmain 2173 [000] 18446744069.414584: 100102015 offcpu-time:
> > NetworkManager 901 [000] 18446744069.414584: 5603579 offcpu-time:
> > Web Content 1183550 [000] 18446744069.414584: 46278 offcpu-time:
> > gnome-control-c 2200559 [000] 18446744069.414584: 11998247014 offcpu-time:
>
> > And after this patch series:
>
> After the _whole_ series? Or at this point in the series?
I should mention that the two perf.data files I used in the commit
message, perf.data.ptn and perf.data.off-cpu-v9, were collected from
perf without this series and perf with the whole series, respectively
(since only then can I use the --off-cpu-thresh option).
Again, I can add an explanation to the commit message if you’d like.
Sorry for the confusion.
>
> I'm trying to test it patch by patch, to see from output from 'perf
> report -D', perf script, etc that the patch descriptions match what the
> code does and then double checking by trying to reproduce it, step by
> step.
Thank you very much, talking about meticulousness :)
>
> I'll try to do it at the end of the series, but thought about trowing in
> these questions to help me understand your work.
Sure! Throw me some more if you want, because it has been some time.
Thanks,
Howard
>
> Thanks!
>
> - Arnaldo
>
> > $ perf script --header -i ./perf.data.off-cpu-v9 | grep "event : name = offcpu-time"
> > # event : name = offcpu-time, , id = { 237959, 237960, 237961, 237962 }, type = 1 (software), size = 136, config = 0xa (PERF_COUNT_SW_BPF_OUTPUT), { sample_period, sample_freq } = 1, sample_type = IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER, read_format = ID|LOST, disabled = 1, freq = 1, sample_id_all = 1
> >
> > perf $ ./perf script -i ./perf.data.off-cpu-v9 | grep offcpu-time
> > gnome-shell 1875 [001] 4789616.361225: 100097057 offcpu-time:
> > gnome-shell 1875 [001] 4789616.461419: 100107463 offcpu-time:
> > firefox 2206821 [002] 4789616.475690: 255257245 offcpu-time:
> >
> > Suggested-by: Namhyung Kim <namhyung@...nel.org>
> > Reviewed-by: Ian Rogers <irogers@...gle.com>
> > Signed-off-by: Howard Chu <howardchu95@...il.com>
> > Cc: Adrian Hunter <adrian.hunter@...el.com>
> > Cc: Alexander Shishkin <alexander.shishkin@...ux.intel.com>
> > Cc: Ingo Molnar <mingo@...hat.com>
> > Cc: James Clark <james.clark@...aro.org>
> > Cc: Jiri Olsa <jolsa@...nel.org>
> > Cc: Kan Liang <kan.liang@...ux.intel.com>
> > Cc: Mark Rutland <mark.rutland@....com>
> > Cc: Peter Zijlstra <peterz@...radead.org>
> > Link: https://lore.kernel.org/r/20241108204137.2444151-7-howardchu95@gmail.com
> > Signed-off-by: Arnaldo Carvalho de Melo <acme@...hat.com>
> > ---
> > tools/perf/util/evsel.c | 35 ++++++++++++++++++++++++++++++++++-
> > 1 file changed, 34 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> > index fc33104cd659..277b5babd63e 100644
> > --- a/tools/perf/util/evsel.c
> > +++ b/tools/perf/util/evsel.c
> > @@ -1244,7 +1244,8 @@ static void evsel__set_default_freq_period(struct record_opts *opts,
> >
> > bool evsel__is_offcpu_event(struct evsel *evsel)
> > {
> > - return evsel__is_bpf_output(evsel) && evsel__name_is(evsel, OFFCPU_EVENT);
> > + return evsel__is_bpf_output(evsel) && evsel__name_is(evsel, OFFCPU_EVENT) &&
> > + evsel->core.attr.sample_type & PERF_SAMPLE_RAW;
> > }
> >
> > /*
> > @@ -2846,6 +2847,35 @@ static inline bool evsel__has_branch_counters(const struct evsel *evsel)
> > return false;
> > }
> >
> > +static int __set_offcpu_sample(struct perf_sample *data)
> > +{
> > + u64 *array = data->raw_data;
> > + u32 max_size = data->raw_size, *p32;
> > + const void *endp = (void *)array + max_size;
> > +
> > + if (array == NULL)
> > + return -EFAULT;
> > +
> > + OVERFLOW_CHECK_u64(array);
> > + p32 = (void *)array++;
> > + data->pid = p32[0];
> > + data->tid = p32[1];
> > +
> > + OVERFLOW_CHECK_u64(array);
> > + data->period = *array++;
> > +
> > + OVERFLOW_CHECK_u64(array);
> > + data->callchain = (struct ip_callchain *)array++;
> > + OVERFLOW_CHECK(array, data->callchain->nr * sizeof(u64), max_size);
> > + data->ip = data->callchain->ips[1];
> > + array += data->callchain->nr;
> > +
> > + OVERFLOW_CHECK_u64(array);
> > + data->cgroup = *array;
> > +
> > + return 0;
> > +}
> > +
> > int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
> > struct perf_sample *data)
> > {
> > @@ -3197,6 +3227,9 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
> > array = (void *)array + sz;
> > }
> >
> > + if (evsel__is_offcpu_event(evsel))
> > + return __set_offcpu_sample(data);
> > +
> > return 0;
> > }
> >
> > --
> > 2.43.0
Powered by blists - more mailing lists