[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Z5sccIPaYBeB_yab@google.com>
Date: Wed, 29 Jan 2025 22:30:08 -0800
From: Namhyung Kim <namhyung@...nel.org>
To: Dmitry Vyukov <dvyukov@...gle.com>
Cc: irogers@...gle.com, linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org,
Arnaldo Carvalho de Melo <acme@...nel.org>
Subject: Re: [PATCH v3 6/7] perf report: Add --latency flag
On Wed, Jan 29, 2025 at 08:12:51AM +0100, Dmitry Vyukov wrote:
> On Wed, 29 Jan 2025 at 06:03, Namhyung Kim <namhyung@...nel.org> wrote:
> >
> > On Mon, Jan 27, 2025 at 10:58:53AM +0100, Dmitry Vyukov wrote:
> > > Add record/report --latency flag that allows to capture and show
> > > latency-centric profiles rather than the default CPU-consumption-centric
> > > profiles. For latency profiles record captures context switch events,
> > > and report shows Latency as the first column.
> >
> > It'd be nice if you could add a small example output in the commit
> > message.
> >
> > >
> > > Signed-off-by: Dmitry Vyukov <dvyukov@...gle.com>
> > > Cc: Namhyung Kim <namhyung@...nel.org>
> > > Cc: Arnaldo Carvalho de Melo <acme@...nel.org>
> > > Cc: Ian Rogers <irogers@...gle.com>
> > > Cc: linux-perf-users@...r.kernel.org
> > > Cc: linux-kernel@...r.kernel.org
> > > ---
> > [SNIP]
> > > diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
> > > index bc4c3acfe7552..2b6023de7a53a 100644
> > > --- a/tools/perf/util/sort.c
> > > +++ b/tools/perf/util/sort.c
> > > @@ -2622,6 +2622,7 @@ struct hpp_dimension {
> > > const char *name;
> > > struct perf_hpp_fmt *fmt;
> > > int taken;
> > > + int was_taken;
> > > };
> > >
> > > #define DIM(d, n) { .name = n, .fmt = &perf_hpp__format[d], }
> > > @@ -3513,6 +3514,7 @@ static int __hpp_dimension__add(struct hpp_dimension *hd,
> > > return -1;
> > >
> > > hd->taken = 1;
> > > + hd->was_taken = 1;
> > > perf_hpp_list__register_sort_field(list, fmt);
> > > return 0;
> > > }
> > > @@ -3547,10 +3549,15 @@ static int __hpp_dimension__add_output(struct perf_hpp_list *list,
> > > return 0;
> > > }
> > >
> > > -int hpp_dimension__add_output(unsigned col)
> > > +int hpp_dimension__add_output(unsigned col, bool implicit)
> > > {
> > > + struct hpp_dimension *hd;
> > > +
> > > BUG_ON(col >= PERF_HPP__MAX_INDEX);
> > > - return __hpp_dimension__add_output(&perf_hpp_list, &hpp_sort_dimensions[col]);
> > > + hd = &hpp_sort_dimensions[col];
> > > + if (implicit && !hd->was_taken)
> > > + return 0;
> >
> > I don't understand why you need 'was_taken' field. Can it use the
> > 'taken' field?
>
> I've started getting failed asserts in
> perf_hpp__cancel_cumulate/latency when removing columns still linked
> into sort order list.
Ok.
>
> I've figured out that columns we implicitly add in setup_overhead and
> perf_hpp__init must match exactly. If we add one in setup_overhead,
> but not in perf_hpp__init, then it starts failing.
Right, I don't remember why we don't have unregister_sort_field or let
the perf_hpp__column_unregister() remove it from the sort_list too.
>
> Taken does not work b/c there is reset_dimensions call between these
> functions, so they actually add the same columns twice to field/sort
> lists (and that prevents failures in
> perf_hpp__cancel_cumulate/latency).
I see.
>
> Initially I've just tried to match logic in setup_overhead and in
> perf_hpp__init. But it turned out quite messy, duplicate logic, and
> e.g. in setup_overhead we look at sort_order, but in perf_hpp__init we
> already can't look at it b/c we already altered it in setup_overhead.
> That logic would also be quite fragile. Adding was_taken looks like
> the simplest, most reliable, and less fragile with respect to future
> changes approach.
Maybe just remove it from the sort_list. There should be no reason to
keep it in the list.
>
>
> > > + return __hpp_dimension__add_output(&perf_hpp_list, hd);
> > > }
> > >
> > > int sort_dimension__add(struct perf_hpp_list *list, const char *tok,
> > > @@ -3809,10 +3816,24 @@ static char *setup_overhead(char *keys)
> > > if (sort__mode == SORT_MODE__DIFF)
> > > return keys;
> > >
> > > - keys = prefix_if_not_in("overhead", keys);
> > > -
> > > - if (symbol_conf.cumulate_callchain)
> > > - keys = prefix_if_not_in("overhead_children", keys);
> > > + if (symbol_conf.prefer_latency) {
> > > + keys = prefix_if_not_in("overhead", keys);
> > > + keys = prefix_if_not_in("latency", keys);
> > > + if (symbol_conf.cumulate_callchain) {
> > > + keys = prefix_if_not_in("overhead_children", keys);
> > > + keys = prefix_if_not_in("latency_children", keys);
> > > + }
> > > + } else if (!keys || (!strstr(keys, "overhead") &&
> > > + !strstr(keys, "latency"))) {
> > > + if (symbol_conf.enable_latency)
> > > + keys = prefix_if_not_in("latency", keys);
> > > + keys = prefix_if_not_in("overhead", keys);
> > > + if (symbol_conf.cumulate_callchain) {
> > > + if (symbol_conf.enable_latency)
> > > + keys = prefix_if_not_in("latency_children", keys);
> > > + keys = prefix_if_not_in("overhead_children", keys);
> > > + }
> > > + }
> >
> > Do you really need this complexity? I think it's better to fix the order
> > of columns in both case.
>
> This is sort order which affects ordering of row, and order of rows is
> basically the most important thing for a profiler. If we fix the
> order, it will be showing completely different things.
>
> "latency" and "overhead" are equivalent with respect to their
> fundamentality for a profile. So we shouldn't be adding any of them,
> if any of them are already explicitly specified by the user.
>
> For example, the command from tips.txt:
>
> perf report --hierarchy --sort latency,parallelism,comm,symbol
>
> if we prepend "overhead", it all falls apart.
>
> Or for 2 default modes (normals and latency) we want "overhead" or
> "latency" to come first. Prepending "latency" for the current CPU mode
> would lead to completely different ordering.
I see. You want to sort by overhead in the default mode even if it has
implicitly-added latency field, and to sort by latency if --latency is
given explicitly.
I think it can be done with the following command line.
$ perf report -F overhead,latency,parallelism,comm,sym -s overhead
or
$ perf report -F overhead,latency,parallelism,comm,sym -s latency
IOW, you can keep the same output column ordering and sort the result
differently. But I'm not sure if it'd make the code simpler. :)
Thanks,
Namhyung
Powered by blists - more mailing lists