Message-ID: <CAP-5=fUF_GQKiBoKVx-z5SUZsmHdqhxzdD39iH9O=53xg2EuHQ@mail.gmail.com>
Date: Mon, 17 Nov 2025 20:32:20 -0800
From: Ian Rogers <irogers@...gle.com>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>, Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>, Adrian Hunter <adrian.hunter@...el.com>,
"Dr. David Alan Gilbert" <linux@...blig.org>, Yang Li <yang.lee@...ux.alibaba.com>,
James Clark <james.clark@...aro.org>, Thomas Falcon <thomas.falcon@...el.com>,
Thomas Richter <tmricht@...ux.ibm.com>, linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>,
Dapeng Mi <dapeng1.mi@...ux.intel.com>
Subject: Re: [PATCH v4 10/10] perf stat: Add no-affinity flag

On Mon, Nov 17, 2025 at 6:41 PM Namhyung Kim <namhyung@...nel.org> wrote:
>
> On Thu, Nov 13, 2025 at 10:05:16AM -0800, Ian Rogers wrote:
> > Add flag that disables affinity behavior. Using sched_setaffinity to
> > place a perf thread on a CPU can avoid certain interprocessor
> > interrupts but may introduce a delay due to the scheduling,
> > particularly on loaded machines. Add a command line option to disable
> > the behavior. This behavior is less present in other tools like `perf
> > record`, as it uses a ring buffer and doesn't make repeated system
> > calls.
> >
> > Signed-off-by: Ian Rogers <irogers@...gle.com>
> > ---
> > tools/perf/Documentation/perf-stat.txt | 4 ++++
> > tools/perf/builtin-stat.c | 6 ++++++
> > tools/perf/util/evlist.c | 2 +-
> > tools/perf/util/evlist.h | 1 +
> > 4 files changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
> > index 1a766d4a2233..1ffb510606af 100644
> > --- a/tools/perf/Documentation/perf-stat.txt
> > +++ b/tools/perf/Documentation/perf-stat.txt
> > @@ -382,6 +382,10 @@ color the metric's computed value.
> > Don't print output, warnings or messages. This is useful with perf stat
> > record below to only write data to the perf.data file.
> >
> > +--no-affinity::
> > +Don't change scheduler affinities when iterating over CPUs. Disables
> > +an optimization aimed at minimizing interprocessor interrupts.
> > +
> > STAT RECORD
> > -----------
> > Stores stat data into perf data file.
> > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > index aec93b91fd11..fa42b08bd1df 100644
> > --- a/tools/perf/builtin-stat.c
> > +++ b/tools/perf/builtin-stat.c
> > @@ -2415,6 +2415,7 @@ static int parse_tpebs_mode(const struct option *opt, const char *str,
> > int cmd_stat(int argc, const char **argv)
> > {
> > struct opt_aggr_mode opt_mode = {};
> > + bool no_affinity = false;
> > struct option stat_options[] = {
> > OPT_BOOLEAN('T', "transaction", &transaction_run,
> > "hardware transaction statistics"),
> > @@ -2543,6 +2544,8 @@ int cmd_stat(int argc, const char **argv)
> > "don't print 'summary' for CSV summary output"),
> > OPT_BOOLEAN(0, "quiet", &quiet,
> > "don't print any output, messages or warnings (useful with record)"),
> > + OPT_BOOLEAN(0, "no-affinity", &no_affinity,
> > + "don't allow affinity optimizations aimed at reducing IPIs"),
>
> I know you want to add an option to disable the behavior, but I think
> it'd be better to have a positive option like just '--affinity'. Then we
> will have '--no-affinity' for free. :) The current form will allow
> '--no-no-affinity'.
>
> Then the variable also can be 'enable_affinity' or so.
>
> You can mention --no-affinity in the help message and the man page
> document so that users can discover the intention.
I was trying to keep the code consistent with the flag name. I don't want
an affinity flag in, say, evlist that has to be set to true, as that would
need updating in all users of the evlist or else become a behavioral
change. We can keep no_affinity in the evlist and invert the command-line
flag, but it just looks awkward imo.
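
For illustration only, the alternative mentioned above (a positive
command-line option whose value is inverted into a no_affinity field)
could be sketched roughly as below. This is a standalone model, not perf
code: struct evlist_sketch, parse_affinity_arg() and
evlist_sketch__apply() are hypothetical names, and perf's own option
parser is deliberately not used. It assumes the option defaults to
"affinity on" and shows that a zero-initialized evlist keeps the
optimization enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for struct evlist: only the field under discussion. */
struct evlist_sketch {
	bool no_affinity;	/* false when zero-initialized: affinity stays on */
};

/*
 * Hypothetical stand-in for a positive boolean option. It models only the
 * two spellings "--affinity" and "--no-affinity"; anything else is rejected.
 * Returns true if arg was recognised and *affinity was updated.
 */
static bool parse_affinity_arg(const char *arg, bool *affinity)
{
	assert(arg && affinity);

	if (!strcmp(arg, "--affinity")) {
		*affinity = true;
		return true;
	}
	if (!strcmp(arg, "--no-affinity")) {
		*affinity = false;
		return true;
	}
	return false;
}

/* Invert the positive command-line value into the negative evlist field. */
static void evlist_sketch__apply(struct evlist_sketch *evlist, bool affinity)
{
	evlist->no_affinity = !affinity;
}
```

With this shape only the command that exposes the option performs the
inversion; code that never touches the field still sees
no_affinity == false.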
Thanks,
Ian
> Thanks,
> Namhyung
>
>
> > OPT_CALLBACK(0, "cputype", &evsel_list, "hybrid cpu type",
> > "Only enable events on applying cpu with this type "
> > "for hybrid platform (e.g. core or atom)",
> > @@ -2600,6 +2603,9 @@ int cmd_stat(int argc, const char **argv)
> > } else
> > stat_config.csv_sep = DEFAULT_SEPARATOR;
> >
> > + if (no_affinity)
> > + evsel_list->no_affinity = true;
> > +
> > if (argc && strlen(argv[0]) > 2 && strstarts("record", argv[0])) {
> > argc = __cmd_record(stat_options, &opt_mode, argc, argv);
> > if (argc < 0)
> > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> > index fc3dae7cdfca..53c8e974de8b 100644
> > --- a/tools/perf/util/evlist.c
> > +++ b/tools/perf/util/evlist.c
> > @@ -368,7 +368,7 @@ static bool evlist__use_affinity(struct evlist *evlist)
> > struct perf_cpu_map *used_cpus = NULL;
> > bool ret = false;
> >
> > - if (!evlist->core.user_requested_cpus ||
> > + if (evlist->no_affinity || !evlist->core.user_requested_cpus ||
> > cpu_map__is_dummy(evlist->core.user_requested_cpus))
> > return false;
> >
> > diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> > index b4604c3f03d6..c7ba0e0b2219 100644
> > --- a/tools/perf/util/evlist.h
> > +++ b/tools/perf/util/evlist.h
> > @@ -59,6 +59,7 @@ struct event_enable_timer;
> > struct evlist {
> > struct perf_evlist core;
> > bool enabled;
> > + bool no_affinity;
> > int id_pos;
> > int is_pos;
> > int nr_br_cntr;
> > --
> > 2.51.2.1041.gc1ab5b90ca-goog
> >