Message-ID: <CAP-5=fUvsF7RtLAKaMwc28CeSEOJ+j0gVwvQN59moOnUS=kWVg@mail.gmail.com>
Date: Wed, 19 Nov 2025 08:25:58 -0800
From: Ian Rogers <irogers@...gle.com>
To: Andi Kleen <ak@...ux.intel.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
Adrian Hunter <adrian.hunter@...el.com>, "Dr. David Alan Gilbert" <linux@...blig.org>,
Yang Li <yang.lee@...ux.alibaba.com>, James Clark <james.clark@...aro.org>,
Thomas Falcon <thomas.falcon@...el.com>, Thomas Richter <tmricht@...ux.ibm.com>,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
Dapeng Mi <dapeng1.mi@...ux.intel.com>
Subject: Re: [PATCH v5 3/3] perf stat: Add no-affinity flag
On Wed, Nov 19, 2025 at 7:37 AM Andi Kleen <ak@...ux.intel.com> wrote:
>
> > Ack. This is only adding the flag to perf stat, are the storms as much
> > of an issue there? Patch 2 of 3 changes it so that for a single event
> > we still use affinities, where a dummy and an event count as >1 event.
>
> Not sure I follow here. I thought you disabled it completely?
No, when we have `perf stat` we may have a single event
open/enable/disable/read/close that needs running on a particular CPU.
If we have a group of events, for a metric, the group may turn into a
single syscall that reads the group. What I've done is make it so that
in the single-syscall case, rather than walking perf's affinity
through all the CPUs, we just take the hit of the one IPI. If
2 or more syscalls are needed (for open/enable/...) then we use the
affinity mechanism. The key function is evlist__use_affinity, which
performs this >1 IPI calculation, where >1 IPI means use
affinities. I suspect the true threshold for when we should take the
IPIs isn't exactly >1, but I'm hoping this saving is obviously
correct and we can tune the number later. Maybe we can do some io_uring
thing in the longer term to batch up all these changes and let the
kernel worry about optimizing the changes.
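To make the heuristic concrete, here is a minimal, hypothetical sketch of
the >1-syscall decision. The struct and function names below are
illustrative only, not the actual perf-tools code (the real logic lives in
evlist__use_affinity and the evsel/evlist types):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: estimate the syscalls needed per CPU for an
 * operation, and use the affinity mechanism only when more than one
 * IPI per CPU would otherwise be paid. A group of events collapses
 * into a single read on the leader; tool events (and similar, like
 * retirement latency) need no per-CPU syscall and are ignored. */
struct evsel_model {
	bool is_group_member;	/* folded into the leader's single read */
	bool is_tool_event;	/* no perf_event_open syscall at all */
};

static int syscalls_per_cpu(const struct evsel_model *evsels, int n)
{
	int count = 0;
	bool group_counted = false;

	for (int i = 0; i < n; i++) {
		if (evsels[i].is_tool_event)
			continue;
		if (evsels[i].is_group_member) {
			if (!group_counted) {
				count++;	/* one read covers the group */
				group_counted = true;
			}
			continue;
		}
		count++;	/* independent event: its own syscall */
	}
	return count;
}

/* Use affinities only when >1 IPI per CPU would otherwise be needed;
 * for a single syscall, just eat the one IPI. */
static bool use_affinity(const struct evsel_model *evsels, int n)
{
	return syscalls_per_cpu(evsels, n) > 1;
}
```

With this model, a lone event or a fully grouped metric stays on the
one-IPI path, while two or more independent events tip over into the
affinity mechanism, matching the >1 threshold described above.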
The previous affinity code wasn't used for events in per-thread mode,
but when trying to use that more widely I found bugs in its iteration.
So I did a bigger re-engineering that is in patch 2 now. The code
tries to spot the grouping case, and to ignore certain kinds of events
like retirement latency and tool events that don't benefit from the
affinity mechanism, regardless of what their CPU mask says.
> > We have specific examples of loaded machines where the scheduling
> > latency causes broken metrics - the flag at least allows investigation
> > of issues like this. I don't mind reviewing a patch adding real time
> > priorities as an option.
>
> You don't need a new flag. Just run perf with real time priority with
> any standard wrapper tool, like chrt. The main obstacle is that you may
> need the capability to do that though.
Ack. Thanks!
Ian