Message-ID: <4f0bcc19-807c-40b0-a30c-309ba775693b@linaro.org>
Date: Thu, 19 Dec 2024 10:10:59 +0000
From: James Clark <james.clark@...aro.org>
To: Ian Rogers <irogers@...gle.com>
Cc: linux-arm-kernel@...ts.infradead.org, linux-perf-users@...r.kernel.org,
Will Deacon <will@...nel.org>, Mark Rutland <mark.rutland@....com>,
Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>, Adrian Hunter <adrian.hunter@...el.com>,
"Liang, Kan" <kan.liang@...ux.intel.com>,
John Garry <john.g.garry@...cle.com>, Mike Leach <mike.leach@...aro.org>,
Leo Yan <leo.yan@...ux.dev>, Graham Woodward <graham.woodward@....com>,
linux-kernel@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [PATCH 5/5] perf docs: arm_spe: Document new discard mode

On 18/12/2024 7:47 pm, Ian Rogers wrote:
> On Wed, Dec 18, 2024 at 2:07 AM James Clark <james.clark@...aro.org> wrote:
>>
>> On 18/12/2024 12:54 am, Ian Rogers wrote:
>>> On Tue, Dec 17, 2024 at 3:56 AM James Clark <james.clark@...aro.org> wrote:
>>>>
>>>> Document the flag, hint what it's used for and give an example with
>>>> other useful options to get minimal output.
>>>>
>>>> Signed-off-by: James Clark <james.clark@...aro.org>
>>>> ---
>>>> tools/perf/Documentation/perf-arm-spe.txt | 11 +++++++++++
>>>> 1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/tools/perf/Documentation/perf-arm-spe.txt b/tools/perf/Documentation/perf-arm-spe.txt
>>>> index de2b0b479249..588eead438bc 100644
>>>> --- a/tools/perf/Documentation/perf-arm-spe.txt
>>>> +++ b/tools/perf/Documentation/perf-arm-spe.txt
>>>> @@ -150,6 +150,7 @@ arm_spe/load_filter=1,min_latency=10/'
>>>> pct_enable=1 - collect physical timestamp instead of virtual timestamp (PMSCR.PCT) - requires privilege
>>>> store_filter=1 - collect stores only (PMSFCR.ST)
>>>> ts_enable=1 - enable timestamping with value of generic timer (PMSCR.TS)
>>>> + discard=1 - enable SPE PMU events but don't collect sample data - see 'Discard mode' (PMBLIMITR.FM = DISCARD)
>>>>
>>>> +++*+++ Latency is the total latency from the point at which sampling started on that instruction, rather
>>>> than only the execution latency.
>>>> @@ -220,6 +221,16 @@ Common errors
>>>>
>>>> Increase sampling interval (see above)
>>>>
>>>> +Discard mode
>>>> +~~~~~~~~~~~~
>>>> +
>>>> +SPE PMU events can be used without the overhead of collecting sample data if
>>>> +discard mode is supported (optional from Armv8.6). First run a system wide SPE
>>>> +session (or on the core of interest) using options to minimize output. Then run
>>>> +perf stat:
>>>> +
>>>> + perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
>>>> + perf stat -e SAMPLE_FEED_LD
>>>
>>> Perhaps clarify this should be an ARM SPE event? It seems strange to
>>> have one perf command affect a later one, the purpose of things like
>>> event multiplexing is to hide the hardware limits. I'd prefer if the
>>> last bit was like:
>>> ```
>>> Then run perf stat with an SPE event on the same PMU:
>>>
>>> perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
>>> perf stat -e arm_spe/SAMPLE_FEED_LD/
>>> ```
>>>
>>> Thanks,
>>> Ian
>>
>> Hi Ian,
>>
>> Confusingly this isn't an SPE event, it is a normal PMU event. The fact
>> that one Perf command affects the other is because these events only
>> count when SPE is enabled. When it's enabled it has an effect on a
>> per-core level which is why in the example I made it simpler by enabling
>> SPE system wide.
>>
>> SPE is an exclusive PMU like Coresight and some others so it can't be
>> affected by multiplexing or anything like that. The SAMPLE_FEED_LD PMU
>> would be, but as long as SPE stays enabled it will count the right thing
>> regardless of multiplexing.
>
> Thanks James, sorry for my SPE ignorance. I'm smiling about the use of
> the word exclusive. When I was trying to make the tests run in
> parallel I used a file lock - so shared and exclusive. There were a
> lot of issues with that, hence switching to 2 phases in the test,
> parallel then sequential but I kept the "exclusive" tag for want of a
> better word. Perhaps the notion of an exclusive PMU existed previously

Yeah, see PERF_PMU_CAP_EXCLUSIVE. Hopefully it doesn't cause too much
confusion; the context of test vs PMU should make it clear.

> but maybe I've accidentally invented the term by way of a failed file
> lock experiment :-)
>
> Presumably the two PMUs side-effecting each other is a known thing. I
> wonder if we can capture this in the documentation. When you say
> "normal PMU event" you mean core PMU events?
>
> Thanks,
> Ian

It should be a known thing, yes. Discard mode doesn't change this
behavior anyway; it just makes one use case of it better. I can add
another section to this SPE manpage about it in a v2, as that's probably
the best place for it.

And yes, I meant core PMU event. I can clarify that the second example
command is for a core PMU to avoid any doubt.
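
As an illustration of the workflow discussed above, here is a minimal
shell sketch that keeps a system-wide SPE session running in discard mode
while perf stat counts a core PMU event. The record options are the ones
from the patch. The "-a -- sleep 1" measurement window, the DRY_RUN switch
and the function name are assumptions made for this sketch, and
SAMPLE_FEED_LD is only usable on CPUs that expose it.

```shell
#!/bin/sh
# Sketch only: count a core PMU event while SPE runs in discard mode.
# DRY_RUN defaults to 1, which prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"
# Core PMU event named in the thread; availability depends on the CPU.
EVENT="${EVENT:-SAMPLE_FEED_LD}"

run_discard_stat() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &"
        echo "perf stat -e $EVENT -a -- sleep 1"
        return 0
    fi

    # Step 1: keep SPE enabled system wide so the core PMU event can count.
    perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
    spe_pid=$!

    # Step 2: count the core PMU event; the sleep is a placeholder window.
    perf stat -e "$EVENT" -a -- sleep 1
    status=$?

    # Stop the background SPE session once counting is done.
    kill "$spe_pid" 2>/dev/null
    wait "$spe_pid" 2>/dev/null
    return "$status"
}

run_discard_stat
```

Running it with DRY_RUN=0 needs a machine with SPE discard support and
enough privilege for system-wide sessions. The sketch does not wait for
SPE to become active before counting starts, so a very short window may
undercount.
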
Thanks
James