Message-ID: <CAP-5=fXJd+gXWrgRC1vBzueCgsLjGesV+oenq3a9irq0+gLNDw@mail.gmail.com>
Date: Sat, 7 Feb 2026 14:06:51 -0800
From: Ian Rogers <irogers@...gle.com>
To: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: Thomas Richter <tmricht@...ux.ibm.com>, linux-kernel@...r.kernel.org,
linux-s390@...r.kernel.org, linux-perf-users@...r.kernel.org,
namhyung@...nel.org, agordeev@...ux.ibm.com, gor@...ux.ibm.com,
sumanthk@...ux.ibm.com, hca@...ux.ibm.com, japo@...ux.ibm.com,
James Clark <james.clark@...aro.org>
Subject: Re: [PATCH] perf/test: Fix test case Leader sampling on s390.
On Fri, Feb 6, 2026 at 1:41 PM Arnaldo Carvalho de Melo <acme@...nel.org> wrote:
>
> On Fri, Nov 28, 2025 at 10:11:39AM +0100, Thomas Richter wrote:
> > The subtest 'Leader sampling' sometimes fails on s390.
> > - for z/VM guest: Disable the test for z/VM guest. There is no
> > CPU Measurement facility to run the test successfully.
> > - for LPAR: Use correct event names.
>
> This one fell thru the cracks, still applies cleanly and the extra logic
> affects only s390, applying to perf-tools-next,
>
> - Arnaldo
>
> > A detailed analysis follows here:
> > Now to the debugging and investigation:
> > 1. With command
> > perf record -e '{cycles,cycles}:S' -- ....
> > the first cycles event starts sampling.
> > On s390 this sets up sampling with a frequency of 4000 Hz.
> > This translates to hardware sample rate of 1377000 instructions per
> > micro-second to meet a frequency of 4000 Hz.
> >
> > 2. With first event cycles now sampling into a hardware buffer, an
> > interrupt is triggered each time a sampling buffer gets full.
> > The interrupt handler is then invoked and debug output shows the
> > processing of samples. The size of one hardware sample is 32 bytes.
> > With an interrupt triggered when the hardware buffer page of 4KB
> > gets full, the interrupt handler processes 128 samples.
> > (This is taken from s390 specific fast debug data gathering)
> > 2025-11-07 14:35:51.977248 000003ffe013cbfa \
> > perf_event_count_update event->count 0x0 count 0x1502e8
> > 2025-11-07 14:35:51.977248 000003ffe013cbfa \
> > perf_event_count_update event->count 0x1502e8 count 0x1502e8
> > 2025-11-07 14:35:51.977248 000003ffe013cbfa \
> > perf_event_count_update event->count 0x2a05d0 count 0x1502e8
> > 2025-11-07 14:35:51.977252 000003ffe013cbfa \
> > perf_event_count_update event->count 0x3f08b8 count 0x1502e8
> > 2025-11-07 14:35:51.977252 000003ffe013cbfa \
> > perf_event_count_update event->count 0x540ba0 count 0x1502e8
> > 2025-11-07 14:35:51.977253 000003ffe013cbfa \
> > perf_event_count_update event->count 0x690e88 count 0x1502e8
> > 2025-11-07 14:35:51.977254 000003ffe013cbfa \
> > perf_event_count_update event->count 0x7e1170 count 0x1502e8
> > 2025-11-07 14:35:51.977254 000003ffe013cbfa \
> > perf_event_count_update event->count 0x931458 count 0x1502e8
> > 2025-11-07 14:35:51.977254 000003ffe013cbfa \
> > perf_event_count_update event->count 0xa81740 count 0x1502e8
> >
> > 3. The value is constantly increasing by the number of instructions
> > executed to generate a sample entry. This is the first line of the
> > pairs of lines. count 0x1502e8 --> 1377000
> >
> > # perf script | grep 1377000 | wc -l
> > 214
> > # perf script | wc -l
> > 428
> > #
> > That is 428 lines in total, and half of the lines contain value
> > 1377000.
> >
> > 4. The second event cycles is opened against the counting PMU, which
> > is an independent PMU and is not interrupt driven. Once enabled it
> > runs in the background and keeps running, incrementing silently
> > about 400+ counters. The counter values are read via assembly
> > instructions.
> >
> > This second counter PMU's read call back function is called when the
> > interrupt handler of the sampling facility processes each sample. The
> > function call sequence is:
> >
> > perf_event_overflow()
> > +--> __perf_event_overflow()
> > +--> __perf_event_output()
> > +--> perf_output_sample()
> > +--> perf_output_read()
> > +--> perf_output_read_group()
> > for_each_sibling_event(sub, leader) {
> > values[n++] = perf_event_count(sub, self);
> > printk("%s sub %p values %#lx\n", __func__, sub, values[n-1]);
> > }
> >
> > The last function perf_event_count() is invoked on the second event
> > cycles *on* the counting PMU. An added printk statement shows the
> > following lines in the dmesg output:
> >
> > # dmesg|grep perf_output_read_group |head -10
> > [ 332.368620] perf_output_read_group sub 00000000d80b7c1f values 0x3a80917 (1)
> > [ 332.368624] perf_output_read_group sub 00000000d80b7c1f values 0x3a86c7f (2)
> > [ 332.368627] perf_output_read_group sub 00000000d80b7c1f values 0x3a89c15 (3)
> > [ 332.368629] perf_output_read_group sub 00000000d80b7c1f values 0x3a8c895 (4)
> > [ 332.368631] perf_output_read_group sub 00000000d80b7c1f values 0x3a8f569 (5)
> > [ 332.368633] perf_output_read_group sub 00000000d80b7c1f values 0x3a9204b
> > [ 332.368635] perf_output_read_group sub 00000000d80b7c1f values 0x3a94790
> > [ 332.368637] perf_output_read_group sub 00000000d80b7c1f values 0x3a9704b
> > [ 332.368638] perf_output_read_group sub 00000000d80b7c1f values 0x3a99888
> > #
> >
> > This correlates with the output of
> > # perf report -D | grep 'id 00000000000000'|head -10
> > ..... id 0000000000000006, value 00000000001502e8, lost 0
> > ..... id 000000000000000e, value 0000000003a80917, lost 0 --> line (1) above
> > ..... id 0000000000000006, value 00000000002a05d0, lost 0
> > ..... id 000000000000000e, value 0000000003a86c7f, lost 0 --> line (2) above
> > ..... id 0000000000000006, value 00000000003f08b8, lost 0
> > ..... id 000000000000000e, value 0000000003a89c15, lost 0 --> line (3) above
> > ..... id 0000000000000006, value 0000000000540ba0, lost 0
> > ..... id 000000000000000e, value 0000000003a8c895, lost 0 --> line (4) above
> > ..... id 0000000000000006, value 0000000000690e88, lost 0
> > ..... id 000000000000000e, value 0000000003a8f569, lost 0 --> line (5) above
> >
> > Summary:
> > - Above command starts the CPU sampling facility, which runs interrupt
> > driven when a 4KB page is full. An interrupt processes the 128 samples
> > and eventually calls perf_output_read_group() for each sample to save it
> > in the event's ring buffer.
> >
> > - At that time the CPU counting facility is invoked to read the value of
> > the event cycles. This value is saved as the second value in the
> > sample_read structure.
> >
> > - The first and odd lines in the perf script output display the period
> > value between 2 samples being created by hardware. It is the number
> > of instructions executed before the hardware writes a sample.
> >
> > - The second and even lines in the perf script output display the number
> > of CPU cycles needed to process each sample and save it in the event's
> > ring buffer.
> > These 2 different values can never be identical on s390.
> >
> > Since event leader sampling is not possible on s390 the perf tool will
> > return EOPNOTSUPP soon. Prepare the test case for that.
> >
> > Suggested-by: James Clark <james.clark@...aro.org>
> > Signed-off-by: Thomas Richter <tmricht@...ux.ibm.com>
> > Tested-by: Jan Polensky <japo@...ux.ibm.com>
> > Reviewed-by: Jan Polensky <japo@...ux.ibm.com>
> > ---
> > tools/perf/tests/shell/record.sh | 16 +++++++++++++++-
> > 1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/tests/shell/record.sh b/tools/perf/tests/shell/record.sh
> > index 0f5841c479e7..46b96d565680 100755
> > --- a/tools/perf/tests/shell/record.sh
> > +++ b/tools/perf/tests/shell/record.sh
> > @@ -260,7 +260,21 @@ test_uid() {
> >
> > test_leader_sampling() {
> > echo "Basic leader sampling test"
> > - if ! perf record -o "${perfdata}" -e "{cycles,cycles}:Su" -- \
> > + events="{cycles,cycles}:Su"
> > + [ $(uname -m) = "s390x" ] && {
This broke shellcheck for me:
```
In tests/shell/record.sh line 264:
[ $(uname -m) = "s390x" ] && {
^---------^ SC2046 (warning): Quote this to prevent word splitting.
For more information:
https://www.shellcheck.net/wiki/SC2046 -- Quote this to prevent word splitt...
```
I'll mail the fix.
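For reference, a minimal sketch of the quoting that SC2046 asks for. This only illustrates the warning and is an assumption about the shape of a fix, not the patch itself; `machine_is` is a hypothetical helper that record.sh does not define:
```sh
#!/bin/sh
# Hypothetical helper for illustration only; not part of record.sh.
# Quoting "$(uname -m)" keeps the command output as a single word,
# which is what SC2046 warns about when the quotes are missing.
machine_is() {
	[ "$(uname -m)" = "$1" ]
}

if machine_is "s390x"; then
	echo "s390x: use the s390-specific event setup"
else
	echo "not s390x: keep the default events"
fi
```
With the quotes in place the test sees exactly one operand on the left-hand side, regardless of what the command prints.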
Thanks,
Ian
> > + [ ! -d /sys/devices/cpum_sf ] && {
> > + echo "No CPUMF [Skipped record]"
> > + return
> > + }
> > + events="{cpum_sf/SF_CYCLES_BASIC/,cycles}:Su"
> > + perf record -o "${perfdata}" -e "$events" -- perf test -w brstack 2> /dev/null
> > + # Perf grouping might be unsupported, depends on version.
> > + [ "$?" -ne 0 ] && {
> > + echo "Grouping not support [Skipped record]"
> > + return
> > + }
> > + }
> > + if ! perf record -o "${perfdata}" -e "$events" -- \
> > perf test -w brstack 2> /dev/null
> > then
> > echo "Leader sampling [Failed record]"
> > --
> > 2.52.0
>