linux-kernel - Re: [PATCH] perf/test: Fix test case Leader sampling on s390.

Message-ID: <CAP-5=fXJd+gXWrgRC1vBzueCgsLjGesV+oenq3a9irq0+gLNDw@mail.gmail.com>
Date: Sat, 7 Feb 2026 14:06:51 -0800
From: Ian Rogers <irogers@...gle.com>
To: Arnaldo Carvalho de Melo <acme@...nel.org>
Cc: Thomas Richter <tmricht@...ux.ibm.com>, linux-kernel@...r.kernel.org, 
	linux-s390@...r.kernel.org, linux-perf-users@...r.kernel.org, 
	namhyung@...nel.org, agordeev@...ux.ibm.com, gor@...ux.ibm.com, 
	sumanthk@...ux.ibm.com, hca@...ux.ibm.com, japo@...ux.ibm.com, 
	James Clark <james.clark@...aro.org>
Subject: Re: [PATCH] perf/test: Fix test case Leader sampling on s390.

On Fri, Feb 6, 2026 at 1:41 PM Arnaldo Carvalho de Melo <acme@...nel.org> wrote:
>
> On Fri, Nov 28, 2025 at 10:11:39AM +0100, Thomas Richter wrote:
> > The subtest 'Leader sampling' sometimes fails on s390.
> > - for z/VM guest: Disable the test for z/VM guest. There is no
> >   CPU Measurement facility to run the test successfully.
> > - for LPAR: Use correct event names.
>
> This one fell thru the cracks, still applies cleanly and the extra logic
> affects only s390, applying to perf-tools-next,
>
> - Arnaldo
>
> > A detailed analysis follows here:
> > Now to the debugging and investigation:
> > 1. With command
> >        perf record -e '{cycles,cycles}:S' -- ....
> >    the first cycles event starts sampling.
> >    On s390 this sets up sampling with a frequency of 4000 Hz.
> >    This translates to hardware sample rate of 1377000 instructions per
> >    micro-second to meet a frequency of 4000 HZ.
> >
> > 2. With first event cycles now sampling into a hardware buffer, an
> >    interrupt is triggered each time a sampling buffer gets full.
> >    The interrupt handler is then invoked and debug output shows the
> >    processing of samples.  The size of one hardware sample is 32 bytes.
> >    With an interrupt triggered when the hardware buffer page of 4KB
> >    gets full, the interrupt handler processes 128 samples.
> >    (This is taken from s390 specific fast debug data gathering)
> >    2025-11-07 14:35:51.977248  000003ffe013cbfa \
> >          perf_event_count_update event->count 0x0 count 0x1502e8
> >    2025-11-07 14:35:51.977248  000003ffe013cbfa \
> >          perf_event_count_update event->count 0x1502e8 count 0x1502e8
> >    2025-11-07 14:35:51.977248  000003ffe013cbfa \
> >          perf_event_count_update event->count 0x2a05d0 count 0x1502e8
> >    2025-11-07 14:35:51.977252  000003ffe013cbfa \
> >          perf_event_count_update event->count 0x3f08b8 count 0x1502e8
> >    2025-11-07 14:35:51.977252  000003ffe013cbfa \
> >          perf_event_count_update event->count 0x540ba0 count 0x1502e8
> >    2025-11-07 14:35:51.977253  000003ffe013cbfa \
> >          perf_event_count_update event->count 0x690e88 count 0x1502e8
> >    2025-11-07 14:35:51.977254  000003ffe013cbfa \
> >          perf_event_count_update event->count 0x7e1170 count 0x1502e8
> >    2025-11-07 14:35:51.977254  000003ffe013cbfa \
> >          perf_event_count_update event->count 0x931458 count 0x1502e8
> >    2025-11-07 14:35:51.977254  000003ffe013cbfa \
> >          perf_event_count_update event->count 0xa81740 count 0x1502e8
> >
> > 3. The value is constantly increasing by the number of instructions
> >    executed to generate a sample entry.  This is the first line of the
> >    pairs of lines. count 0x1502e8 --> 1377000
> >
> >    # perf script | grep 1377000 | wc -l
> >    214
> >    # perf script | wc -l
> >    428
> >    #
> >    That is 428 lines in total, and half of the lines contain value
> >    1377000.
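
The figures in steps 2 and 3 can be cross-checked with plain shell arithmetic. This is only a sketch: the variable names are made up for illustration, and the input numbers are copied from the debug output quoted above.

```shell
# All values are taken from the quoted analysis; the names are illustrative.
page_size=4096        # 4KB hardware sampling buffer page
sample_size=32        # size of one hardware sample in bytes
period=$((0x1502e8))  # per-sample count from the debug output

echo "samples per page: $((page_size / sample_size))"
echo "period (decimal): ${period}"
# event->count grows by one period per sample, so two periods should
# match the third logged event->count value.
printf 'count after 2 samples: 0x%x\n' "$((2 * period))"
```

This gives 128 samples per page, a period of 1377000, and 0x2a05d0 after two samples, which agrees with the log lines in step 2.
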
> >
> > 4. The second event cycles is opened against the counting PMU, which
> >    is an independent PMU and is not interrupt driven.  Once enabled it
> >    runs in the background and keeps running, incrementing silently
> >    about 400+ counters. The counter values are read via assembly
> >    instructions.
> >
> >    This second counter PMU's read call back function is called when the
> >    interrupt handler of the sampling facility processes each sample. The
> >    function call sequence is:
> >
> >    perf_event_overflow()
> >    +--> __perf_event_overflow()
> >         +--> __perf_event_output()
> >              +--> perf_output_sample()
> >                   +--> perf_output_read()
> >                        +--> perf_output_read_group()
> >                               for_each_sibling_event(sub, leader) {
> >                                 values[n++] = perf_event_count(sub, self);
> >                                 printk("%s sub %p values %#lx\n", __func__, sub, values[n-1]);
> >                               }
> >
> >    The last function perf_event_count() is invoked on the second event
> >    cycles *on* the counting PMU. An added printk statement shows the
> >    following lines in the dmesg output:
> >
> >    # dmesg|grep perf_output_read_group |head -10
> >    [  332.368620] perf_output_read_group sub 00000000d80b7c1f values 0x3a80917 (1)
> >    [  332.368624] perf_output_read_group sub 00000000d80b7c1f values 0x3a86c7f (2)
> >    [  332.368627] perf_output_read_group sub 00000000d80b7c1f values 0x3a89c15 (3)
> >    [  332.368629] perf_output_read_group sub 00000000d80b7c1f values 0x3a8c895 (4)
> >    [  332.368631] perf_output_read_group sub 00000000d80b7c1f values 0x3a8f569 (5)
> >    [  332.368633] perf_output_read_group sub 00000000d80b7c1f values 0x3a9204b
> >    [  332.368635] perf_output_read_group sub 00000000d80b7c1f values 0x3a94790
> >    [  332.368637] perf_output_read_group sub 00000000d80b7c1f values 0x3a9704b
> >    [  332.368638] perf_output_read_group sub 00000000d80b7c1f values 0x3a99888
> >    #
> >
> >    This correlates with the output of
> >    # perf report -D | grep 'id 00000000000000'|head -10
> >    ..... id 0000000000000006, value 00000000001502e8, lost 0
> >    ..... id 000000000000000e, value 0000000003a80917, lost 0 --> line (1) above
> >    ..... id 0000000000000006, value 00000000002a05d0, lost 0
> >    ..... id 000000000000000e, value 0000000003a86c7f, lost 0 --> line (2) above
> >    ..... id 0000000000000006, value 00000000003f08b8, lost 0
> >    ..... id 000000000000000e, value 0000000003a89c15, lost 0 --> line (3) above
> >    ..... id 0000000000000006, value 0000000000540ba0, lost 0
> >    ..... id 000000000000000e, value 0000000003a8c895, lost 0 --> line (4) above
> >    ..... id 0000000000000006, value 0000000000690e88, lost 0
> >    ..... id 000000000000000e, value 0000000003a8f569, lost 0 --> line (5) above
> >
> > Summary:
> > - The above command starts the CPU sampling facility, which runs interrupt
> >   driven when a 4KB page is full. An interrupt processes the 128 samples
> >   and eventually calls perf_output_read_group() for each sample to save it
> >   in the event's ring buffer.
> >
> > - At that time the CPU counting facility is invoked to read the value of
> >   the event cycles. This value is saved as the second value in the
> >   sample_read structure.
> >
> > - The first and odd lines in the perf script output display the period
> >   value between 2 samples being created by hardware. It is the number
> >   of instructions executed before the hardware writes a sample.
> >
> > - The second and even lines in the perf script output display the number
> >   of CPU cycles needed to process each sample and save it in the event's
> >   ring buffer.
> > These 2 different values can never be identical on s390.
> >
> > Since event leader sampling is not possible on s390 the perf tool will
> > return EOPNOTSUPP soon. Prepare the test case for that.
> >
> > Suggested-by: James Clark <james.clark@...aro.org>
> > Signed-off-by: Thomas Richter <tmricht@...ux.ibm.com>
> > Tested-by: Jan Polensky <japo@...ux.ibm.com>
> > Reviewed-by: Jan Polensky <japo@...ux.ibm.com>
> > ---
> >  tools/perf/tests/shell/record.sh | 16 +++++++++++++++-
> >  1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/tests/shell/record.sh b/tools/perf/tests/shell/record.sh
> > index 0f5841c479e7..46b96d565680 100755
> > --- a/tools/perf/tests/shell/record.sh
> > +++ b/tools/perf/tests/shell/record.sh
> > @@ -260,7 +260,21 @@ test_uid() {
> >
> >  test_leader_sampling() {
> >    echo "Basic leader sampling test"
> > -  if ! perf record -o "${perfdata}" -e "{cycles,cycles}:Su" -- \
> > +  events="{cycles,cycles}:Su"
> > +  [ $(uname -m) = "s390x" ] && {

This broke shellcheck for me:
```
In tests/shell/record.sh line 264:
 [ $(uname -m) = "s390x" ] && {
   ^---------^ SC2046 (warning): Quote this to prevent word splitting.

For more information:
 https://www.shellcheck.net/wiki/SC2046 -- Quote this to prevent word splitt...
```
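
For reference, SC2046 is normally silenced by quoting the command substitution. A minimal sketch, assuming that is all the fix needs to do; `is_s390x` is a hypothetical helper used only for illustration, it is not something in record.sh, and the actual fix may look different:

```shell
# Hypothetical helper, not part of record.sh. Quoting "$(uname -m)" keeps the
# test a three-argument comparison even if the output were empty or
# contained whitespace, which is what SC2046 warns about.
is_s390x() {
  [ "$(uname -m)" = "s390x" ]
}

if is_s390x; then
  echo "s390x: use the cpum_sf sampling event"
else
  echo "not s390x: keep {cycles,cycles}:Su"
fi
```
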

I'll mail the fix.

Thanks,
Ian

> > +    [ ! -d /sys/devices/cpum_sf ] && {
> > +      echo "No CPUMF [Skipped record]"
> > +      return
> > +    }
> > +    events="{cpum_sf/SF_CYCLES_BASIC/,cycles}:Su"
> > +    perf record -o "${perfdata}" -e "$events" -- perf test -w brstack 2> /dev/null
> > +    # Perf grouping might be unsupported, depends on version.
> > +    [ "$?" -ne 0 ] && {
> > +      echo "Grouping not support [Skipped record]"
> > +      return
> > +    }
> > +  }
> > +  if ! perf record -o "${perfdata}" -e "$events" -- \
> >      perf test -w brstack 2> /dev/null
> >    then
> >      echo "Leader sampling [Failed record]"
> > --
> > 2.52.0
>