[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAP-5=fUqs=mxdgQX0Vx=D0weQSitXh6a8DcW2FycDEk6J-=RtA@mail.gmail.com>
Date: Fri, 28 Feb 2025 16:36:02 -0800
From: Ian Rogers <irogers@...gle.com>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Thomas Richter <tmricht@...ux.ibm.com>, linux-kernel@...r.kernel.org,
linux-s390@...r.kernel.org, linux-perf-users@...r.kernel.org, acme@...nel.org,
agordeev@...ux.ibm.com, gor@...ux.ibm.com, sumanthk@...ux.ibm.com,
hca@...ux.ibm.com
Subject: Re: [PATCH] perf/test: Skip leader sampling for s390
On Fri, Feb 28, 2025 at 4:13 PM Namhyung Kim <namhyung@...nel.org> wrote:
>
> Hello,
>
> On Fri, Feb 28, 2025 at 07:22:41AM +0100, Thomas Richter wrote:
> > In tree linux-next
> > the perf test case 114 'perf record tests' has a subtest
> > named 'Basic leader sampling test' which always fails on s390.
> > Root cause is this invocation
> >
> > # perf record -vv -e '{cycles,cycles}:Su' -- perf test -w brstack
> >
> > ...
> > In the debug output the following 2 event are installed:
> >
> > ------------------------------------------------------------
> > perf_event_attr:
> > type 0 (PERF_TYPE_HARDWARE)
> > size 136
> > config 0 (PERF_COUNT_HW_CPU_CYCLES)
> > { sample_period, sample_freq } 4000
> > sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER
> > read_format ID|GROUP|LOST
> > disabled 1
> > exclude_kernel 1
> > exclude_hv 1
> > freq 1
> > sample_id_all 1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
> > ------------------------------------------------------------
> > perf_event_attr:
> > type 0 (PERF_TYPE_HARDWARE)
> > size 136
> > config 0 (PERF_COUNT_HW_CPU_CYCLES)
> > sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER
> > read_format ID|GROUP|LOST
> > exclude_kernel 1
> > exclude_hv 1
> > sample_id_all 1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1 cpu 0 group_fd 5 flags 0x8 = 6
> > ...
> >
> > The first event is the group leader and is installed as sampling event.
> > The secound one is group member and is installed as counting event.
> >
> > Namhyung Kim confirms this observation:
> > > Yep, the syntax '{event1,event2}:S' is for group leader sampling which
> > > reduces the overhead of PMU interrupts. The idea is that those events
> > > are scheduled together so sampling is enabled only for the leader
> > > (usually the first) event and it reads counts from the member events
> > > using PERF_SAMPLE_READ.
> > >
> > > So they should have the same counts if it uses the same events in a
> > > group.
> >
> > However this does not work on s390. s390 has one dedicated sampling PMU
> > which supports only one event. A different PMU is used for counting.
> > Both run concurrently using different setups and frequencies.
> >
> > On s390x a sampling event is setup using a preset trigger and a large
> > buffer. The hardware
> > - writes a samples (64 bytes) into this buffer
> > when a given number of CPU instructions has been executed.
> > - and triggers an interrupt when the buffer gets full.
> > The trigger has just a few possible values.
> >
> > On s390x the counting event cycles is used to read out the numer of
> > CPU cycles executed.
> >
> > On s390 above invocation created 2 events executed on 2 different
> > PMU and the result are diffent values from two independently running
> > PMUs which do not match in a consistent and reliably as on Intel:
> >
> > # ./perf record -e '{cycles,cycles}:Su' -- perf test -w brstack
Hi Thomas,
Thanks for reporting this! Could you try adding --count=100000 so that
we're not using frequency mode and we expect the counts to look like
100,000. For example, on my x86 laptop:
```
$ perf record --count=100000 -e '{cycles,cycles}:Su' -- perf test -w brstack
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.047 MB perf.data (712 samples) ]
$ perf script
perf 635952 290271.436115: 100007 cycles:
ffffffffada00080 [unknown] ([unknown])
perf 635952 290271.436115: 100007 cycles:
ffffffffada00080 [unknown] ([unknown])
perf 635952 290271.436650: 100525 cycles:
7f86352b01b3 _dl_map_object_from_fd+0x553
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 635952 290271.436650: 100525 cycles:
7f86352b01b3 _dl_map_object_from_fd+0x553
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 635952 290271.437088: 99866 cycles:
7f86352cb827 strchr+0x27
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 635952 290271.437088: 99866 cycles:
7f86352cb827 strchr+0x27
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 635952 290271.437376: 99912 cycles:
7f86352cba74 strcmp+0x54
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 635952 290271.437376: 99912 cycles:
7f86352cba74 strcmp+0x54
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 635952 290271.437509: 100279 cycles:
7f86352cba3a strcmp+0x1a
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 635952 290271.437509: 100279 cycles:
7f86352cba3a strcmp+0x1a
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 635952 290271.437559: 99760 cycles:
7f86352bc39f _dl_check_map_versions+0x50f
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 635952 290271.437559: 99760 cycles:
7f86352bc39f _dl_check_map_versions+0x50f
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
...
```
I'm particularly concerned if we see the cycles count very deviant
from the 100000.
> > ...
> > # ./perf script
> > perf 2799437 92568.845118: 5508000 cycles: 3ffbcb898b6 do_lookup_x+0x196
> > perf 2799437 92568.845119: 1377000 cycles: 3ffbcb898b6 do_lookup_x+0x196
> > perf 2799437 92568.845120: 4131000 cycles: 3ffbcb897e8 do_lookup_x+0xc8
> > perf 2799437 92568.845121: 1377000 cycles: 3ffbcb8a37c _dl_lookup_symbol
> > perf 2799437 92568.845122: 1377000 cycles: 3ffbcb89558 check_match+0x18
> > perf 2799437 92568.845123: 2754000 cycles: 3ffbcb89b2a do_lookup_x+0x40a
> > perf 2799437 92568.845124: 1377000 cycles: 3ffbcb89b1e do_lookup_x+0x3fe
> >
> > As can be seen the result match very often but not all the time
> > make this test on s390 failing very, very often.
Actually this is much more deviation than I'd expect. If we use
task-clock softer/timer based event I see:
```
$ perf record --count=100000 -e '{task-clock,task-clock}:Su' -- perf
test -w brstack
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.047 MB perf.data (712 samples) ]
$ perf script
perf 636643 290571.807049: 801858 task-clock:
7fdf48643439 _dl_map_object_from_fd+0x7d9
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 636643 290571.807049: 804012 task-clock:
7fdf48643439 _dl_map_object_from_fd+0x7d9
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 636643 290571.807549: 499833 task-clock:
7fdf4863eb9b _dl_map_object_deps+0x3eb
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 636643 290571.807549: 498236 task-clock:
7fdf4863eb9b _dl_map_object_deps+0x3eb
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
```
So the count deviates by a few hundred, but your output seems to
deviate by 4 million.
So, I think the test needs to be more tolerant that should help your
case. As Namhyung mentions I think there may be another bug lurking.
Thanks,
Ian
> > This patch bypasses this test on s390.
> >
> > Output before:
> > # ./perf test 114
> > 114: perf record tests : FAILED!
> > #
> >
> > Output after:
> > # ./perf test 114
> > 114: perf record tests : Ok
> > #
> >
> > Signed-off-by: Thomas Richter <tmricht@...ux.ibm.com>
> > Acked-by: Sumanth Korikkar <sumanthk@...ux.ibm.com>
>
> Thanks for the fix. I think Ian saw the same problem on other archs
> too. Maybe we need to enable it on supported archs only.
>
> Thanks,
> Namhyung
>
> > ---
> > tools/perf/tests/shell/record.sh | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/tools/perf/tests/shell/record.sh b/tools/perf/tests/shell/record.sh
> > index ba8d873d3ca7..98b69820bc5f 100755
> > --- a/tools/perf/tests/shell/record.sh
> > +++ b/tools/perf/tests/shell/record.sh
> > @@ -231,6 +231,12 @@ test_cgroup() {
> >
> > test_leader_sampling() {
> > echo "Basic leader sampling test"
> > + if [ "$(uname -m)" = s390x ]
> > + then
> > + echo "Leader sampling skipped"
> > + ((skipped+=1))
> > + return
> > + fi
> > if ! perf record -o "${perfdata}" -e "{cycles,cycles}:Su" -- \
> > perf test -w brstack 2> /dev/null
> > then
> > --
> > 2.45.2
> >
Powered by blists - more mailing lists