Message-ID: <aDCsmjb7Fex1ccOW@x1>
Date: Fri, 23 May 2025 14:12:58 -0300
From: Arnaldo Carvalho de Melo <acme@...nel.org>
To: Leo Yan <leo.yan@....com>
Cc: Namhyung Kim <namhyung@...nel.org>, Ian Rogers <irogers@...gle.com>,
Adrian Hunter <adrian.hunter@...el.com>,
"Liang, Kan" <kan.liang@...ux.intel.com>,
James Clark <james.clark@...aro.org>,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] perf tests switch-tracking: Fix timestamp comparison
On Fri, May 23, 2025 at 09:10:36AM +0100, Leo Yan wrote:
> On Thu, May 22, 2025 at 10:57:41PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Thu, May 22, 2025 at 10:55:46PM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Mon, Mar 31, 2025 at 06:27:59PM +0100, Leo Yan wrote:
> > > > The test might fail on the Arm64 platform with the error:
> > > > perf test -vvv "Track with sched_switch"
> > > > Missing sched_switch events
> > > > The issue is caused by incorrect handling of timestamp comparisons. The
> > > > comparison result, a signed 64-bit value, was being directly cast to an
> > > > int, leading to incorrect sorting for sched events.
> > > > Fix this by explicitly returning 0, 1, or -1 based on whether the result
> > > > is zero, positive, or negative.
> > > > Fixes: d44bc5582972 ("perf tests: Add a test for tracking with sched_switch")
> > > > Signed-off-by: Leo Yan <leo.yan@....com>
> > > How can I reproduce this?
> > > Testing on a rpi5, 64-bit debian, this test passes:
>
> Sorry that I did not give precise info for reproducing the failure.
> The case does not fail every time; usually I can trigger the failure
> after running it 20 to 30 times:
>
> # while true; do perf test "Track with sched_switch"; done
> 106: Track with sched_switch : Ok
> 106: Track with sched_switch : Ok
> 106: Track with sched_switch : Ok
> 106: Track with sched_switch : Ok
> 106: Track with sched_switch : Ok
> 106: Track with sched_switch : Ok
> 106: Track with sched_switch : Ok
> 106: Track with sched_switch : Ok
> 106: Track with sched_switch : Ok
> 106: Track with sched_switch : Ok
> 106: Track with sched_switch : Ok
> 106: Track with sched_switch : Ok
> 106: Track with sched_switch : Ok
> 106: Track with sched_switch : Ok
> 106: Track with sched_switch : FAILED!
> 106: Track with sched_switch : Ok
> 106: Track with sched_switch : Ok
> 106: Track with sched_switch : Ok
> 106: Track with sched_switch : Ok
> 106: Track with sched_switch : Ok
> 106: Track with sched_switch : Ok
> 106: Track with sched_switch : Ok
> 106: Track with sched_switch : Ok
> 106: Track with sched_switch : FAILED!
> 106: Track with sched_switch : Ok
> 106: Track with sched_switch : Ok
> I used a cross compiler to build the perf tool on my host machine and
> tested on a Juno board running Debian. Generally, I don't think this
> issue is specific to a GCC version, as both our internal CI and my local
> environment can reproduce it.
> Please let me know if you need any more info. Thanks!
> ---8<---
> My Host Build compiler:
> # aarch64-linux-gnu-gcc --version
> aarch64-linux-gnu-gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
> Juno Board:
> # lsb_release -a
> No LSB modules are available.
> Distributor ID: Debian
> Description: Debian GNU/Linux 12 (bookworm)
> Release: 12
> Codename: bookworm
Thanks for the extra info, I'll add it to the commit log message, and
perhaps we could make this test exclusive and use stress-ng to generate
some background noise in the form of a good number of processes, see:
root@x1:~# stress-ng --switch $(($(nproc) * 2)) --timeout 30s & for a in $(seq 50) ; do perf test switch ; done
[1] 1773322
stress-ng: info: [1773322] setting to a 30 secs run per stressor
77: Track with sched_switch : Running (1 active)
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : Running (1 active)
stress-ng: info: [1773322] skipped: 0
stress-ng: info: [1773322] passed: 24: switch (24)
stress-ng: info: [1773322] failed: 0
stress-ng: info: [1773322] metrics untrustworthy: 0
77: Track with sched_switch : FAILED!
[1]+ Done stress-ng --switch $(($(nproc) * 2)) --timeout 30s
77: Track with sched_switch : Ok
77: Track with sched_switch : Ok
77: Track with sched_switch : Ok
77: Track with sched_switch : Ok
77: Track with sched_switch : FAILED!
77: Track with sched_switch : Ok
77: Track with sched_switch : Ok
77: Track with sched_switch : FAILED!
77: Track with sched_switch : FAILED!
77: Track with sched_switch : Ok
root@x1:~#
Now with your patch it also fails, so it's for another reason:
--- start ---
test child forked, pid 1777071
Using CPUID GenuineIntel-6-BA-3
mmap size 528384B
45221 events recorded
Missing comm events
---- end(-1) ----
113: Track with sched_switch : FAILED!
Lots of short-lived processes make it fail as well :-\
Oh well...
I was just trying to improve this test case so that we would show it
failing before your patch and passing after it, but I ran out of time
:-\
Your patch is correct, so I'll probably just add your comments and go
with it.
- Arnaldo