[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEmfU+vfYY8jVD-mLuRBVrk9M8OFCNiptXQjL1Rf+S5XtjiDVg@mail.gmail.com>
Date: Fri, 8 Nov 2024 08:59:11 -0800
From: Benjamin Peterson <benjamin@...flow.com>
To: Howard Chu <howardchu95@...il.com>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>, Jiri Olsa <jolsa@...nel.org>,
Ian Rogers <irogers@...gle.com>, Adrian Hunter <adrian.hunter@...el.com>,
"Liang, Kan" <kan.liang@...ux.intel.com>, open list <linux-kernel@...r.kernel.org>,
"open list:PERFORMANCE EVENTS SUBSYSTEM" <linux-perf-users@...r.kernel.org>
Subject: Re: [PATCH 2/2] perf tests: add test for trace output loss
On Fri, Nov 8, 2024 at 8:20 AM Howard Chu <howardchu95@...il.com> wrote:
>
> Hello Benjamin,
>
> On Thu, Nov 7, 2024 at 3:18 PM Benjamin Peterson <benjamin@...flow.com> wrote:
> >
> > On Thu, Nov 7, 2024 at 2:07 PM Howard Chu <howardchu95@...il.com> wrote:
> > >
> > > Hello,
> > >
> > > On Wed, Nov 6, 2024 at 3:46 PM Benjamin Peterson <benjamin@...flow.com> wrote:
> > > >
> > > > Add a test that checks that trace output is not lost to races. This is
> > > > accomplished by tracing the exit_group syscall of "true" multiple times and
> > > > checking for correct output.
> > > >
> > > > Conveniently, this test also serves as a regression test for 5fb8e56542a3 ("perf
> > > > trace: avoid garbage when not printing a trace event's arguments") because
> > > > exit_group triggers the previously buggy printing behavior.
> > > >
> > > > Signed-off-by: Benjamin Peterson <benjamin@...flow.com>
> > > > ---
> > > > tools/perf/tests/shell/trace_exit_race.sh | 31 +++++++++++++++++++++++
> > > > 1 file changed, 31 insertions(+)
> > > > create mode 100755 tools/perf/tests/shell/trace_exit_race.sh
> > > >
> > > > diff --git a/tools/perf/tests/shell/trace_exit_race.sh b/tools/perf/tests/shell/trace_exit_race.sh
> > > > new file mode 100755
> > > > index 000000000000..8b70324bc5b4
> > > > --- /dev/null
> > > > +++ b/tools/perf/tests/shell/trace_exit_race.sh
> > > > @@ -0,0 +1,31 @@
> > > > +#!/bin/sh
> > > > +# perf trace exit race
> > > > +# SPDX-License-Identifier: GPL-2.0
> > > > +
> > > > +# Check that the last events of a perf trace'd subprocess are not
> > > > +# lost. Specifically, trace the exiting syscall of "true" 100 times and ensure
> > > > +# the output contains 100 correct lines.
> > > > +
> > > > +# shellcheck source=lib/probe.sh
> > > > +. "$(dirname $0)"/lib/probe.sh
> > > > +
> > > > +skip_if_no_perf_trace || exit 2
> > > > +
> > > > +trace_shutdown_race() {
> > > > + for i in $(seq 100); do
> > > > + perf trace -e syscalls:sys_enter_exit_group true 2>>$file
> > > > + done
> > > > + [ $(grep -c -E " +[0-9]+\.[0-9]+ +true/[0-9]+ syscalls:sys_enter_exit_group\(\)$" $file) = "100" ]
> > >
> > > The test failed due to regex mismatched, I think because of this:
> > >
> > > 0.000 true/1526046 syscalls:sys_enter_exit_group( )
> > > 0.000 true/1526212 syscalls:sys_enter_exit_group( )
> > > 0.000 true/1526383 syscalls:sys_enter_exit_group(SSSSSSSSSSSSSSSS)
> > >
> > > Ironically the junk buffer problem you fixed last time, thanks for
> > > finding another printing problem in perf trace :), I'll figure out the
> > > cause of this.
> >
> > Thanks, the problem is a parallel issue with trace__fprintf_sys_enter.
> > I will include a patch for the problem in my spin of this series.
> >
> > >
> > > Another thing is this test takes a long time to finish
> > >
> > > perf $ time ./perf test 109
> > > 109: perf trace exit race : FAILED!
> > >
> > > real 0m38.762s
> > > user 0m15.090s
> > > sys 0m21.794s
> > >
> > > Is it really necessary to run it 100 times? To me it seems to be just
> > > a wrong handling logic of draining samples, will there be coincidence?
> >
> > Yes, it's racy. In my testing, sometimes the correct output would
> > appear. However, I will reduce the number of iterations to 10.
>
> Could you be a bit more specific about this race condition issue?
>From the code, it's apparent that if the SIGCHLD from the traced
process is seen before the exit_group trace event, output will be
lost. If the signal is received after the trace event, the trace event
will be correctly printed.
> Could you please tell us how one might emulate and test the scenario
> you mentioned? I ran the command 1,000 times before and after applying
> the patch, and each time I got the expected output. Could you please
> provide more information about your use cases?
I can see it simply by running the unpatched perf few times in the
terminal, but it might be helpful for reproduction to create some
scheduling contention with stress.
Powered by blists - more mailing lists