Message-ID: <Z_TJ6NHZ1v4ucd8p@google.com>
Date: Tue, 8 Apr 2025 00:02:00 -0700
From: Namhyung Kim <namhyung@...nel.org>
To: Howard Chu <howardchu95@...il.com>
Cc: acme@...nel.org, mingo@...hat.com, mark.rutland@....com,
alexander.shishkin@...ux.intel.com, jolsa@...nel.org,
irogers@...gle.com, adrian.hunter@...el.com, peterz@...radead.org,
kan.liang@...ux.intel.com, linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1] perf trace: Fix inconsistent failures in perf trace's tests
On Fri, Apr 04, 2025 at 07:12:26PM -0700, Howard Chu wrote:
> Hello,
>
> On Fri, Apr 4, 2025 at 11:02 AM Namhyung Kim <namhyung@...nel.org> wrote:
> >
> > On Thu, Apr 03, 2025 at 09:16:52PM -0700, Howard Chu wrote:
> > > There are two failures that frequently occur in perf trace's tests. One
> > > is the failure of 'perf trace BTF general tests'; the other is the
> > > failure of 'perf trace record and replay', which, when run independently,
> > > always succeeds.
> > >
> > > The root cause of the first failure is that perf trace may give two types
> > > of output, depending on whether the comm of a process can be parsed, for
> > > example:
> > >
> > > mv/312705 renameat2(CWD, "/tmp/file1_VJOT", CWD, "/tmp/file2_VJOT", NOREPLACE) = 0
> > > :312774/312774 renameat2(CWD, "/tmp/file1_5YcE", CWD, "/tmp/file2_5YcE", NOREPLACE) = 0
> > >
> > > In the test, however, grep is always looking for the comm 'mv', which
> > > sometimes may not be present.
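To illustrate the two output forms quoted above, here is a minimal sketch
of a check that accepts either one. It is not taken from the patch: the
helper name has_renameat2 and the exact regular expression are assumptions
made only for this example.

```shell
#!/bin/sh
# Hypothetical helper (not from the patch): reads trace output on stdin and
# succeeds if a renameat2 line appears with either task prefix,
# "mv/<tid>" (comm resolved) or ":<tid>/<tid>" (comm not resolved).
has_renameat2() {
    grep -q -E '^ *(mv|:[0-9]+)/[0-9]+ renameat2\(.*\) += +-?[0-9]+'
}
```

Note that accepting the ":<tid>" form only makes the check tolerant of the
missing comm; it does not explain why the comm is missing.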
> > >
> > > The cause of the second failure is that 'perf trace BTF general tests'
> > > modifies the perf config, and because tests are run concurrently,
> > > subsequent tests use the modified perf config before the BTF general
> > > test can restore the original config. Marking the BTF general tests as
> > > exclusive solves the failure.
>
> Yeah, I was wrong. I now suspect it has something to do with two
> augmented_syscall BPF programs running at the same time. I noticed the
> offcpu test has '(exclusive)' too. Do you think it's a BPF triggering
> issue? Like, if test A is trying to capture the clock_nanosleep
> syscall and test B is also trying to capture it, could it be that A
> ends up capturing both calls while B gets nothing? Just asking before
> I dig in further. :)
I don't think that will happen. I suspect it may be a timing issue:
two threads race to set up BPF for syscall tracing, and setting up the
second thread's BPF disables the first one for a short amount of time.
'mv' runs very briefly, so it may finish while tracing is disabled.
But it's just a wild guess.

Thanks,
Namhyung
>
> >
> > I'm not sure if the config is the cause of the failure. Also I don't
> > see it restored.
> >
> > IIUC the export only affects child processes from the current shell.
> > So other tests running in parallel won't see the config change.
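The scoping described above can be shown with a small sketch. DEMO_CONFIG
is a hypothetical variable name used only for illustration, and test_a and
test_b merely stand in for two test scripts started by the same parent;
none of these names come from the perf test suite.

```shell
#!/bin/sh
# DEMO_CONFIG is a hypothetical variable used only for this illustration.
unset DEMO_CONFIG

# "Test A": runs in its own subshell, exports the variable, starts a child.
test_a() (
    export DEMO_CONFIG=/tmp/demo.config
    sh -c 'printf "%s\n" "${DEMO_CONFIG:-unset}"'
)

# "Test B": a separate subshell of the same parent, with no export.
test_b() (
    sh -c 'printf "%s\n" "${DEMO_CONFIG:-unset}"'
)

test_a   # the child of A sees /tmp/demo.config
test_b   # the child of B sees "unset"
```

The export made inside A is inherited only by A's children, so B's
environment is unchanged no matter how the two are scheduled.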
> >
> > But still, something must be affecting the behavior. It's
> > strange to miss the task name in the COMM record.
>
> I can look into that too.
>
> >
> > I also confirm that running the test serially fixes it.
> >
> > Thanks,
> > Namhyung
>
> Thanks,
> Howard