[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231126074030.GA647134@leoy-yangtze.lan>
Date: Sun, 26 Nov 2023 15:41:14 +0800
From: Leo Yan <leo.yan@...aro.org>
To: Nick Forrington <nick.forrington@....com>
Cc: Michael Petlan <mpetlan@...hat.com>, linux-kernel@...r.kernel.org,
linux-perf-users@...r.kernel.org,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Ian Rogers <irogers@...gle.com>,
Adrian Hunter <adrian.hunter@...el.com>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
vmolnaro@...hat.com
Subject: Re: [PATCH] perf test: Remove atomics from test_loop to avoid test
failures
On Sat, Nov 25, 2023 at 07:10:25PM +0000, Nick Forrington wrote:
[...]
> > @Nick, could you narrow down which specific test case causing the
> > failure.
> >
> > [...]
>
> All checks for ${testsym} in record.sh (including the example you provide)
> can fail, as the expected symbol (test_loop) is not the top-most function on
> the stack (and therefore not the symbol associated with the sample).
>
>
> Example perf report output:
>
> # Overhead Command Shared Object Symbol
> # ........ ....... ..................... .............................
> #
> 99.53% perf perf [.] __aarch64_ldadd4_relax
Thanks for confirmation.
I cannot reproduce the issue on my Juno board with Debian (buster).
It's likely because I don't use GCC toolchain with enabling
'-moutline-atomics' AArch64 flag [1].
> ...
>
>
> You can see the issue when recording/reporting with call stacks:
>
> # Children Self Command Shared Object Symbol
> # ........ ........ ....... .....................
> ..........................................................
> #
> 99.52% 99.52% perf perf [.]
> __aarch64_ldadd4_relax
> |
> |--49.77%--0xffffb905a5dc
> | 0xffffb8ff0aec
> | thfunc
> | test_loop
> | __aarch64_ldadd4_relax
> ...
>
> >
> > > I believe that it was there to prevent the compiler to optimize the loop
> > > out or some reason like that. Hopefully, it will work even without that
> > > on all architectures with all compilers that are used for building perf...
> > Agreed.
> >
> > As said above, I'd like to step back a bit for making clear what's the
> > exactly failure caused by the program.
>
> I don't think this loop could be sensibly optimised away, as it depends on
> "done", which is defined at file scope (and assigned by a signal handler).
I verified your patch with 'perf annotate'.
The disassembly on Arm64 is:
noinline void test_loop(void)
{
stp x29, x30, [sp, #-32]!
mov x29, sp
adrp x0, options+0x4c0
ldr x0, [x0, #3664]
ldr x1, [x0]
str x1, [sp, #24]
mov x1, #0x0 // #0
while (!done);
nop
20: adrp x0, spacing.22251
add x0, x0, #0xa04
ldr w0, [x0]
cmp w0, #0x0
↑ b.eq 20
}
The disassembly on x86_64 is:
noinline void test_loop(void)
{
endbr64
push %rbp
mov %rsp,%rbp
sub $0x10,%rsp
mov %fs:0x28,%rax
mov %rax,-0x8(%rbp)
xor %eax,%eax
while (!done);
nop
1c: mov done,%eax
test %eax,%eax
↑ je 1c
}
Maybe the commit log caused a bit confusion, the problem is after
enabling "-moutline-atomics" on aarch64, the overhead is altered into
the linked __aarch64_ldadd4_relax() function, test_loop() cannot be
sampled anymore, but it's not about stack tracing.
Anyway, the patch is fine for me.
Thanks,
Leo
[1] https://stackoverflow.com/questions/65239845/how-to-enable-mno-outline-atomics-aarch64-flag
Powered by blists - more mailing lists