lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sun, 26 Nov 2023 15:41:14 +0800
From:   Leo Yan <leo.yan@...aro.org>
To:     Nick Forrington <nick.forrington@....com>
Cc:     Michael Petlan <mpetlan@...hat.com>, linux-kernel@...r.kernel.org,
        linux-perf-users@...r.kernel.org,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Ian Rogers <irogers@...gle.com>,
        Adrian Hunter <adrian.hunter@...el.com>,
        Arnaldo Carvalho de Melo <acme@...hat.com>,
        vmolnaro@...hat.com
Subject: Re: [PATCH] perf test: Remove atomics from test_loop to avoid test
 failures

On Sat, Nov 25, 2023 at 07:10:25PM +0000, Nick Forrington wrote:

[...]

> > @Nick, could you narrow down which specific test case causing the
> > failure.
> > 
> > [...]
> 
> All checks for ${testsym} in record.sh (including the example you provide)
> can fail, as the expected symbol (test_loop) is not the top-most function on
> the stack (and therefore not the symbol associated with the sample).
> 
> 
> Example perf report output:
> 
> # Overhead  Command  Shared Object          Symbol
> # ........  .......  ..................... .............................
> #
>     99.53%  perf     perf                   [.] __aarch64_ldadd4_relax

Thanks for confirmation.

I cannot reproduce the issue on my Juno board with Debian (buster).
It's likely because I don't use GCC toolchain with enabling
'-moutline-atomics' AArch64 flag [1].

> ...
> 
> 
> You can see the issue when recording/reporting with call stacks:
> 
> # Children      Self  Command  Shared Object          Symbol
> # ........  ........  .......  .....................
> ..........................................................
> #
>     99.52%    99.52%  perf     perf                   [.]
> __aarch64_ldadd4_relax
>             |
>             |--49.77%--0xffffb905a5dc
>             |          0xffffb8ff0aec
>             |          thfunc
>             |          test_loop
>             |          __aarch64_ldadd4_relax



> ...
> 
> > 
> > > I believe that it was there to prevent the compiler to optimize the loop
> > > out or some reason like that. Hopefully, it will work even without that
> > > on all architectures with all compilers that are used for building perf...
> > Agreed.
> > 
> > As said above, I'd like to step back a bit for making clear what's the
> > exactly failure caused by the program.
> 
> I don't think this loop could be sensibly optimised away, as it depends on
> "done", which is defined at file scope (and assigned by a signal handler).

I verified your patch with 'perf annotate'.

The disassembly on Arm64 is:

    noinline void test_loop(void)               
    {                                           
      stp  x29, x30, [sp, #-32]!                
      mov  x29, sp                              
      adrp x0, options+0x4c0                    
      ldr  x0, [x0, #3664]                      
      ldr  x1, [x0]                             
      str  x1, [sp, #24]                        
      mov  x1, #0x0                        // #0
    while (!done);                              
      nop                                       
20:   adrp x0, spacing.22251                    
      add  x0, x0, #0xa04                       
      ldr  w0, [x0]                             
      cmp  w0, #0x0                             
    ↑ b.eq 20                                   
    } 

The disassembly on x86_64 is:

    noinline void test_loop(void)
    {
      endbr64
      push    %rbp
      mov     %rsp,%rbp
      sub     $0x10,%rsp
      mov     %fs:0x28,%rax
      mov     %rax,-0x8(%rbp)
      xor     %eax,%eax
    while (!done);
      nop
1c:   mov     done,%eax
      test    %eax,%eax
    ↑ je      1c
    }


Maybe the commit log caused a bit confusion, the problem is after
enabling "-moutline-atomics" on aarch64, the overhead is altered into
the linked __aarch64_ldadd4_relax() function, test_loop() cannot be
sampled anymore, but it's not about stack tracing.

Anyway, the patch is fine for me.

Thanks,
Leo

[1] https://stackoverflow.com/questions/65239845/how-to-enable-mno-outline-atomics-aarch64-flag

Powered by blists - more mailing lists