Message-ID: <6908562f-4a99-44ea-bffb-19f33fcffe83@linux.dev>
Date: Mon, 5 Jan 2026 17:06:49 -0800
From: Ihor Solodrai <ihor.solodrai@...ux.dev>
To: Nathan Chancellor <nathan@...nel.org>
Cc: Alexei Starovoitov <ast@...nel.org>,
 Daniel Borkmann <daniel@...earbox.net>, Andrii Nakryiko <andrii@...nel.org>,
 Martin KaFai Lau <martin.lau@...ux.dev>, Eduard Zingerman
 <eddyz87@...il.com>, Yonghong Song <yonghong.song@...ux.dev>,
 bpf@...r.kernel.org, linux-kernel@...r.kernel.org, llvm@...ts.linux.dev
Subject: Re: [PATCH bpf-next] scripts/gen-btf.sh: Disable LTO when generating
 initial .o file

On 1/5/26 3:46 PM, Nathan Chancellor wrote:
> On Mon, Jan 05, 2026 at 02:01:36PM -0800, Ihor Solodrai wrote:
>> Hi Nathan, thank you for the patch.
>>
>> I'm starting to think it wasn't a good idea to do
>>
>> 	echo "" | ${CC} ...
>>
>> here, given the number of associated bugs.
> 
> Yeah, I was wondering if a lack of KBUILD_CPPFLAGS would also be a
> problem since that contains the endianness flag for some targets. I
> cannot imagine any more issues than that but I can understand wanting to
> back out of it.
> 
>> Before gen-btf.sh was introduced, the .btf.o binary was generated with this [1]:
>>
>> 	${OBJCOPY} --only-section=.BTF --set-section-flags .BTF=alloc,readonly \
>> 		--strip-all ${1} "${btf_data}" 2>/dev/null
>>
>> I changed to ${CC} on the assumption it's a quicker operation than
>> stripping entire vmlinux. But maybe it's not worth it and we should
>> change back to --strip-all? wdyt?
> 
> That certainly seems more robust to me. I see the logic but with
> '--only-section' and no glob, I would expect that to be a rather quick
> operation but I am running out of time today to test and benchmark such
> a change. I will try to do it tomorrow unless someone beats me to it.

I got curious and did a little experiment. Basically, I ran perf stat
on this part of gen-btf.sh:

	echo "" | ${CC} ${CLANG_FLAGS} ${KBUILD_CFLAGS} -c -x c -o ${btf_data} -
	${OBJCOPY} --add-section .BTF=${ELF_FILE}.BTF \
		--set-section-flags .BTF=alloc,readonly ${btf_data}
	${OBJCOPY} --only-section=.BTF --strip-all ${btf_data}

For comparison, I replaced the ${CC} command with:

	${OBJCOPY} --strip-all "${ELF_FILE}" ${btf_data} 2>/dev/null

TL;DR is that using ${CC} is:
  * about 1.5x faster than GNU objcopy --strip-all .tmp_vmlinux1
  * about 16x (!) faster than llvm-objcopy --strip-all .tmp_vmlinux1

With the obvious caveat that this is one particular machine
(Threadripper PRO 3975WX) and toolchain:
  * clang version 21.1.7
  * gcc (GCC) 15.2.1 20251211

This is bpf-next (a069190b590e) with BPF CI-like kconfig.

Pasting perf stat output below.


# llvm-objcopy --strip-all
$ perf stat -r 31 -- ./gen-btf.o_strip.sh

 Performance counter stats for './gen-btf.o_strip.sh' (31 runs):

     1,300,945,256      task-clock:u                     #    0.962 CPUs utilized               ( +-  0.10% )
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
           327,311      page-faults:u                    #  251.595 K/sec                       ( +-  0.00% )
     1,532,927,570      instructions:u                   #    1.33  insn per cycle            
                                                  #    0.03  stalled cycles per insn     ( +-  0.00% )
     1,155,639,083      cycles:u                         #    0.888 GHz                         ( +-  0.18% )
        53,144,866      stalled-cycles-frontend:u        #    4.60% frontend cycles idle        ( +-  0.99% )
       297,229,466      branches:u                       #  228.472 M/sec                       ( +-  0.00% )
           903,337      branch-misses:u                  #    0.30% of all branches             ( +-  0.02% )

           1.35200 +- 0.00137 seconds time elapsed  ( +-  0.10% )


# GNU objcopy --strip-all
$ perf stat -r 31 -- ./gen-btf.o_strip.sh

 Performance counter stats for './gen-btf.o_strip.sh' (31 runs):

       119,747,488      task-clock:u                     #    0.970 CPUs utilized               ( +-  0.41% )
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
             9,186      page-faults:u                    #   76.711 K/sec                       ( +-  0.01% )
       132,651,881      instructions:u                   #    1.68  insn per cycle            
                                                  #    0.08  stalled cycles per insn     ( +-  0.00% )
        79,191,259      cycles:u                         #    0.661 GHz                         ( +-  1.06% )
        10,136,981      stalled-cycles-frontend:u        #   12.80% frontend cycles idle        ( +-  2.58% )
        28,422,807      branches:u                       #  237.356 M/sec                       ( +-  0.00% )
           354,981      branch-misses:u                  #    1.25% of all branches             ( +-  0.02% )

          0.123415 +- 0.000564 seconds time elapsed  ( +-  0.46% )


# echo "" | clang ...
$ perf stat -r 31 -- ./gen-btf.o_llvm.sh

 Performance counter stats for './gen-btf.o_llvm.sh' (31 runs):

        62,107,490      task-clock:u                     #    0.774 CPUs utilized               ( +-  0.31% )
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
             9,755      page-faults:u                    #  157.066 K/sec                       ( +-  0.01% )
        88,196,854      instructions:u                   #    1.18  insn per cycle            
                                                  #    0.19  stalled cycles per insn     ( +-  0.00% )
        74,944,793      cycles:u                         #    1.207 GHz                         ( +-  0.50% )
        16,494,448      stalled-cycles-frontend:u        #   22.01% frontend cycles idle        ( +-  0.48% )
        17,914,949      branches:u                       #  288.451 M/sec                       ( +-  0.00% )
           459,548      branch-misses:u                  #    2.57% of all branches             ( +-  0.10% )

          0.080237 +- 0.000313 seconds time elapsed  ( +-  0.39% )


# echo "" | gcc ...
$ perf stat -r 31 -- ./gen-btf.o_gnu.sh

 Performance counter stats for './gen-btf.o_gnu.sh' (31 runs):

        53,683,797      task-clock:u                     #    0.770 CPUs utilized               ( +-  0.33% )
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
             8,390      page-faults:u                    #  156.286 K/sec                       ( +-  0.01% )
        69,398,474      instructions:u                   #    1.22  insn per cycle            
                                                  #    0.17  stalled cycles per insn     ( +-  0.00% )
        56,763,954      cycles:u                         #    1.057 GHz                         ( +-  0.39% )
        12,103,546      stalled-cycles-frontend:u        #   21.32% frontend cycles idle        ( +-  0.47% )
        14,064,366      branches:u                       #  261.985 M/sec                       ( +-  0.00% )
           347,383      branch-misses:u                  #    2.47% of all branches             ( +-  0.09% )

          0.069735 +- 0.000253 seconds time elapsed  ( +-  0.36% )
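
As a rough cross-check of the TL;DR ratios, the mean elapsed times from
the four runs above can be divided directly. This is only a sketch: the
variable names and the ratio helper are made up for illustration, and it
prints every objcopy/compiler pairing because the exact pairing behind
each TL;DR figure is not spelled out here.

```shell
#!/bin/sh
# Mean "seconds time elapsed" values copied from the perf stat runs above.
llvm_objcopy=1.35200   # llvm-objcopy --strip-all
gnu_objcopy=0.123415   # GNU objcopy --strip-all
clang_cc=0.080237      # echo "" | clang ...
gcc_cc=0.069735        # echo "" | gcc ...

# ratio SLOW FAST: print SLOW / FAST rounded to two decimal places.
# (Hypothetical helper, not part of gen-btf.sh.)
ratio() {
	awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

echo "llvm-objcopy / clang: $(ratio "$llvm_objcopy" "$clang_cc")x"
echo "llvm-objcopy / gcc:   $(ratio "$llvm_objcopy" "$gcc_cc")x"
echo "GNU objcopy  / clang: $(ratio "$gnu_objcopy" "$clang_cc")x"
echo "GNU objcopy  / gcc:   $(ratio "$gnu_objcopy" "$gcc_cc")x"
```

Note that these are wall-clock ratios for the whole measured script
fragment, including the two follow-up ${OBJCOPY} calls, not for the
${CC} invocation alone.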


> 
> Cheers,
> Nathan

