Message-ID: <20200222080140.GA40311@ubuntu-m2-xlarge-x86>
Date: Sat, 22 Feb 2020 01:01:40 -0700
From: Nathan Chancellor <natechancellor@...il.com>
To: "Alex Xu (Hello71)" <alex_y_xu@...oo.ca>
Cc: Russell King <linux@...linux.org.uk>, linux-kbuild@...r.kernel.org,
linux-kernel@...r.kernel.org, masahiroy@...nel.org,
michal.lkml@...kovi.net
Subject: Re: [PATCH] kbuild: move -pipe to global KBUILD_CFLAGS
On Fri, Feb 21, 2020 at 11:01:24PM -0500, Alex Xu (Hello71) wrote:
> Excerpts from Nathan Chancellor's message of February 21, 2020 9:16 pm:
> > Hi Alex,
> >
> > On Fri, Feb 21, 2020 at 07:38:20PM -0500, Alex Xu (Hello71) wrote:
> >> -pipe reduces unnecessary disk wear for systems where /tmp is not a
> >> tmpfs, slightly increases compilation speed, and avoids leaving behind
> >> files when gcc crashes.
> >>
> >> According to the gcc manual, "this fails to work on some systems where
> >> the assembler is unable to read from a pipe; but the GNU assembler has
> >> no trouble". We already require GNU ld on all platforms, so this is not
> >> an additional dependency. The LLVM assembler also supports pipes.
> >>
> >> -pipe has always been used for most architectures; this change
> >> standardizes it globally. Most notably, arm, arm64, riscv, and x86 are
> >> affected.
> >>
> >> Signed-off-by: Alex Xu (Hello71) <alex_y_xu@...oo.ca>
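[Editorial note: the temp-file vs. pipe hand-off the commit message describes can be sketched with a generic two-stage pipeline. This is a hypothetical stand-in using plain Python for both stages, not gcc itself; it only illustrates the two IPC styles.]

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical two-stage "toolchain": stage 1 emits text (like cc1 emitting
# assembly) and stage 2 consumes it (like the assembler). Plain Python
# stand-ins are used so the sketch runs anywhere.
STAGE1 = [sys.executable, "-c", "print('mov eax, 42')"]
STAGE2 = [sys.executable, "-c",
          "import sys; sys.stdout.write(sys.stdin.read().upper())"]

def via_tempfile():
    """Default behaviour: stage 1 writes a temp file, stage 2 reads it back."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "w") as f:
            subprocess.run(STAGE1, stdout=f, check=True)
        with open(path) as f:
            return subprocess.run(STAGE2, stdin=f, capture_output=True,
                                  text=True, check=True).stdout
    finally:
        os.unlink(path)  # without -pipe, gcc must clean this up too

def via_pipe():
    """-pipe behaviour: both stages run concurrently, joined by a pipe."""
    p1 = subprocess.Popen(STAGE1, stdout=subprocess.PIPE)
    out = subprocess.run(STAGE2, stdin=p1.stdout, capture_output=True,
                         text=True, check=True).stdout
    p1.stdout.close()
    p1.wait()
    return out

# Same result either way; the pipe variant never touches the filesystem,
# which is what -pipe buys (and why a crash leaves no stray files behind).
print(via_tempfile() == via_pipe())
```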
> >
> > Do you have any numbers to show this is actually beneficial from a
> > compilation time perspective? I ask because I saw an improvement in
> > compilation time when removing -pipe from x86's KBUILD_CFLAGS in
> > commit 437e88ab8f9e ("x86/build: Remove -pipe from KBUILD_CFLAGS").
> >
> > For what it's worth, clang ignores -pipe so this does not actually
> > matter for its integrated assembler.
> >
> > That type of change could have been a fluke, but I guarantee people
> > will care more about any change in compilation time than about any of
> > the other things you mention, so it might be wise to check on major
> > architectures to make sure that it doesn't hurt.
> >
> > Cheers,
> > Nathan
> >
>
> Sorry, I should've checked the performance first. I have now run:
>
> cd /tmp/linux # previously: make O=/tmp/linux
> export MAKEFLAGS=-j12 # Ryzen 1600, 6 cores, 12 threads
> make allnoconfig
> for i in {1..10}; do
> make clean >/dev/null
> time make XPIPE=-pipe >/dev/null
> make clean >/dev/null
> time make >/dev/null
> done
>
> after patching -pipe to $(XPIPE) in Makefile.
>
> Results (without ld warnings):
>
> make > /dev/null 130.54s user 10.41s system 969% cpu 14.537 total
> make XPIPE=-pipe > /dev/null 129.83s user 9.95s system 977% cpu 14.296 total
> make > /dev/null 129.73s user 10.28s system 966% cpu 14.493 total
> make XPIPE=-pipe > /dev/null 130.04s user 10.63s system 986% cpu 14.252 total
> make > /dev/null 129.53s user 10.28s system 972% cpu 14.379 total
> make XPIPE=-pipe > /dev/null 130.29s user 10.17s system 983% cpu 14.288 total
> make > /dev/null 130.19s user 10.52s system 968% cpu 14.530 total
> make XPIPE=-pipe > /dev/null 129.90s user 10.47s system 978% cpu 14.343 total
> make > /dev/null 129.50s user 10.81s system 959% cpu 14.620 total
> make XPIPE=-pipe > /dev/null 130.37s user 10.60s system 975% cpu 14.446 total
> make > /dev/null 129.63s user 10.18s system 972% cpu 14.374 total
> make XPIPE=-pipe > /dev/null 131.29s user 9.92s system 1016% cpu 13.899 total
> make > /dev/null 129.96s user 10.39s system 961% cpu 14.596 total
> make XPIPE=-pipe > /dev/null 131.63s user 10.16s system 1011% cpu 14.015 total
> make > /dev/null 129.33s user 10.54s system 970% cpu 14.405 total
> make XPIPE=-pipe > /dev/null 129.70s user 10.40s system 976% cpu 14.349 total
> make > /dev/null 129.53s user 10.25s system 964% cpu 14.494 total
> make XPIPE=-pipe > /dev/null 130.38s user 10.62s system 973% cpu 14.479 total
> make > /dev/null 130.73s user 10.08s system 957% cpu 14.704 total
> make XPIPE=-pipe > /dev/null 130.43s user 10.62s system 985% cpu 14.309 total
>
> There is a fair bit of variance, probably due to cpufreq, schedutil, CPU
> temperature, CPU scheduler, motherboard power delivery, etc. Still, I
> think it can be seen that -pipe is, on average, about 0.2 to 0.25 seconds
> faster.
>
> I also tried "make defconfig":
>
> make > /dev/null 1238.26s user 102.39s system 1095% cpu 2:02.33 total
> make XPIPE=-pipe > /dev/null 1231.33s user 102.52s system 1081% cpu 2:03.29 total
> make > /dev/null 1232.92s user 102.07s system 1096% cpu 2:01.71 total
> make XPIPE=-pipe > /dev/null 1239.59s user 102.30s system 1096% cpu 2:02.39 total
> make > /dev/null 1229.81s user 101.72s system 1093% cpu 2:01.74 total
> make XPIPE=-pipe > /dev/null 1234.64s user 101.30s system 1098% cpu 2:01.64 total
> make > /dev/null 1228.50s user 104.39s system 1093% cpu 2:01.91 total
> make XPIPE=-pipe > /dev/null 1238.78s user 102.57s system 1099% cpu 2:01.99 total
>
> I stopped after this because I needed to use the machine for other
> tasks. The results are less clear, but I think there's not a big
> difference one way or another, at least on my machine.
>
> CPU: Ryzen 1600, overclocked to ~3.8 GHz
> RAM: Corsair Vengeance, overclocked to ~3300 MHz, forgot timings
> Motherboard: ASRock B450 Pro4
>
> I would speculate that the recent pipe changes have caused a change in
> the relative speed compared to 2018. I am using 5.6.0-rc2 with -O3
> -march=native patches.
>
> Regards,
> Alex.
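[Editorial note: as a quick sanity check on the allnoconfig wall times quoted above, taking the ten alternating "make" / "make XPIPE=-pipe" pairs from the listing:]

```python
from statistics import mean

# Wall-clock totals in seconds, transcribed from the allnoconfig runs above.
plain = [14.537, 14.493, 14.379, 14.530, 14.620,
         14.374, 14.596, 14.405, 14.494, 14.704]   # make
piped = [14.296, 14.252, 14.288, 14.343, 14.446,
         13.899, 14.015, 14.349, 14.479, 14.309]   # make XPIPE=-pipe

print(f"plain: {mean(plain):.4f} s  -pipe: {mean(piped):.4f} s  "
      f"delta: {mean(plain) - mean(piped):.4f} s")
```

so on these runs -pipe averages roughly 0.25 s faster, though well within the run-to-run noise Alex describes.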
I used hyperfine [1] to run a quick benchmark with a freshly built
GCC 9.2.0 for x86 and aarch64; here are the results:
$ hyperfine -w 1 -r 25 \
-p 'rm -rf out.x86_64' \
'make -j$(nproc) -s CROSS_COMPILE=x86_64-linux- O=out.x86_64 defconfig all' \
'make -j$(nproc) -s CROSS_COMPILE=x86_64-linux- KCFLAGS=-pipe O=out.x86_64 defconfig all'
Benchmark #1: make -j$(nproc) -s CROSS_COMPILE=x86_64-linux- O=out.x86_64 defconfig all
Time (mean ± σ): 68.535 s ± 0.275 s [User: 2241.681 s, System: 185.454 s]
Range (min … max): 67.855 s … 68.953 s 25 runs
Benchmark #2: make -j$(nproc) -s CROSS_COMPILE=x86_64-linux- KCFLAGS=-pipe O=out.x86_64 defconfig all
Time (mean ± σ): 68.922 s ± 0.095 s [User: 2264.168 s, System: 190.297 s]
Range (min … max): 68.781 s … 69.126 s 25 runs
Summary
'make -j$(nproc) -s CROSS_COMPILE=x86_64-linux- O=out.x86_64 defconfig all' ran
1.01 ± 0.00 times faster than 'make -j$(nproc) -s CROSS_COMPILE=x86_64-linux- KCFLAGS=-pipe O=out.x86_64 defconfig all'
$ hyperfine -w 1 -r 25 \
-p 'rm -rf out.aarch64' \
'make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=aarch64-linux- O=out.aarch64 defconfig all' \
'make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=aarch64-linux- KCFLAGS=-pipe O=out.aarch64 defconfig all'
Benchmark #1: make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=aarch64-linux- O=out.aarch64 defconfig all
Time (mean ± σ): 166.732 s ± 0.594 s [User: 5654.780 s, System: 475.493 s]
Range (min … max): 165.873 s … 167.859 s 25 runs
Benchmark #2: make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=aarch64-linux- KCFLAGS=-pipe O=out.aarch64 defconfig all
Time (mean ± σ): 168.047 s ± 0.428 s [User: 5734.031 s, System: 488.392 s]
Range (min … max): 167.328 s … 168.959 s 25 runs
Summary
'make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=aarch64-linux- O=out.aarch64 defconfig all' ran
1.01 ± 0.00 times faster than 'make -j$(nproc) -s ARCH=arm64 CROSS_COMPILE=aarch64-linux- KCFLAGS=-pipe O=out.aarch64 defconfig all'
In both cases, performance seems to regress slightly (only by about 1%,
but still). It could just be something about my machine, although this
benchmark was done on a different machine than the one from my commit
back in 2018.
I am not sure I would write off these results, since I did the benchmark
25 times on each one back to back, eliminating most of the variance that
you described.
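[Editorial note: the "1.01 ± 0.00" summaries hyperfine prints follow directly from the mean wall times quoted above:]

```python
# Mean wall-clock times in seconds from the two hyperfine runs above:
# (baseline, with KCFLAGS=-pipe) per architecture.
runs = {"x86_64": (68.535, 68.922), "arm64": (166.732, 168.047)}

for arch, (base, pipe) in runs.items():
    print(f"{arch}: -pipe took {pipe / base:.4f}x as long "
          f"(about {(pipe / base - 1) * 100:.2f}% slower)")
```

i.e. a small but consistent regression of well under 1% on both architectures, which hyperfine rounds up to "1.01 times faster" for the baseline.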
[1]: https://github.com/sharkdp/hyperfine
Cheers,
Nathan