linux-kernel - Re: [PATCH 01/13] objtool: Rewrite hashtable sizing


Message-ID: <YMJWmzXgSipOqXAf@DESKTOP-1V8MEUQ.localdomain>
Date:   Thu, 10 Jun 2021 11:14:51 -0700
From:   Nathan Chancellor <nathan@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     x86@...nel.org, jpoimboe@...hat.com, jbaron@...mai.com,
        rostedt@...dmis.org, ardb@...nel.org, linux-kernel@...r.kernel.org,
        samitolvanen@...gle.com, ndesaulniers@...gle.com,
        clang-built-linux@...glegroups.com
Subject: Re: [PATCH 01/13] objtool: Rewrite hashtable sizing

Hi Peter,

On Thu, May 06, 2021 at 09:33:53PM +0200, Peter Zijlstra wrote:
> Currently objtool has 5 hashtables and sizes them 16 or 20 bits
> depending on the --vmlinux argument.
> 
> However, a single size doesn't really work well for the 5 tables,
> which, among them, cover 3 different uses. Also, while vmlinux is
> larger, there is still a very wide difference between a defconfig and
> allyesconfig build, which again isn't optimally covered by a single
> size.
> 
> Another aspect is the cost of elf_hash_init(), which for large tables
> dominates the runtime for small input files. It turns out that all it
> does is assign NULL, something that is required when using malloc().
> However, when we allocate memory using mmap(), we're guaranteed to get
> zero filled pages.
> 
> Therefore, rewrite the whole thing to:
> 
>  1) use more dynamic sized tables, depending on the input file,
>  2) avoid the need for elf_hash_init() entirely by using mmap().
> 
> This speeds up a regular kernel build (100s to 98s for
> x86_64-defconfig), and potentially dramatically speeds up vmlinux
> processing.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
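
For readers unfamiliar with the trick described in the quoted commit message, here is a minimal sketch of the idea. It is not the code from the patch. The names table_bits(), alloc_table() and free_table(), and the floor of 10 bits, are hypothetical and chosen only for illustration. The sketch relies on anonymous mmap() memory being zero-filled and on NULL being all-bits-zero, which holds on Linux targets.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: pick a table width (in bits) from the number of
 * entries expected in the input file, instead of a fixed 16 or 20 bits.
 * The floor of 10 bits is an assumption made for this sketch.
 */
unsigned int table_bits(size_t nr_entries)
{
	unsigned int bits = 10;

	while (bits < 31 && ((size_t)1 << bits) < nr_entries)
		bits++;
	return bits;
}

/*
 * Allocate 2^bits bucket heads. Anonymous private mappings are
 * zero-filled by the kernel, so every bucket already reads as NULL and
 * no explicit initialisation loop is needed.
 */
void **alloc_table(unsigned int bits)
{
	size_t len;
	void *p;

	assert(bits < 32);
	len = ((size_t)1 << bits) * sizeof(void *);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : (void **)p;
}

/* Release a table obtained from alloc_table() with the same bit width. */
void free_table(void **table, unsigned int bits)
{
	munmap(table, ((size_t)1 << bits) * sizeof(void *));
}
```

A caller would do something like alloc_table(table_bits(nr_symbols)), where nr_symbols stands in for whatever count the input file provides. Pages of the mapping that are never touched are not populated, which is why a large but sparsely used table stays cheap.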

This patch as commit 25cf0d8aa2a3 ("objtool: Rewrite hashtable sizing")
in -tip causes a massive compile time regression with allmodconfig +
ThinLTO.

At v5.13-rc1, the performance penalty is only about 23%, as measured with
hyperfine for two runs [1]:

Benchmark #1: allmodconfig
  Time (mean ± σ):     625.173 s ±  2.198 s    [User: 35120.895 s, System: 2176.868 s]
  Range (min … max):   623.619 s … 626.727 s    2 runs

Benchmark #2: allmodconfig with ThinLTO
  Time (mean ± σ):     771.034 s ±  0.369 s    [User: 39706.084 s, System: 2326.166 s]
  Range (min … max):   770.773 s … 771.295 s    2 runs

Summary
  'allmodconfig' ran
    1.23 ± 0.00 times faster than 'allmodconfig with ThinLTO'

However, at 25cf0d8aa2a3, it is almost 150% on a 64-core server.

Benchmark #1: allmodconfig
  Time (mean ± σ):     624.759 s ±  2.153 s    [User: 35114.379 s, System: 2145.456 s]
  Range (min … max):   623.237 s … 626.281 s    2 runs

Benchmark #2: allmodconfig with ThinLTO
  Time (mean ± σ):     1555.377 s ± 12.806 s    [User: 40558.463 s, System: 2310.139 s]
  Range (min … max):   1546.321 s … 1564.432 s    2 runs

Summary
  'allmodconfig' ran
    2.49 ± 0.02 times faster than 'allmodconfig with ThinLTO'
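
To make the percentages explicit, the penalty quoted here is the relative slowdown of the ThinLTO build over the plain allmodconfig build, computed from the hyperfine means above. A small sketch of that arithmetic follows; slowdown_percent() is a hypothetical helper name, not something from hyperfine or the kernel tree.

```c
#include <assert.h>

/*
 * Relative slowdown of a build taking slow_s seconds compared with a
 * baseline taking base_s seconds, expressed in percent.
 */
double slowdown_percent(double base_s, double slow_s)
{
	assert(base_s > 0.0);
	return (slow_s / base_s - 1.0) * 100.0;
}
```

With the means above, slowdown_percent(625.173, 771.034) gives roughly 23% and slowdown_percent(624.759, 1555.377) gives roughly 149%, which matches the 1.23x and 2.49x ratios in the hyperfine summaries.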

Adding Sami because I am not sure why this patch would have much of an impact
in relation to LTO. https://git.kernel.org/tip/25cf0d8aa2a3 is the patch in
question.

If I can provide any further information or help debug, please let me know.

If you are interested in reproducing this locally, you will need a
fairly recent LLVM stack (I used the stable release/12.x branch) and to
cherry-pick commit 976aac5f8829 ("kcsan: Fix debugfs initcall return
type") to fix an unrelated build failure. My script [2] can build a
self-contained toolchain fairly quickly if you cannot get one from your
package manager. A command like the one below will speed up the build a bit:

$ ./build-llvm.py \
    --branch "release/12.x" \
    --build-stage1-only \
    --install-stage1-only \
    --projects "clang;lld" \
    --targets X86

After adding the "install/bin" directory to PATH:

$ echo "CONFIG_GCOV_KERNEL=n
CONFIG_KASAN=n
CONFIG_LTO_CLANG_THIN=y" >allmod.config

$ make -skj"$(nproc)" LLVM=1 LLVM_IAS=1 allmodconfig all

[1]: https://github.com/sharkdp/hyperfine
[2]: https://github.com/ClangBuiltLinux/tc-build

Cheers,
Nathan