lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKwvOdn3uXniVedgtpD8QFAd-hdVuVjGPa4-n0h64PTxT4XhWg@mail.gmail.com>
Date:   Fri, 30 Apr 2021 17:19:57 -0700
From:   Nick Desaulniers <ndesaulniers@...gle.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Masahiro Yamada <masahiroy@...nel.org>
Cc:     Nathan Chancellor <nathan@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        clang-built-linux <clang-built-linux@...glegroups.com>
Subject: Re: Very slow clang kernel config ..

On Thu, Apr 29, 2021 at 7:22 PM Nick Desaulniers
<ndesaulniers@...gle.com> wrote:
>
> On Thu, Apr 29, 2021 at 5:19 PM Nick Desaulniers
> <ndesaulniers@...gle.com> wrote:
> >
> > On Thu, Apr 29, 2021 at 2:53 PM Linus Torvalds
> > <torvalds@...ux-foundation.org> wrote:
> > >
> > > I haven't looked into why this is so slow with clang, but it really is
> > > painfully slow:
> > >
> > >    time make CC=clang allmodconfig
> > >    real 0m2.667s
> > >
> > > vs the gcc case:
> > >
> > >     time make CC=gcc allmodconfig
> > >     real 0m0.903s
> >
> > Can
> > you provide info about your clang build such as the version string,
> > and whether this was built locally perhaps?
>
> d'oh it was below.
>
> > > This is on my F34 machine:
> > >
> > >      clang version 12.0.0 (Fedora 12.0.0-0.3.rc1.fc34)

A quick:
$ perf record -e cycles:pp --call-graph lbr make LLVM=1 LLVM_IAS=1
-j72 allmodconfig
$ perf report --no-children --sort=dso,symbol
shows:
     2.35%  [unknown]                [k] 0xffffffffabc00fc7
+    2.29%  libc-2.31.so             [.] _int_malloc
     1.24%  libc-2.31.so             [.] _int_free
+    1.23%  ld-2.31.so               [.] do_lookup_x
+    1.14%  libc-2.31.so             [.] __strlen_avx2
+    1.06%  libc-2.31.so             [.] malloc
+    1.03%  clang-13                 [.] llvm::StringMapImpl::LookupBucketFor
     1.01%  libc-2.31.so             [.] __memmove_avx_unaligned_erms
+    0.76%  conf                     [.] yylex
+    0.68%  clang-13                 [.] llvm::Instruction::getNumSuccessors
+    0.63%  libbfd-2.35.2-system.so  [.] bfd_hash_lookup
+    0.63%  clang-13                 [.] llvm::PMDataManager::findAnalysisPass
+    0.63%  ld-2.31.so               [.] _dl_lookup_symbol_x
     0.62%  libc-2.31.so             [.] __memcmp_avx2_movbe
     0.60%  libc-2.31.so             [.] __strcmp_avx2
+    0.56%  clang-13                 [.] llvm::ValueHandleBase::AddToUseList
+    0.56%  clang-13                 [.]
llvm::operator==<llvm::DenseMap<llvm::BasicBlock const*, unsigned int,
llvm::DenseMapInfo<llvm::BasicBlock const*>, llvm::detail::Dense
     0.53%  clang-13                 [.]
llvm::SmallPtrSetImplBase::insert_imp_big

(yes, I know about kptr_restrict)(sorry if there's a better way to
share such perf data; don't you need to share perf.data and the same
binary, IIRC?)

The string map lookups look expected; the compiler flags are one very
large string map; though we've identified previously perhaps hashing
could be sped up.

llvm::Instruction::getNumSuccessors looks unexpectedly like codegen,
but this was a trace of `allmodconfig`; I wouldn't be surprised if
this is LLVM=1 setting HOSTCC=clang; might be good to try to isolate
those out.

Some other questions that came to mind thinking about this overnight:
- is Kbuild/make doing more work than is necessary when building with
clang (beyond perhaps a few more cc-option checks)? I don't think perf
is the right tool for profiling GNU make. V=1 to make hides a lot of
the work macros like cc-option are doing.
- is clang doing more work than necessary for just checking support of
command line flags? Probably. I'm not sure that has been optimized
before, but if we pursue that but the slowdown was more so the
previous point, that would potentially be a waste of time.
-- 
Thanks,
~Nick Desaulniers

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ