Message-ID: <CA+icZUXELAsCb2ya0CcC3CE5YQ_E4+Tb7K9OdTPbKSZd9JTSMw@mail.gmail.com>
Date:   Sat, 18 Jun 2022 08:13:07 +0200
From:   Sedat Dilek <sedat.dilek@...il.com>
To:     Fangrui Song <maskray@...gle.com>
Cc:     Masahiro Yamada <masahiroy@...nel.org>,
        Jiri Slaby <jslaby@...e.cz>,
        Linux Kbuild mailing list <linux-kbuild@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Michal Marek <michal.lkml@...kovi.net>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        Nathan Chancellor <nathan@...nel.org>,
        Sami Tolvanen <samitolvanen@...gle.com>,
        clang-built-linux <llvm@...ts.linux.dev>
Subject: Re: [PATCH] kbuild: pass jobserver to cmd_ld_vmlinux.o

On Fri, Jun 17, 2022 at 10:05 PM Fangrui Song <maskray@...gle.com> wrote:
>
> On 2022-06-18, Masahiro Yamada wrote:
> >(+LLVM list, Fangrui Song)
>
> Thanks for tagging me. I'll clarify some stuff.
>
> >On Fri, Jun 17, 2022 at 7:41 PM Sedat Dilek <sedat.dilek@...il.com> wrote:
> >>
> >> On Fri, Jun 17, 2022 at 12:35 PM Sedat Dilek <sedat.dilek@...il.com> wrote:
> >> >
> >> > On Fri, Jun 17, 2022 at 12:53 AM Sedat Dilek <sedat.dilek@...il.com> wrote:
> >> > >
> >> > > On Thu, Jun 16, 2022 at 4:09 PM Sedat Dilek <sedat.dilek@...il.com> wrote:
> >> > > >
> >> > > > On Thu, Jun 16, 2022 at 12:45 PM Jiri Slaby <jslaby@...e.cz> wrote:
> >> > > > >
> >> > > > > Until the link-vmlinux.sh split (cf. the commit below), the linker was
> >> > > > > run with jobserver set in MAKEFLAGS. After the split, the command in
> >> > > > > Makefile.vmlinux_o is not prefixed by "+" anymore, so this information
> >> > > > > is lost.
> >> > > > >
> >> > > > > Restore it as linkers working in parallel (esp. the LTO ones) make a use
> >> > > > > of i
> >
> >Hi Jiri,
> >
> >Please let me clarify first.
> >
> >Here, is it OK to assume you are talking about Clang LTO
> >instead of GCC LTO because the latter is not upstreamed ?
> >
> >
> >
> >
> >
> >I tested this patch but I did not see any performance change for Clang LTO.
> >
> >
> >[1] CONFIG_CLANG_LTO_FULL
> >
> >   lld always runs sequential.
> >   It never runs in parallel even if you pass -j option to Make
>
> "lld always runs sequential" is not accurate. There are a number of
> parallel linker passes.  ld.lld --threads= defaults to
> llvm::hardware_concurrency (similar to
> https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency,
> but uses sched_getaffinity to compute the number of available cores).
>
> "lld always runs sequential" is correct only when --threads=1 is
> specified or the system only provides one thread to the lld process.
>
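
To illustrate the difference between "CPUs in the system" and "CPUs
available to the process" mentioned above, here is a minimal Python
sketch. It is not lld's code; available_threads() is a hypothetical
helper name, and it only assumes that the affinity mask is what limits
the process (as sched_getaffinity reports on Linux).

```python
import os


def available_threads():
    """Count the CPUs this process may run on (hypothetical helper)."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: honours affinity restrictions, e.g. those set by taskset.
        return len(os.sched_getaffinity(0))
    # Elsewhere, fall back to the number of CPUs in the system.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("CPUs in the system:     ", os.cpu_count())
    print("CPUs usable by process: ", available_threads())
```

Running this under something like "taskset -c 0" should report a single
usable CPU, which is the situation where a linker that sizes its thread
pool this way would effectively run with one thread.
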
> I think people may be more interested in LTO parallelism here.  Regular
> LTO (sometimes called full LTO when there is mixed-thin-and-regular LTO)
> supports limited parallelism which applies to code generation, but not
> IR-level optimization.  (IR-level optimization has many interprocedural
> optimization passes.  Splitting will make LTO less effective. Code
> generation is per function, so parallelism does not regress
> optimization.)
>
> >
> >[2] CONFIG_CLANG_LTO_THIN
> >
> >   lld always runs in parallel even if you do not pass -j option
> >
> >   In my machine, lld always allocated 12 threads.
> >   This is irrespective of the Make parallelisms.
> >
> >
> >
> >
> >One more thing, if a program wants to participate in
> >Make's jobserver, it must parse MAKEFLAGS, and extract
> >file descriptors to be used to communicate to the jobserver.
> >
> >As a code example in the kernel tree,
> >scripts/jobserver-exec parses "MAKEFLAGS" and "--jobserver".
> >
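
For reference, the MAKEFLAGS parsing step described above could look
roughly like the Python sketch below. This is an illustration, not a
copy of scripts/jobserver-exec: parse_jobserver_auth() is a hypothetical
name, taking the last matching word is a choice made for this sketch,
and the "fifo:" form is assumed to be the one used by newer GNU make
releases.

```python
import re


def parse_jobserver_auth(makeflags):
    """Return ("fds", r, w), ("fifo", path) or None for a MAKEFLAGS string."""
    auth = None
    for word in makeflags.split():
        # Older GNU make used --jobserver-fds=, newer uses --jobserver-auth=.
        m = re.fullmatch(r"--jobserver-(?:auth|fds)=(.+)", word)
        if m:
            auth = m.group(1)  # keep the last occurrence seen
    if auth is None:
        return None  # no jobserver advertised
    if auth.startswith("fifo:"):
        return ("fifo", auth[len("fifo:"):])
    m = re.fullmatch(r"(-?\d+),(-?\d+)", auth)
    if not m:
        return None  # unknown format, treat as "no jobserver"
    r, w = int(m.group(1)), int(m.group(2))
    if r < 0 or w < 0:
        return None  # negative descriptors: jobserver not usable
    return ("fds", r, w)


if __name__ == "__main__":
    import os
    print(parse_jobserver_auth(os.environ.get("MAKEFLAGS", "")))
```

A participating tool would then read one byte from the read side for
each extra job it starts and write the byte back when that job
finishes. A tool that never looks at MAKEFLAGS cannot take part, no
matter how it is invoked.
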
> >
> >I grepped the lld source code, but it does not contain
> >"MAKEFLAGS" or "jobserver".
>
> >masahiro@...ar:~/ref/lld$ git remote  show origin
> >* remote origin
> >  Fetch URL: https://github.com/llvm-mirror/lld.git
> >  Push  URL: https://github.com/llvm-mirror/lld.git
> >  HEAD branch: master
> >  Remote branches:
> >    master     tracked
> >    release_36 tracked
> >    release_37 tracked
> >    release_38 tracked
> >    release_39 tracked
> >    release_40 tracked
> >    release_50 tracked
> >    release_60 tracked
> >    release_70 tracked
> >    release_80 tracked
> >    release_90 tracked
> >  Local branch configured for 'git pull':
> >    master merges with remote master
> >  Local ref configured for 'git push':
> >    master pushes to master (up to date)
> >masahiro@...ar:~/ref/lld$ git grep MAKEFLAGS
> >masahiro@...ar:~/ref/lld$ git grep jobserver
> >
> >
> >So, in my research, LLD does not seem to support the jobserver.
>
>
> Correct. lld does not support GNU make's jobserver.  On the other hand,
> I don't think the jobserver implementation supports flexible "give this
> target N hardware concurrency". A heavy link target does not necessarily
> get more resources than a quick target.
>
> If a make target knows how much hardware concurrency it gets, we can
> pass --threads= to lld. LTO easily takes 95+% of link time, so LTO
> parallelism may need a dedicated setting. lld has --thinlto-jobs=.
>

Hey Fangrui,

I played a bit with --thinlto-jobs=4 yesterday.

$ cat 0001-vmlinux-clang-thinlto-Add-thinlto-jobs-4-to-KBUILD_L.patch
From f548c34abd49e01407de26c81f29ef89b3cae213 Mon Sep 17 00:00:00 2001
From: Sedat Dilek <sedat.dilek@...il.com>
Date: Fri, 17 Jun 2022 13:24:50 +0200
Subject: [PATCH] vmlinux: clang: thinlto: Add --thinlto-jobs=4 to
KBUILD_LDFLAGS

---
scripts/Makefile.vmlinux_o | 2 +-
scripts/link-vmlinux.sh    | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/Makefile.vmlinux_o b/scripts/Makefile.vmlinux_o
index 3c97a1564947..4c1991c91417 100644
--- a/scripts/Makefile.vmlinux_o
+++ b/scripts/Makefile.vmlinux_o
@@ -53,7 +53,7 @@ objtool_args := \

quiet_cmd_ld_vmlinux.o = LD      $@
      cmd_ld_vmlinux.o = \
-       $(LD) ${KBUILD_LDFLAGS} -r -o $@ \
+       $(LD) ${KBUILD_LDFLAGS} --thinlto-jobs=4 -r -o $@ \
       $(addprefix -T , $(initcalls-lds)) \
       --whole-archive $(KBUILD_VMLINUX_OBJS) --no-whole-archive \
       --start-group $(KBUILD_VMLINUX_LIBS) --end-group \
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index eecc1863e556..1624da57807b 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -83,7 +83,7 @@ vmlinux_link()
       else
               wl=
               ld="${LD}"
-               ldflags="${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux}"
+               ldflags="${KBUILD_LDFLAGS} --thinlto-jobs=4 ${LDFLAGS_vmlinux}"
               ldlibs=
       fi

--
2.36.1

Hmm, no significant performance gain; nothing measurable here.

I am unsure whether adding --thinlto-jobs=4 to KBUILD_LDFLAGS in the
top-level Makefile is too invasive.
( UNTESTED... )

$ cat 0001-clang-thinlto-Add-thinlto-jobs-4-to-KBUILD_LDFLAGS.patch
From 0970e92867d11d12214ca198578364a17ef17bea Mon Sep 17 00:00:00 2001
From: Sedat Dilek <sedat.dilek@...il.com>
Date: Fri, 17 Jun 2022 13:34:49 +0200
Subject: [PATCH] clang: thinlto: Add --thinlto-jobs=4 to KBUILD_LDFLAGS

---
Makefile | 1 +
1 file changed, 1 insertion(+)

diff --git a/Makefile b/Makefile
index 1a6678d817bd..a7a7f12e2349 100644
--- a/Makefile
+++ b/Makefile
@@ -896,6 +896,7 @@ ifdef CONFIG_LTO_CLANG
ifdef CONFIG_LTO_CLANG_THIN
CC_FLAGS_LTO   := -flto=thin -fsplit-lto-unit
KBUILD_LDFLAGS += --thinlto-cache-dir=$(extmod_prefix).thinlto-cache
+KBUILD_LDFLAGS += --thinlto-jobs=4
else
CC_FLAGS_LTO   := -flto
endif
--
2.36.1

When building my ThinLTO + PGO (x86_64 kernel defconfig) optimized LLVM
toolchain, I use a very conservative setting for the number of link
jobs.

$ cd /path/to/tc-build.git

$ python3 ./build-llvm.py --no-update --build-type Release -p
"clang;lld" -t "X86;BPF" --clang-vendor dileks -B
/home/dileks/src/llvm-toolchain/build -I /opt/llvm-toolchain
--check-targets clang lld --lto thin --pgo kernel-defconfig -L
/home/dileks/src/linux-kernel/git -D LLVM_PARALLEL_LINK_JOBS=1
--show-build-commands

See: -D LLVM_PARALLEL_LINK_JOBS=1

So, I guess the above patch might be counterproductive?
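
Rather than hard-coding 4, the value could in principle follow the -j
level given to make. The Python sketch below only illustrates that
idea and is not part of either patch: thinlto_jobs_flag() is a
hypothetical helper, and it assumes MAKEFLAGS carries a "-jN" word (as
newer GNU make versions export it) and falls back to 1 otherwise.

```python
import re


def thinlto_jobs_flag(makeflags, fallback=1):
    """Build a --thinlto-jobs= option from the -jN word in MAKEFLAGS."""
    jobs = fallback
    for word in makeflags.split():
        m = re.fullmatch(r"-j(\d+)", word)
        if m:
            jobs = int(m.group(1))  # a later -jN overrides an earlier one
    # Never emit a value below 1.
    return "--thinlto-jobs={}".format(max(jobs, 1))


if __name__ == "__main__":
    import os
    print(thinlto_jobs_flag(os.environ.get("MAKEFLAGS", "")))
```

This still does not coordinate with the jobserver: with make -j4 and
four ThinLTO jobs, the link may compete with whatever else make is
running at that moment.
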

My kernel make-line looks like this:

/usr/bin/perf stat make V=1 -j4 LLVM=1 LLVM_IAS=1
PAHOLE=/opt/pahole/bin/pahole LOCALVERSION=-4-amd64-clang14-lto
KBUILD_BUILD_HOST=iniza KBUILD_BUILD_USER=sedat.dilek@...il.com
KBUILD_BUILD_TIMESTAMP=2022-06-17 bindeb-pkg
KDEB_PKGVERSION=5.19.0~rc2-4~bookworm+dileks1

I am attaching the 2 patches, in case Gmail mangles the formatting,
along with my latest kernel config.

Thanks.

Regards,
-Sedat-

> >
> >
> >
> >If you are talking about GCC LTO, yes, the code
> >tries to parse "--jobserver-auth=" from the MAKEFLAGS
> >environment variable.  [1]
> >
> >[1]:  https://github.com/gcc-mirror/gcc/blob/releases/gcc-12.1.0/gcc/lto-wrapper.cc#L1341
> >
> >
> >But, as you may know, GCC LTO works in a different way,
> >at least, we cannot do it before modpost.
> >
> >
> >--
> >Best Regards
> >Masahiro Yamada
> >

View attachment "0001-vmlinux-clang-thinlto-Add-thinlto-jobs-4-to-KBUILD_L.patch" of type "text/x-patch" (1253 bytes)

View attachment "0001-clang-thinlto-Add-thinlto-jobs-4-to-KBUILD_LDFLAGS.patch" of type "text/x-patch" (651 bytes)

Download attachment "config-5.19.0-rc2-4-amd64-clang14-lto" of type "application/octet-stream" (255415 bytes)
