Message-ID: <mhng-6c34765c-126d-4e6c-8904-e002d49a4336@palmer-ri-x1c9a>
Date: Thu, 22 Jun 2023 15:16:51 -0700 (PDT)
From: Palmer Dabbelt <palmer@...belt.com>
To: nathan@...nel.org
CC: bjorn@...nel.org, ndesaulniers@...gle.com,
Conor Dooley <conor@...nel.org>, jszhang@...nel.org,
llvm@...ts.linux.dev, Paul Walmsley <paul.walmsley@...ive.com>,
aou@...s.berkeley.edu, Arnd Bergmann <arnd@...db.de>,
linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-arch@...r.kernel.org
Subject: Re: [PATCH v2 0/4] riscv: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION
On Thu, 22 Jun 2023 14:53:27 PDT (-0700), nathan@...nel.org wrote:
> On Wed, Jun 21, 2023 at 11:19:31AM -0700, Palmer Dabbelt wrote:
>> On Wed, 21 Jun 2023 10:51:15 PDT (-0700), bjorn@...nel.org wrote:
>> > Conor Dooley <conor@...nel.org> writes:
>> >
>> > [...]
>> >
>> > > > So I'm no longer actually sure there's a hang, just something
>> > > > slow. That's even more of a grey area, but I think it's sane to
>> > > > call a 1-hour link time a regression -- unless it's expected
>> > > > that this is just very slow to link?
>> > >
>> > > I dunno, if it was only a thing for allyesconfig, then whatever - but
>> > > it's gonna significantly increase build times for any large kernels if LLD
>> > > is this much slower than LD. Regression in my book.
>> > >
>> > > I'm gonna go and experiment with mixed toolchain builds, I'll report
>> > > back..
>> >
>> > I took palmer/for-next (1bd2963b2175 ("Merge patch series "riscv: enable
>> > HAVE_LD_DEAD_CODE_DATA_ELIMINATION"")) for a tuxmake build with llvm-16:
>> >
>> > | ~/src/tuxmake/run -v --wrapper ccache --target-arch riscv \
>> > | --toolchain=llvm-16 --runtime docker --directory . -k \
>> > | allyesconfig
>> >
>> > Took forever, but passed after 2.5h.
>>
>> Thanks. I just re-ran mine with 17/trunk LLD under time (rather than
>> just checking top occasionally); it's at 1.5h, but even that seems quite long.
>>
>> I guess this is sort of up to the LLVM folks: if it's expected that DCE
>> takes a very long time to link then I'm not opposed to allowing it, but if
>> this is probably a bug in LLD then it seems best to turn it off until we
>> sort things out over there.
>>
>> I think maybe Nick or Nathan is the best bet to know?
>
> I can confirm a regression with allyesconfig but not allmodconfig using
> LLVM 16.0.6 on my 80-core Ampere Altra system.
>
> allmodconfig: 8m 4s
> allmodconfig + CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=n: 7m 4s
> allyesconfig: 1h 58m 30s
> allyesconfig + CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=n: 12m 41s
Are those backwards? I'm getting super slow builds after merging the
patch set, not before -- though I apologize in advance if I'm reading it
wrong; I'm well on my way to falling asleep already ;)
> I am sure there is something that ld.lld can do better, given GNU ld
> does not have any problems as earlier established, so that should
> definitely be explored further. I see Nick already had a response about
> writing up a report (I wrote most of this before that email so I am
> still sending this one).
>
> However, allyesconfig is pretty special and not really indicative of a
> "real world" kernel build in my opinion (which will either be a fully
> modular kernel to allow use on a wide range of hardware or a monolithic
> kernel with just the drivers needed for a specific platform, which will
> be much smaller than allyesconfig); it has given us problems with large
> kernels before on other architectures.
I totally agree that allyesconfig is an oddity, but it's something that
does get regularly build-tested, so a big build-time hit there is going
to cause trouble -- maybe not for users, but it'll be a problem for
maintainers, and that's way more likely to get me yelled at ;)
> CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is already marked with 'depends on
> EXPERT' and its help text mentions its perils, so it does not seem
> unreasonable to me to add an additional dependency on !COMPILE_TEST so
> that allmodconfig and allyesconfig cannot flip this on, something like
> the following perhaps?
>
> diff --git a/init/Kconfig b/init/Kconfig
> index 32c24950c4ce..25434cbd2a6e 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1388,7 +1388,7 @@ config HAVE_LD_DEAD_CODE_DATA_ELIMINATION
> config LD_DEAD_CODE_DATA_ELIMINATION
> bool "Dead code and data elimination (EXPERIMENTAL)"
> depends on HAVE_LD_DEAD_CODE_DATA_ELIMINATION
> - depends on EXPERT
> + depends on EXPERT && !COMPILE_TEST
> depends on $(cc-option,-ffunction-sections -fdata-sections)
> depends on $(ld-option,--gc-sections)
> help
>
> If applying that dependency to all architectures is too much, the
> selection in arch/riscv/Kconfig could be gated on the same condition.
Is the regression for all ports, or just RISC-V? I'm fine gating this
with some sort of Kconfig flag; if it's just impacting RISC-V, then it
seems sane to keep it over here.
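For reference, the arch-level version would look something like this --
an untested sketch, assuming the select still lives under config RISCV
as in the merged series:

```kconfig
# Sketch, not a tested patch: gate the arch-level select instead of the
# generic option. Since LD_DEAD_CODE_DATA_ELIMINATION depends on
# HAVE_LD_DEAD_CODE_DATA_ELIMINATION, withholding the select under
# COMPILE_TEST keeps allyesconfig/allmodconfig (which enable
# COMPILE_TEST) from ever turning DCE on for riscv.
config RISCV
	select HAVE_LD_DEAD_CODE_DATA_ELIMINATION if !COMPILE_TEST
```

That keeps the option unreachable for riscv compile-test builds without
touching the generic init/Kconfig entry or any other architecture.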
> Cheers,
> Nathan