Message-ID: <2o2rpn97-79nq-p7s2-nq5-8p83391473r@syhkavp.arg>
Date: Tue, 9 Mar 2021 14:54:33 -0500 (EST)
From: Nicolas Pitre <nico@...xnic.net>
To: Masahiro Yamada <masahiroy@...nel.org>
cc: Linux Kbuild mailing list <linux-kbuild@...r.kernel.org>,
Christoph Hellwig <hch@....de>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Jessica Yu <jeyu@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-arch <linux-arch@...r.kernel.org>
Subject: Re: [PATCH v2 3/4] kbuild: re-implement CONFIG_TRIM_UNUSED_KSYMS to
make it work in one-pass
On Wed, 10 Mar 2021, Masahiro Yamada wrote:
> On Wed, Mar 10, 2021 at 2:36 AM Nicolas Pitre <nico@...xnic.net> wrote:
> >
> > On Wed, 10 Mar 2021, Masahiro Yamada wrote:
> >
> > > Commit a555bdd0c58c ("Kbuild: enable TRIM_UNUSED_KSYMS again, with some
> > > guarding") re-enabled this feature, but Linus is still unhappy about
> > > the build time.
> > >
> > > The reason for the slowness is the recursion - this basically works in
> > > two loops.
> > >
> > > In the first loop, Kbuild builds the entire tree based on the temporary
> > > autoksyms.h, which contains macro defines to control whether their
> > > corresponding EXPORT_SYMBOL() is enabled or not, and also gathers all
> > > symbols required by modules. After the tree traversal, Kbuild updates
> > > autoksyms.h and triggers the second loop to rebuild source files whose
> > > EXPORT_SYMBOL() needs flipping.
> > >
> > > This commit re-implements CONFIG_TRIM_UNUSED_KSYMS to make it work in
> > > one pass. In the new design, unneeded EXPORT_SYMBOL() instances are
> > > trimmed by the linker instead of the preprocessor.
> > >
> > > After the tree traversal, a linker script snippet <generated/keep-ksyms.h>
> > > is generated. It feeds the list of necessary sections to vmlinux.lds.S
> > > and modules.lds.S. The other sections fall into /DISCARD/.
> > >
> > > Signed-off-by: Masahiro Yamada <masahiroy@...nel.org>
> >
> > I'm not sure I do understand every detail here, especially since it is
> > so far away from the version that I originally contributed. But the
> > concept looks good.
> >
> > I still think that there is no way around a recursive approach to get
> > the maximum effect with LTO, but given that true LTO still isn't applied
> > to mainline after all those years, the recursive approach brings
> > nothing. Maybe that could be revisited if true LTO ever makes it into
> > mainline, and the desire to reduce the binary size is still relevant
> > enough to justify it.
>
> Hmm, I am confused.
>
> Does this patch change the behavior in the
> combination with the "true LTO"?
>
> Please let me borrow this sentence from your article:
> "But what LTO does is more like getting rid of branches that simply
> float in the air without being connected to anything or which have
> become loose due to optimization."
> (https://lwn.net/Articles/746780/)
>
> This patch throws unneeded EXPORT_SYMBOL metadata
> into the /DISCARD/ section of the linker script.
>
> The approach is different (preprocessor vs linker), but
> we will still get the same result; the unneeded
> EXPORT_SYMBOLs are disconnected from the main trunk.
>
> Then, the true LTO will remove branches floating in the air,
> right?
>
> So, what will be lost by this patch?

Let's say you have this in module_foo:

	int foo(int x)
	{
		return 2 + bar(x);
	}
	EXPORT_SYMBOL(foo);

And module_bar:

	int bar(int y)
	{
		return 3 * baz(y);
	}
	EXPORT_SYMBOL(bar);

And this in the main kernel image:

	int baz(int z)
	{
		return plonk(z);
	}
	EXPORT_SYMBOL(baz);

Now we build the kernel and modules. Then we realize that nothing
references symbol "foo", so we can trim the "foo" export. But module_foo
would then have to be recompiled with LTO (especially if there is some
other code in that module) to realize that nothing references foo() any
longer and to optimize away the reference to bar().

With another round, we now realize that the "bar" export is no longer
necessary. But optimizing away the reference to baz() requires yet
another compilation round, and then a final compilation round with LTO
is needed to possibly optimize plonk() out of the kernel.

I don't see how you can propagate this whole chain reaction with only
one pass.

Nicolas