lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 9 Mar 2021 14:54:33 -0500 (EST)
From:   Nicolas Pitre <nico@...xnic.net>
To:     Masahiro Yamada <masahiroy@...nel.org>
cc:     Linux Kbuild mailing list <linux-kbuild@...r.kernel.org>,
        Christoph Hellwig <hch@....de>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Jessica Yu <jeyu@...nel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-arch <linux-arch@...r.kernel.org>
Subject: Re: [PATCH v2 3/4] kbuild: re-implement CONFIG_TRIM_UNUSED_KSYMS to
 make it work in one-pass

On Wed, 10 Mar 2021, Masahiro Yamada wrote:

> On Wed, Mar 10, 2021 at 2:36 AM Nicolas Pitre <nico@...xnic.net> wrote:
> >
> > On Wed, 10 Mar 2021, Masahiro Yamada wrote:
> >
> > > Commit a555bdd0c58c ("Kbuild: enable TRIM_UNUSED_KSYMS again, with some
> > > guarding") re-enabled this feature, but Linus is still unhappy about
> > > the build time.
> > >
> > > The reason of the slowness is the recursion - this basically works in
> > > two loops.
> > >
> > > In the first loop, Kbuild builds the entire tree based on the temporary
> > > autoksyms.h, which contains macro defines to control whether their
> > > corresponding EXPORT_SYMBOL() is enabled or not, and also gathers all
> > > symbols required by modules. After the tree traverse, Kbuild updates
> > > autoksyms.h and triggers the second loop to rebuild source files whose
> > > EXPORT_SYMBOL() needs flipping.
> > >
> > > This commit re-implements CONFIG_TRIM_UNUSED_KSYMS to make it work in
> > > one pass. In the new design, unneeded EXPORT_SYMBOL() instances are
> > > trimmed by the linker instead of the preprocessor.
> > >
> > > After the tree traverse, a linker script snippet <generated/keep-ksyms.h>
> > > is generated. It feeds the list of necessary sections to vmlinus.lds.S
> > > and modules.lds.S. The other sections fall into /DISCARD/.
> > >
> > > Signed-off-by: Masahiro Yamada <masahiroy@...nel.org>
> >
> > I'm not sure I do understand every detail here, especially since it is
> > so far away from the version that I originally contributed. But the
> > concept looks good.
> >
> > I still think that there is no way around a recursive approach to get
> > the maximum effect with LTO, but given that true LTO still isn't applied
> > to mainline after all those years, the recursive approach brings
> > nothing. Maybe that could be revisited if true LTO ever makes it into
> > mainline, and the desire to reduce the binary size is still relevant
> > enough to justify it.
> 
> Hmm, I am confused.
> 
> Does this patch change the behavior in the
> combination with the "true LTO"?
> 
> Please let me borrow this sentence from your article:
> "But what LTO does is more like getting rid of branches that simply
> float in the air without being connected to anything or which have
> become loose due to optimization."
> (https://lwn.net/Articles/746780/)
> 
> This patch throws unneeded EXPORT_SYMBOL metadata
> into the /DISCARD/ section of the linker script.
> 
> The approach is different (preprocessor vs linker), but
> we will still get the same result; the unneeded
> EXPORT_SYMBOLs are disconnected from the main trunk.
> 
> Then, the true LTO will remove branches floating in the air,
> right?
> 
> So, what will be lost by this patch?

Let's say you have this in module_foo:

int foo(int x)
{
	return 2 + bar(x);
}
EXPORT_SYMBOL(foo);

And module_bar:

int bar(int y)
{
	return 3 * baz(y);
}
EXPORT_SYMBOL(bar);

And this in the main kernel image:

int baz(int z)
{
	return plonk(z);
}
EXPORT_SYMBOLbaz);

Now we build the kernel and modules. Then we realize that nothing 
references symbol "foo". We can trim the "foo" export. But it would be 
necessary to recompile module_foo with LTO (especially if there is 
some other code in that module) to realize that nothing 
references foo() any longer and optimize away the reference to bar(). 
With another round, we now realize that the "bar" export is no longer 
necessary. But that will require another compile round to optimize away 
the reference to baz(). And then a final compilation round with 
LTO to possibly optimize plonk() out of the kernel.

I don't see how you can propagate all this chain reaction with only one 
pass.


Nicolas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ