Message-ID: <aYjCiIfTZdy3q16P@1wt.eu>
Date: Sun, 8 Feb 2026 18:06:16 +0100
From: Willy Tarreau <w@....eu>
To: David Laight <david.laight.linux@...il.com>
Cc: Thomas Weißschuh <linux@...ssschuh.net>,
linux-kernel@...r.kernel.org, Cheng Li <lechain@...il.com>
Subject: Re: [PATCH v2 next 05/11] tools/nolibc/printf: Simplify __nolibc_printf()
On Sun, Feb 08, 2026 at 04:54:25PM +0000, David Laight wrote:
> > However here I finally found what inflates the code, when disassembling
> > the whole function: with the move of the multiple "if" statements,
> > recent compilers managed to turn it into a jump table, that considerably
> > inflates .rodata and the function as well. By passing -fno-jump-tables,
> > the size drops by ~500 bytes:
>
> That is just insane...
> That might go away with the patch that changes it all to bit-masks.
Yes, as mentioned later, it does.
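To illustrate why the bit-mask form leaves nothing for the compiler to turn into a jump table, here is a minimal sketch. It is not the nolibc code: CH_BIT(), INT_CONV_MASK and is_int_conv() are hypothetical names, and the set of letters is chosen only for the example.

```c
/* Hypothetical helper: one bit per lowercase letter 'a'..'z'. */
#define CH_BIT(c) (1u << ((c) - 'a'))

/* Hypothetical set: conversion letters treated as integers in this sketch. */
#define INT_CONV_MASK \
	(CH_BIT('d') | CH_BIT('i') | CH_BIT('u') | CH_BIT('x') | CH_BIT('p'))

/*
 * Return 1 if c is one of d, i, u, x, p and 0 otherwise.
 * A single range check plus a shift-and-mask replaces a chain of
 * comparisons (or a switch), so there are no per-case branch targets.
 */
static int is_int_conv(char c)
{
	/* Characters below 'a' wrap to a large unsigned value and are rejected. */
	unsigned int idx = (unsigned int)((unsigned char)c - 'a');

	if (idx >= 26)
		return 0;
	return (INT_CONV_MASK >> idx) & 1;
}
```

An equivalent switch over the same letters can be built with and without -fno-jump-tables and compared using size(1) to see whether a given compiler version emits a table.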
> I'd done some full disassembly comparisons myself to see why changes
> made the code larger.
> I had an OPTIMIZER_HIDE_VAR(sign) in there to help, but the final
> version didn't need it.
> What this sort of code needs is something to force the compiler to
> only have one copy of something - I found a proposal for an attribute
> (or similar) for an asm block to do that, but nothing came of it.
Yeah I'm using similar hacks against the optimizer sometimes. That's
no big deal as there will always be variations between compilers; what
matters to me is that we can explain them (and indeed often when we
can we're also able to prevent the compiler from acting against us).
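As an example of this kind of hack, here is a minimal sketch of hiding a variable from the optimizer. HIDE_VAR() is modeled on the kernel's OPTIMIZER_HIDE_VAR() and relies on GCC/Clang extended asm; pick_sign() is a hypothetical function, not taken from the patch.

```c
/*
 * Empty asm that claims to read and rewrite 'var' in a register.
 * The value is unchanged, but the compiler can no longer assume
 * anything about it after this point. GCC/Clang only.
 */
#define HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))

/* Hypothetical example: choose the sign character for a number. */
static unsigned int pick_sign(int negative, int plus_flag)
{
	unsigned int sign = 0;

	if (negative)
		sign = '-';
	else if (plus_flag)
		sign = '+';

	/*
	 * Without this, the compiler may keep one specialized copy of
	 * the following code per known value of 'sign'.
	 */
	HIDE_VAR(sign);
	return sign;
}
```

Whether this helps depends on the compiler and version, so the effect has to be confirmed by comparing the generated code.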
> >
> >    text    data     bss     dec     hex filename
> >    2422      48      24    2494     9be hello-patch4
> >    1917      48      24    1989     7c5 hello-patch4-alt <---
> >
> > Building with gcc before 13 also avoids this table and explains why
> > you had better code with gcc-12.
> >
> > I also noticed that we can reduce the loop by ~40 bytes by moving the
> > literal copy after the block that deals with format sequences,
> > because it eases comparisons, but that's no big deal for now since your
> > subsequent patches are going to change all that.
>
> Some of the early patches are carefully arranged to reduce churn
> later on.
Yes I noticed that. But the whole function is changed in the end so
we cannot avoid a number of complicated changes anyway.
> I might add the 'if (v == 0)' clause much earlier to avoid the churn
> caused by the extra indent when it is added.
>
> I'll add some extra comments as you suggested in the other patches.
Yes, that's what is the most needed (and I don't deny that there are
already quite a bunch). When optimizing code, often the code ends up
being write-only. You're doing something while having the data flow in
your head and it turns into code (like size>=256), but when you don't
know the initial assumptions and you face this, you think "WTF?". Here
the comments need to indicate the developer's design choices (e.g.
"sign can hold up to two chars starting from LSB") and some of the
assumptions that become complicated to establish due to the long list
of if/else dealing with the multiple variants of specifiers.
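A minimal sketch of what such a comment can look like next to the code it explains. The layout below is only an assumption made for the example (the patch's actual encoding may differ), and emit_sign() is a hypothetical helper.

```c
#include <stddef.h>

/*
 * Design choice (assumed for this sketch): 'sign' holds up to two
 * prefix characters, the first one in the least significant byte,
 * and 0 means "no prefix". Examples:
 *   '-'               -> "-"
 *   '0' | ('x' << 8)  -> "0x"
 *
 * Copy the prefix into 'out' and return the number of bytes written.
 */
static size_t emit_sign(char *out, unsigned int sign)
{
	size_t n = 0;

	/* At most two iterations as long as the invariant above holds. */
	while (sign) {
		out[n++] = (char)(sign & 0xff);
		sign >>= 8;
	}
	return n;
}
```

With the invariant written down, a reader no longer has to rediscover it from the shifts and masks.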
> I do know all about optimising for size, and for the 'worst case path'.
> The latter was some embedded hdlc code that had to finish in 196 clocks.
Rest assured that it's quite visible, we're using the same tricks to save
every possible resource (making bitmaps from words etc), it's just that
doing this requires an amazing amount of comments. I'm used to saying
that each source or object byte saved offers more budget for comments :-)
Willy