linux-kernel - Re: [PATCH next 06/12] tools/nolibc/printf: Add support for left alignment and %[tzLq]d"

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20260204153950.76e95da5@pumpkin>
Date: Wed, 4 Feb 2026 15:39:50 +0000
From: David Laight <david.laight.linux@...il.com>
To: Willy Tarreau <w@....eu>
Cc: Thomas Weißschuh <linux@...ssschuh.net>,
 linux-kernel@...r.kernel.org, Cheng Li <lechain@...il.com>
Subject: Re: [PATCH next 06/12] tools/nolibc/printf: Add support for left
 alignment and %[tzLq]d"

On Wed, 4 Feb 2026 11:40:27 +0100
Willy Tarreau <w@....eu> wrote:

> On Wed, Feb 04, 2026 at 10:17:05AM +0000, David Laight wrote:
> > > This flag will be exposed to user code, you'll have to previx it with
> > > _NOLIBC_.  
> > 
> > The lines are long enough already, something shorter would be ideal.
> > The #undef at the bottom stops it being exposed.  
> 
> No, it's not just about not being exposed, it's about *conflicting*.
> We just do not reserve macro names not starting with anything but
> _NOLIBC. If I already use __PF_BASE in my application, it will cause
> trouble here.

I'd also #undef'ed the wrong name.

> 
> > > > +			/* Flag characters */
> > > > +			for (; c >= 0x20 && c <= 0x3f; c = *fmt++) {
> > > > +				if ((__PF_FLAG(c) & (__PF_FLAG('-'))) == 0)
> > > > +					break;
> > > > +				flags |= __PF_FLAG(c);
> > > > +			}    
> > > 
> > > Honestly I don't find that it improves readability here and makes one
> > > keep doubts in background as "what if c == 'm' which will match '-'?".
> > > I think that one would better be written as the usual:
> > > 
> > > 			for (; c >= 0x20 && c <= 0x3f; c = *fmt++) {
> > > 				if (c == '-')
> > > 					break;
> > > 				flags |= __PF_FLAG(c);
> > > 			}
> > > 
> > > Or even simpler since there's already a condition in the for() loop:
> > > 
> > > 			for (; c >= 0x20 && c != '-' && c <= 0x3f; c = *fmt++)
> > > 				flags |= __PF_FLAG(c);  
> > 
> > It is all written that way for when more flags get added - look at the later
> > patches which check for any of "#-+ 0".
> > At that point the line get long and unreadable - as below :-)  
> 
> I know, I've seen them and am already bothered by this. The
> purpose of that lib has always been to focus on:
>   1) size
>   2) maintainability
>   3) portability
> 
> Performance has never been a concern. I totally agree that testing
> bitfields is often much shorter than multiple "if", though here I'm
> seeing the code significantly inflate with loops etc (which might
> remain small), but maintainability is progressively reducing. This
> code receives few contributions from many participants, and it's
> important that it's easy to understand what's being done in order
> to easily add your missing feature. I'm feeling that we're starting
> to steer away from this principle here, which is why I'm raising an
> alarm.

I've done some changes - I'll need to back-merge them into the patches.
That code now looks like:

			/* Conversion flag characters are all non-alphabetic.
			 * Create a bit-map of the valid ones for later.
			 */
			for (; ch >= 0x20 && ch <= 0x3f; ch = *fmt++) {
				if (!_NOLIBC_PF_CHAR_IS_ONE_OF(ch, ' ', '#', '+', '-', '0'))
					break;
				flags |= _NOLIBC_PF_FLAG(ch);
			}

Which is definitely more readable (until you look inside the #define).
I could even hide the range check inside the macro by using the
high bits of the first character.
That would then read:
	while (_NOLIBC_PF_CHAR_IS_ONE_OF(ch, ' ', '#', '+', '-', '0')) {
		flags |= _NOLIBC_PF_FLAG(ch);
		ch = *fmt++;
	}
or maybe:
	for (fl = 1; fl; ch = &fmt++) {
		fl = _NOLIBC_PF_CHAR_IS_ONE_OF(ch, ' ', '#', '+', '-', '0');
		flags |= fl;
	}
(especially if gcc generates a lot less code for it - I expect it will
pessimise it back to the other version.)

> 
> > I didn't want to add the flags here before supporting them later.
> > But they could all be accepted and ignored until implemented.
> > That might be better anyway.  
> 
> I've seen that in a later patch you have up to 10 values tested in
> chain.

I only counted 7 :-)
I used the __VA_ARGS__, 0, 0 trick to support up to 8.

> I just think that it could be sufficient to have a macro
> taking 10 char args, that remains easy enough to use and understand
> where it is, e.g. you pass the base then all values:
> 
>     _NOLIBC_ANY_OF(0x20, 'c', 'd', 'r', 'z', -1, -1, -1, -1 ...)
> 
> > Actually it might be worth s/c/ch/ to make the brain see the difference
> > between c and 'c' more easily.  
> 
> I'm not sure it's the only detail which is complexifying my reading :-/

It is a simple one that does make a difference.
It would have to be the first patch, which will make 'rebasing' the
patches a nightmare.

> 
> > Perhaps I'm expand the comment a bit.  
> 
> Yes, comments are cheap and welcoming to new readers.
> 
> > It is all a hint as to what is happening later on with the character tests.  
> 
> I roughly get what you're trying to do and am not contesting the goals,
> I'm however questioning the size efficiency of the resulting code (not
> fully certain it remains as small as reasonably possible), and the
> ease of maintenance.

I've been running the bloat-o-meter on pretty much every build.
The current build of __nolibc_printf (without my changes to stdlib.h) is at +90 bytes.
But I think it has lost an inlined copy of u64toa_r (about 144 bytes).
Actual size is about 1k.

The bit-flags variables definitely help over multiple comparisons.
The width/precision loop helps due to the size of va_arg()
(that is one of the bits that actually increases the size).

The code to buffer printf() (to file) actually adds more than the other changes.
But I like it because of the difference it makes to strace.

Don't even look at the mess gcc creates for the obvious:
	precision = width - (size != 0) - (size > 255);

And I can't get it to use 'bt' (bit test) or 'bts' (bit test and set).
Code like:
	if (_NOLIBC_PF_CHAR_IS_ONE_OF(ch, ' ', '#', '+', '-', '0'))
		flags |= _NOLIBC_PF_FLAG(ch);
Could be:
	mask = 0x....;
	if (bt(ch, mask))
		bts(ch, flags);
But it insists on doing:
	if ((mask >> ch) & 1)
		flags |= 1 << ch;
I can't even get it to do:
	if (mask & (1 << ch))
		flags |= 1 << ch;
(I need the condition for loop control.)
I've not tried variants of:
	bit = mask & (1 << ch);
	flags |= bit;
	if (bit)...

There is on 'bts' - in the code that adds the '#' flag into the conversion
character. Doing that improves the code far more than you might expect.
To the point where adding |= 1 << 'b' might actually reduce the code size
even though 'b' is never tested for.

...
> > I'll see if I can get a max of 2 expansions on a line.
> > Otherwise the lines get horribly long.  
> 
> OK but with todays screens it's less of a problem, and often early
> wrapping affects legibility more than long lines :-/  We don't have
> a strict 80-char limit in this project, so if you need 100 to make
> something more readable once in a while, please just do it.

I meant 'really horrible' ...
A few of the lines are in the 90s, one might be 103.
I still like 80 characters, but anal continuation lines are silly.
(I found some (ex)dayjob code written 20+ years ago with 200+
character lines, no idea how they edited it.)

> 
> But please also keep in mind my comments about the goals of size,
> maintainability and portability (e.g. don't forget to compare
> size before/after, and at least to mention when some significant
> changes have impacts in one direction or the other because that
> matters).

I have been looking at size.
I ripped some changes out because they didn't make things smaller
and made them very much less readable.

It when down quite a bit when I did the restructure and then crept
back up as I added extra features.
Overall there isn't that much difference.

One thing that would reduce overall size is making the non-trivial
functions in stdlib.h (and maybe elsewhere) noinline.
(I set that for some of my builds to stop the numeric convertion
functions being inlined - no idea why they aren't considered too big.)

For gcc adding noclone may also help (not supported by clang).
noclone can have a strange effect.
I set it on my u64toa_base() functions but only enabled it for decimal.
There were two call sites, both passed in the two constants.
But the called function ignored the passed values and reloaded both constants.
Not at all 'what the crowd intended' :-)

	David

> 
> Thanks,
> Willy