[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aYin-8KYoxP0dexW@1wt.eu>
Date: Sun, 8 Feb 2026 16:12:59 +0100
From: Willy Tarreau <w@....eu>
To: David Laight <david.laight.linux@...il.com>
Cc: Thomas Weißschuh <linux@...ssschuh.net>,
linux-kernel@...r.kernel.org, Cheng Li <lechain@...il.com>
Subject: Re: [PATCH v2 next 02/11] tools/nolibc/printf: Move snprintf length
check to callback
On Sat, Feb 07, 2026 at 11:28:25PM +0000, David Laight wrote:
> > > +
> > > + /* Flush/terminate any buffer. */
> > > + if (cb(state, NULL, 0) != 0)
> > > + return -1;
> > > +
> >
> > I suspect this hunk is in fact part of the next patch which adds
> > buffering, because I don't see what there is to flush here without
> > any buffer.
>
> Ok, it could be 'terminate' here, and I could change the line again
> in the next patch.
> If the next patch goes in (it can be missed out) then maybe a different
> comment would be better.
> Or something more 'wooly' line 'finalise the output'.
>
> > If really needed, then I think that it's about time to
> > start adding a comment about the __nolibc_printf() function to explain
> > how it's supposed to work, because callback-driven code is unreadable,
> > there are hidden expectations everywhere that are super hard to guess
> > or verify.
>
> Something has to add the terminating '\0'.
OK that's the point that wasn't obvious when reading the code.
Maybe just mention "finish and append \0" or something like this
so that it's clear it wasn't yet done.
> The callback function has the rest of the logic for writing to the buffer.
> So it saves replicating the length test (as was done before).
> It also lets vsnprintf() directly return the value returned from
> __nolibc_fprintf_cb().
> The '\0' could be added after every memcpy(), but that has hard code paths
> when the buffer length is 0 or 1.
Yeah that's what I assumed (especially since it's often more convenient
when trying to print possibly truncated strings), but I agree that it
would add complicated logic everywhere.
> It also helps if the compiler decides to inline vsnprintf.
>
> But yes, some of the motivation does come from the following patch.
> There you really do want all the write system calls to be in one function.
It's another good point that when writing to a file you don't want to
emit the zero so that can simplify the processing.
> > > -static int __nolibc_fprintf_cb(intptr_t state, const char *buf, size_t size)
> > > +static int __nolibc_fprintf_cb(void *stream, const char *buf, size_t size)
> >
> > I must confess I'm not a big fan of the void* here. I've seen that you're
> > having one state for snprintf() and another one for fprintf(), maybe they
> > could be efficiently merged into a common printf_state ? Note that I'm not
> > vetoing this, I just want to be convinced that it's the best choice, and
> > neither the code, comments nor commit messages for now suggest so.
>
> 'void *' has to be better than intptr_t.
> A common state wouldn't work - all the fields are different
> (before and after the next patch).
That's what I wanted to double-check with you.
> At least here all the functions are together and it is a straight callback.
> The usual problem with registering callback for later is the extreme
> difficulty in safely de-registering them.
Yep, and also following the code :-)
I really think that a longer comment before the main function explaining
the call sequences with states will significantly help the occasional
reader (i.e. vsnprintf() -> __nolibc_printf() -> sprintf_cb()).
> > > +static int __nolibc_sprintf_cb(void *v_state, const char *buf, size_t size)
> > > {
> > > - char **state = (char **)_state;
> > > + struct __nolibc_sprintf_cb_state *state = v_state;
> > > + char *tgt;
> > >
> > > - memcpy(*state, buf, size);
> > > - *state += size;
> > > + if (size >= state->size) {
> > > + if (state->size <= 1)
> > > + return 0;
> >
> > I failed to understand that one. Don't we want to at least write the
> > trailing zero when there's one byte left ? A short comment explaining
> > that case would help.
>
> state->size is normally greater than zero.
> So when size if zero the first condition is false, and the *tgt = 0
> line is executed later.
I meant, not what the code does but what situation this corresponds to :-)
However, I think I get it now, I think you just want to reserve the last
char for the final call. Mentioning this would help. Maybe just:
/* defer trailing \0 to final call (with size=0). Final call with
* zero-size output also gets here.
*/
> However it is valid to call vsnprintf(NULL, 0, fmt, args).
> That is the only way state->size can be zero - in that case the code
> needs to return without writing to the buffer.
Yep.
> > > + tgt = state->buf;
> > > + if (size) {
> > > + state->size -= size;
> > > + state->buf = tgt + size;
> > > + memcpy(tgt, buf, size);
> > > + } else {
> > > + /* In particular from cb(NULL, 0) at the end of __nolibc_printf(). */
> > > + *tgt = '\0';
> > > + }
> >
> > Usually, "if/else" constructs result in larger code due to jumps.
>
> Size doesn't usually matter, what makes a big difference is the branches
> being mispredicted - I think that is ~20 clocks on my zen-5 (getting close
> the the P4-netburst values).
Let me please insist a lot here: we DO NOT CARE about performance in
this lib. We do care about size and maintainability. Branch prediction
has no place at all in that code, and the entirety of the lib would
be written differently if it targetted performance. For example please
have a look at memcpy().
> I think the nolibc is optimised for size, but only within reason.
> That if/else does help document what is going on.
But I am concerned about not needlessly inflating its size. In just
a few patches the simplest hello world grew by 500 bytes. That's almost
the size of my entire very first init program where this project
started. It does matter when you start to place lots of small programs
in an initrd, as all of them are static and the size is multiplied by
the number of programs. And optimizing for branch prediction is the
last valid reason for increasing the code size here.
> > Here we certainly can unconditionally write the trailing zero.
>
> Writing the zero then overwriting it with 'no bytes' is too subtle (even for me).
> I'm sure you'd have wanted a big comment :-)
Yes definitely!
> > Bingo, we're
> > saving 9 bytes on x86_64 by moving it above. And even 17 bytes by dropping
> > the test on size and updating the state after the memcpy:
> >
> > if (size >= state->size) {
> > if (state->size <= 1)
> > return 0;
> > size = state->size - 1;
> > }
> > *state->buf = '\0';
> > memcpy(state->buf, buf, size);
> > state->buf += size;
> > state->size -= size;
>
> That starts relying on the compiler assuming that the memcpy() can't
> be writing to state->buf and state->size.
> I'm not certain it can assume that in the callback function
> (With or without 'strict aliasing').
There's nothing to assume, the code is absolutely valid. We developers
have to do the correct thing and not suspect the compiler could do bad
things in our back, but as long as we're not passing state->buf pointing
to state, there's no such risk of aliasing.
> So the explicit local for tgt helps a lot - and does no harm.
> In the past I've done:
> x->buf += size;
> memcpy(x->buf - size, src, size)
> knowing that the compiler will just use the old value.
Yeah tried it as well with no success.
> Note that something has to write the '\0' for:
> snprintf(buf, 1, fmt, args);
> and
> snprintf(buf, len, "");
> but not for
> snprintf(buf, 0, fmt, args);
> so just writing the the byte following the memcpy() isn't enough.
But I didn't change anything to that. I'm not contesting the usefulness
of the memcpy(), just asking to avoid adding "else" blocks everywhere
that add tens of bytes for each patch when we can simply perform the
else branch unconditionally before the "if".
Willy
Powered by blists - more mailing lists