[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wg9x1Xt2cmiBbCz5XTppDQ=RNkjkmegwaF6=QghG6kBtA@mail.gmail.com>
Date: Tue, 17 Dec 2024 15:32:21 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Steven Rostedt <rostedt@...dmis.org>, Alexei Starovoitov <ast@...nel.org>
Cc: linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
Masami Hiramatsu <mhiramat@...nel.org>, Mark Rutland <mark.rutland@....com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Andrew Morton <akpm@...ux-foundation.org>,
stable@...r.kernel.org
Subject: Re: [PATCH 1/3] ring-buffer: Add uname to match criteria for
persistent ring buffer
On Tue, 17 Dec 2024 at 14:52, Steven Rostedt <rostedt@...dmis.org> wrote:
>
> Where the buf value holds the binary storage from vbin_printf() and written
> in trace_vbprintk(). Yeah, it looks like it does depend on the arguments
> being word aligned.
So the thing is, they actually aren't word-aligned at all.
Yes, the buffer is ostensibly individual words, and the buffer size is
given in words, and 64-bit data is saved/fetched as two separate
words.
But if you look more closely, you'll see that the way the buffer is
managed is actually not as a word array at all, but using
char *str, *end;
instead of word pointers. And the reason is because it does
str = PTR_ALIGN(str, sizeof(type));
...
str += sizeof(type);
so byte-sized data is *not* given a word, it's only given a single
char, and then if it's followed by an "int", the pointer will be
aligned at that point.
It does mean that the binary buffers are a bit denser, but since %c
and %hhd etc are VERY unusual (and often will be mixed with int-sized
data), that denser format isn't commonly an actual advantage.
And the reason I noticed this? When I was trying to clean up and
simplify the vsnprintf() code to not have so many different cases, the
*regular* number handling for char/short/int-sized cases ends up being
just one case that looks like this:
num = get_num(va_arg(args, int), fmt.state, spec);
because char/short/int are all just "va_arg(args, int)" values.
Then the actual printout depends on that printf_spec thing (and, in my
experimental branch, that "fmt.state", but that's a local trial thing
where I split the printf_spec differently for better code generation).
So basically the core vfsprintf case doesn't need to care about
fetching char/short/int, because va_args always just formats those
kinds of arguments as int, as per the usual C implicit type expansion
rules.
But the binary printf thing then has to have three different cases,
because unlike the normal calling convention, it actually packs
char/short/int differently. So with all those nice cleanups I tried
still look like this:
case FORMAT_STATE_2BYTE:
num = get_num(get_arg(short), fmt.state, spec);
break;
case FORMAT_STATE_1BYTE:
num = get_num(get_arg(char), fmt.state, spec);
break;
default:
num = get_num(get_arg(int), fmt.state, spec);
break;
which is admittedly still a lot better than the current situation in
top-of-tree, which has separate versions for a *lot* more types.
So right now top-of-tree has 11 different enumerated cases for number printing:
FORMAT_TYPE_LONG_LONG,
FORMAT_TYPE_ULONG,
FORMAT_TYPE_LONG,
FORMAT_TYPE_UBYTE,
FORMAT_TYPE_BYTE,
FORMAT_TYPE_USHORT,
FORMAT_TYPE_SHORT,
FORMAT_TYPE_UINT,
FORMAT_TYPE_INT,
FORMAT_TYPE_SIZE_T,
FORMAT_TYPE_PTRDIFF
and in my test cleanup, I've cut this down to just four cases:
FORMAT_STATE_xBYTE (x = 1, 2, 4, 8).
And the actual argument fetch for the *normal* case is actually just
two cases (because the 8-byte and the "4 bytes or less" cases are
different for va_list, but 1/2/4 bytes are just that single case).
But the binary printf argument save/fetch is not able to be cut down
to just two cases because of how it does that odd "ostensibly a word
array, but packed byte/short fields" thing.
Oh well. It's not a big deal. But I was doing this to see if I could
regularize the vsnprintf() logic and make sharing better - and then
just the binary version already causes unnecessary duplication.
If the *only* thing that accesses that word array is vbin_printf and
bstr_printf, then I could just change the packing to act like va_list
does (ie the word array would *actually* be a word array, and char and
short values would get individual words).
But at least the bpf cde seems to know about the crazy packing, and
also does that
tmp_buf = PTR_ALIGN(tmp_buf, sizeof(u32));
in bpf_bprintf_prepare() because it knows that it's not *actually* an
array of words despite it being documented as such.
Of course, the bpf code only does the packed access thing for '%c',
and doesn't seem to know that the same is true of '%hd' and '%hhd',
presumably because nobody actually uses that.
Let's add Alexei to the participants. I think bpf too would actually
prefer that the odd char/short packing *not* be done, if only because
it clearly does the wrong thing as-is for non-%c argument (ie those
%hd/%hhd cases).
Who else accesses that odd "binary printed pseudo-word array"?
Linus
Powered by blists - more mailing lists