linux-kernel - Re: [PATCH 1/3] ring-buffer: Add uname to match criteria for persistent ring buffer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wg9x1Xt2cmiBbCz5XTppDQ=RNkjkmegwaF6=QghG6kBtA@mail.gmail.com>
Date: Tue, 17 Dec 2024 15:32:21 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Steven Rostedt <rostedt@...dmis.org>, Alexei Starovoitov <ast@...nel.org>
Cc: linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org, 
	Masami Hiramatsu <mhiramat@...nel.org>, Mark Rutland <mark.rutland@....com>, 
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Andrew Morton <akpm@...ux-foundation.org>, 
	stable@...r.kernel.org
Subject: Re: [PATCH 1/3] ring-buffer: Add uname to match criteria for
 persistent ring buffer

On Tue, 17 Dec 2024 at 14:52, Steven Rostedt <rostedt@...dmis.org> wrote:
>
> Where the buf value holds the binary storage from vbin_printf() and written
> in trace_vbprintk(). Yeah, it looks like it does depend on the arguments
> being word aligned.

So the thing is, they actually aren't word-aligned at all.

Yes, the buffer is ostensibly individual words, and the buffer size is
given in words, and 64-bit data is saved/fetched as two separate
words.

But if you look more closely, you'll see that the way the buffer is
managed is actually not as a word array at all, but using

        char *str, *end;

instead of word pointers. And the reason is because it does

        str = PTR_ALIGN(str, sizeof(type));
        ...
        str += sizeof(type);

so byte-sized data is *not* given a word, it's only given a single
char, and then if it's followed by an "int", the pointer will be
aligned at that point.

It does mean that the binary buffers are a bit denser, but since %c
and %hhd etc are VERY unusual (and often will be mixed with int-sized
data), that denser format isn't commonly an actual advantage.

And the reason I noticed this? When I was trying to clean up and
simplify the vsnprintf() code to not have so many different cases, the
*regular* number handling for char/short/int-sized cases ends up being
just one case that looks like this:

        num = get_num(va_arg(args, int), fmt.state, spec);

because char/short/int are all just "va_arg(args, int)" values.

Then the actual printout depends on that printf_spec thing (and, in my
experimental branch, that "fmt.state", but that's a local trial thing
where I split the printf_spec differently for better code generation).

So basically the core vfsprintf case doesn't need to care about
fetching char/short/int, because va_args always just formats those
kinds of arguments as int, as per the usual C implicit type expansion
rules.

But the binary printf thing then has to have three different cases,
because unlike the normal calling convention, it actually packs
char/short/int differently.  So with all those nice cleanups I tried
still look like this:

                case FORMAT_STATE_2BYTE:
                        num = get_num(get_arg(short), fmt.state, spec);
                        break;
                case FORMAT_STATE_1BYTE:
                        num = get_num(get_arg(char), fmt.state, spec);
                        break;
                default:
                        num = get_num(get_arg(int), fmt.state, spec);
                        break;

which is admittedly still a lot better than the current situation in
top-of-tree, which has separate versions for a *lot* more types.

So right now top-of-tree has 11 different enumerated cases for number printing:

        FORMAT_TYPE_LONG_LONG,
        FORMAT_TYPE_ULONG,
        FORMAT_TYPE_LONG,
        FORMAT_TYPE_UBYTE,
        FORMAT_TYPE_BYTE,
        FORMAT_TYPE_USHORT,
        FORMAT_TYPE_SHORT,
        FORMAT_TYPE_UINT,
        FORMAT_TYPE_INT,
        FORMAT_TYPE_SIZE_T,
        FORMAT_TYPE_PTRDIFF

and in my test cleanup, I've cut this down to just four cases:
FORMAT_STATE_xBYTE (x = 1, 2, 4, 8).

And the actual argument fetch for the *normal* case is actually just
two cases (because the 8-byte and the "4 bytes or less" cases are
different for va_list, but 1/2/4 bytes are just that single case).

But the binary printf argument save/fetch is not able to be cut down
to just two cases because of how it does that odd "ostensibly a word
array, but packed byte/short fields" thing.

Oh well. It's not a big deal. But I was doing this to see if I could
regularize the vsnprintf() logic and make sharing better - and then
just the binary version already causes unnecessary duplication.

If the *only* thing that accesses that word array is vbin_printf and
bstr_printf, then I could just change the packing to act like va_list
does (ie the word array would *actually* be a word array, and char and
short values would get individual words).

But at least the bpf cde seems to know about the crazy packing, and
also does that

        tmp_buf = PTR_ALIGN(tmp_buf, sizeof(u32));

in bpf_bprintf_prepare() because it knows that it's not *actually* an
array of words despite it being documented as such.

Of course, the bpf code only does the packed access thing for '%c',
and doesn't seem to know that the same is true of '%hd' and '%hhd',
presumably because nobody actually uses that.

Let's add Alexei to the participants. I think bpf too would actually
prefer that the odd char/short packing *not* be done, if only because
it clearly does the wrong thing as-is for non-%c argument (ie those
%hd/%hhd cases).

Who else accesses that odd "binary printed pseudo-word array"?

          Linus