[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bd53d894b7bb0fcaa520282a04a6487828282695.camel@perches.com>
Date: Sun, 07 Feb 2021 06:58:28 -0800
From: Joe Perches <joe@...ches.com>
To: Chris Down <chris@...isdown.name>
Cc: Petr Mladek <pmladek@...e.com>, linux-kernel@...r.kernel.org,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
John Ogness <john.ogness@...utronix.de>,
Johannes Weiner <hannes@...xchg.org>,
Andrew Morton <akpm@...ux-foundation.org>, kernel-team@...com,
Steven Rostedt <rostedt@...dmis.org>,
Alexey Dobriyan <adobriyan@...il.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Jason Baron <jbaron@...mai.com>,
Kees Cook <keescook@...omium.org>, linux-api@...r.kernel.org
Subject: Re: [PATCH] printk: Userspace format enumeration support
On Sun, 2021-02-07 at 14:13 +0000, Chris Down wrote:
> Joe Perches writes:
> > > There are certainly printks which can't be trivially monitored using the printk
> > > format alone, but the vast majority of the ones that are monitored _do_ have
> > > meaningful formats and can be monitored over time. No solution to this is going
> > > to catch every single case, especially when so much of the information can be
> > > generated dyamically, but this patchset still goes a long way to making printk
> > > monitoring more tractable for use cases like the one described in the
> > > changelog.
> >
> > For the _vast_ majority of printk strings, this can easily be found
> > and compared using a trivial modification to strings.
>
> There are several issues with your proposed approach that make it unsuitable
> for use as part of a reliable production environment:
>
> 1. It misses printk() formats without KERN_SOH
>
> printk() formats without KERN_SOH are legal and use MESSAGE_LOGLEVEL_DEFAULT.
> On my test kernel, your proposed patch loses >5% of printk formats -- over 200
> messages -- due to this, including critical ones like those about hardware or
> other errors.
There are _very_ few of those printks without KERN_<level> and those
very few are not generally being changed.
> 2. Users don't always have the kernel image available
>
> Many of our machines and many of the machines of others like us do not boot
> using local storage, but instead use PXE or other technologies where the kernel
> may not be stored during runtime.
>
> As is described in the changelog, it is necessary to be able to vary
> remediations not only based on what is already in /dev/kmsg, but also to be
> able to make decisions about our methodology based on what's _supported_ in the
> running kernel at runtime, and your proposed approach makes this not viable.
Indirection would alway work.
You could load a separate file with output strings along with your
kernel image.
> 3. `KERN_SOH + level' can appear in other places than just printk strings
>
> KERN_SOH is just ASCII '\001' -- it's not distinctive or unique, even when
> paired with a check for something that looks like a level after it. For this
> reason, your proposed patch results in a non-trivial amount of non-printk
> related garbage in its output. For example:
>
> % binutils/strings -k /tmp/vmlinux | head -5
> 3L)s
> 3L)s
> c,[]A\
> c(L)c
> d$pL)d$`u4
>
> Fundamentally, one cannot use a tool which just determines whether something is
> printable to determine semantic intent.
$ kernel_strings --kernel --section ".rodata" vmlinux
I got exactly 0.
> 4. strings(1) output cannot differentiate embedded newlines and new formats
>
> The following has exactly the same output from strings(1), but will manifest
> completely differently at printk() time:
>
> printk(KERN_ERR "line one\nline two\nline three\n");
> printk("line four\n");
This is not the preferred output style and is only done in old and
unchanging code.
Your use case in your commit log is looking for _changed_ formats.
On Thu, 2021-02-04 at 15:37 +0000, Chris Down wrote:
> This patch provides a solution to the issue of silently changed or
> deleted printks:
Exactly _how_ many of these use cases do you think exist?
The generally preferred style for the example above would be:
pr_err("line one\n");
pr_err("line two\n");
pr_err("line three\n");
pr_err("line four\n");
> The originally posted patch _does_ differentiate between these cases, using \0
> as a reliable separator. Its outputs are, respectively:
>
> \0013line one\nline two\nline three\0\nline four\n\0
> \0013line one\nline two\n\0line three\nline four\n\0
>
> This isn't just a theoretical concern -- there are plenty of places which use
> multiline printks, and we must be able to distinguish between that and
> _multiple_ printks.
Just like there are many places that use buffered printks as the
example I gave earlier. None of which your proposed solution would find.
> 5. strings(1) is not contextually aware, and cannot be made to act as if it is
>
> strings has no idea about what it is reading, which is why it is more than
> happy to output the kind of meaningless output shown in #3. There are plenty of
> places across the kernel where there might be a sequence of bytes which the
> strings utility happens to interpret as being semantically meaningful, but in
> reality just happens to be an unrelated sequence of coincidentally printable
> bytes that just happens to contain a \001.
>
> I appreciate your willingness to propose other solutions, but for these
> reasons, the proposed strings(1) patch would not suffice as an interface for
> printk enumeration.
I think you are on a path to try to make printk output immutable.
I think that's a _very_ bad path.
I also think this is adding needless complexity.
A possible complexity I would like to support would be optionally
compressing printk format strings at compile time and uncompressing
them at use time.
Powered by blists - more mailing lists