linux-kernel - Re: [PATCH 2/2] printk: use the lockless ringbuffer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200214014113.GE36551@google.com>
Date:   Fri, 14 Feb 2020 10:41:13 +0900
From:   Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To:     John Ogness <john.ogness@...utronix.de>
Cc:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Petr Mladek <pmladek@...e.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Andrea Parri <parri.andrea@...il.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        kexec@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] printk: use the lockless ringbuffer

On (20/02/13 23:36), John Ogness wrote:
> >> Here prb_read_valid() was successful, so a record _was_ read. The
> >> kerneldoc for the prb_read_valid() says:
> >
> > Hmm, yeah. That's true.
> >
> > OK, something weird...
> >
> > I ran some random printk-pressure test (mostly printks from IRQs;
> > + some NMI printk-s, but they are routed through nmi printk-safe
> > buffers; + some limited number of printk-safe printk-s, routed
> > via printk-safe buffer (so, once again, IRQ); + user-space
> > journalctl -f syslog reader), and after the test 'cat /dev/kmsg'
> > is terminally broken
> >
> > [..]
> > cat /dev/kmsg
> > cat: /dev/kmsg: Broken pipe
>
> In mainline you can have this "problem" as well. Once the ringbuffer has
> wrapped, any read to a newly opened /dev/kmsg when a new message arrived
> will result in an EPIPE. This happens quite easily once the ringbuffer
> has wrapped because each new message is overwriting the oldest message.

Hmm. Something doesn't add up.

Looking at the numbers, both r->info->seq and prb_first_seq(prb)
do increase, so there are new messages in the ring buffer

                           u->seq    r->seq    prb_first_seq
[..]
cat: devkmsg_read() error 1981080   1982633   1981080
cat: devkmsg_read() error 1981080   1982633   1981080
cat: devkmsg_read() error 1981095   1982652   1981095
cat: devkmsg_read() error 1981095   1982652   1981095
cat: devkmsg_read() error 1981095   1982652   1981095
[..]

but 'cat' still wouldn't read anything from the logbuf - EPIPE.

NOTE: I don't run 'cat /dev/kmsg' during the test. I run the test first,
then I run 'cat /dev/kmsg', after the test, when printk-pressure is gone.

I can't reproduce it with current logbuf. 'cat' reads from /dev/kmsg after
heavy printk-pressure test. So chances are some loggers can also experience
problems. This might be a regression.

> > ...
> > systemd-journal: devkmsg_read() error 1979281 1982465 1980933
> > systemd-journal: corrected seq 1982465 1982465
> > cat: devkmsg_read() error 1980987 1982531 1980987
> > cat: corrected seq 1982531 1982531
> > cat: devkmsg_read() error 1981015 1982563 1981015
> > cat: corrected seq 1982563 1982563
>
> The situation with a data-less record is the same as when the ringbuffer
> wraps: cat is hitting that EPIPE. But re-opening the file descriptor is
> not going to help because it will not be able to get past that data-less
> record.

So maybe this is the case with broken 'cat' on my system?

> We could implement it such that devkmsg_read() will skip over data-less
> records instead of issuing an EPIPE. (That is what dmesg does.) But then
> do we need EPIPE at all? The reader can see that is has missed records
> by tracking the sequence number, so could we just get rid of EPIPE? Then
> cat(1) would be a great tool to view the raw ringbuffer. Please share
> your thoughts on this.

Looking at systemd/src/journal/journald-kmsg.c : server_read_dev_kmsg()
-EPIPE is just one of the erronos they handle, nothing special. Could it
be the case that some other loggers would have special handling for EPIPE?
I'm not sure, let's look around.

I'd say that EPIPE removal looks OK to me. But before we do that, I'm
not sure that we have clear understanding of 'cat /dev/kmsg' behaviour
change.

> On a side note (but related to data-less records): I hacked the
> ringbuffer code to inject data-less records at various times in order to
> verify your report. And I stumbled upon a bug in the ringbuffer, which
> can lead to an infinite loop in console_unlock(). The problem occurs at:
> 
>     retry = prb_read_valid(prb, console_seq, NULL);
> 
> which will erroneously return true if console_seq is pointing to a
> data-less record but there are no valid records after it. The following
> patch fixes the bug. And yes, for v2 I have added comments to the
> desc_read_committed() code.

That's great to know!

	-ss