linux-kernel - Re: Issue in dmesg time with lockless ring buffer

Message-ID: <CAGFReeOKwvsUb5hMMXLY3eTdqMDOae9mNjfp5BXx+BYfn5vMiw@mail.gmail.com>
Date:   Thu, 28 Jan 2021 17:46:03 -0800
From:   "J. Avila" <elavila@...gle.com>
To:     John Ogness <john.ogness@...utronix.de>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Petr Mladek <pmladek@...e.com>,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Andrea Parri <parri.andrea@...il.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Paul McKenney <paulmck@...nel.org>,
        Saravana Kannan <saravanak@...gle.com>,
        kexec@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: Issue in dmesg time with lockless ring buffer

Hello John,

I’ve done some additional digging on my end. I tested using a 5.10.11
kernel and observed the following:

1) With the default of CONFIG_LOG_BUF_SHIFT=17, I was not able to reproduce
   the issue.
2) With CONFIG_LOG_BUF_SHIFT=20, I was able to reproduce the behavior
   mentioned before.
3) With (2) + reverting up to and including 896fbe20b4e2 (printk: use the
   lockless ringbuffer), I saw short dmesg times again.
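
For reference, the buffer sizes behind the two settings above follow from
the log buffer being 2^CONFIG_LOG_BUF_SHIFT bytes. A minimal sketch of that
arithmetic (the helper name log_buf_bytes is made up for illustration):

```python
def log_buf_bytes(shift: int) -> int:
    """Size in bytes of a log buffer configured with CONFIG_LOG_BUF_SHIFT=shift."""
    return 1 << shift


for shift in (17, 20):
    size = log_buf_bytes(shift)
    print(f"CONFIG_LOG_BUF_SHIFT={shift}: {size} bytes ({size // 1024} KiB)")
```

So the default of 17 gives a 128 KiB buffer, and 20 gives the 1 MiB buffer
where the slowdown shows up.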

It seems that this issue may only exist with a sufficiently large log
buffer. Although 1MB is a relatively uncommon size for Linux kernel log
buffers, this still indicates a potential issue in the code; do you think
it's worth investigating?
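
In case it helps with reproducing, below is a rough sketch of how the read
side could be timed without the dmesg binary. It is not the exact tool used
in the tests above; the function name time_drain and the 8192-byte read size
are assumptions for illustration. It opens /dev/kmsg non-blocking, reads one
record per read() until the kernel reports that nothing more is available,
and returns the elapsed time.

```python
import os
import time


def time_drain(path="/dev/kmsg", bufsize=8192):
    """Read everything currently available at `path`.

    Returns (elapsed_seconds, number_of_reads_that_returned_data).
    On /dev/kmsg each successful read() returns one record.
    """
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    records = 0
    start = time.perf_counter()
    try:
        while True:
            try:
                chunk = os.read(fd, bufsize)
            except BlockingIOError:
                # /dev/kmsg: EAGAIN, no further records at the moment.
                break
            except BrokenPipeError:
                # /dev/kmsg: EPIPE, records were overwritten while reading;
                # the next read() continues with the next available record.
                continue
            if not chunk:
                # Regular file (e.g. a saved log): end of file.
                break
            records += 1
    finally:
        os.close(fd)
    return time.perf_counter() - start, records
```

Calling time_drain() repeatedly (typically as root) while another process
keeps writing to /dev/kmsg should correspond to steps (1) and (2) of the
original report, assuming the slowdown is on the /dev/kmsg read path.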

Thanks,

Avila

On Mon, Jan 25, 2021 at 4:00 PM J. Avila <elavila@...gle.com> wrote:
>
> Hello,
>
> This dmesg uses /dev/kmsg; we've verified that we don't see this long
> dmesg time when reading from syslog (via dmesg -S).
>
> We've also tried testing this with logging daemons disabled as well as
> within initrd - both result in similar behavior.
>
> If it's relevant, this was done on a toybox shell.
>
> Thanks,
>
> Avila
>
> On Mon, Jan 25, 2021 at 5:32 AM John Ogness <john.ogness@...utronix.de> wrote:
> >
> > On 2021-01-22, "J. Avila" <elavila@...gle.com> wrote:
> > > When doing some internal testing on a 5.10.4 kernel, we found that the
> > > time taken for dmesg seemed to increase from the order of milliseconds
> > > to the order of seconds when the dmesg size approached the ~1.2MB
> > > limit. After doing some digging, we found that by reverting all of the
> > > patches in printk/ up to and including
> > > 896fbe20b4e2333fb55cc9b9b783ebcc49eee7c7 ("use the lockless
> > > ringbuffer"), we were able to once more see normal dmesg times.
> > >
> > > This kernel had no meaningful diffs in the printk/ dir when compared
> > > to Linus' tree. This behavior was consistently reproducible using the
> > > following steps:
> > >
> > > 1) In one shell, run "time dmesg > /dev/null"
> > > 2) In another, constantly write to /dev/kmsg
> > >
> > > Within ~5 minutes, we saw that dmesg times increased to 1 second, only
> > > increasing further from there. Is this a known issue?
> >
> > The last couple days I have tried to reproduce this issue with no
> > success.
> >
> > Is your dmesg using /dev/kmsg or syslog() to read the buffer?
> >
> > Are there any syslog daemons or systemd running? Perhaps you can run
> > your test within an initrd to see if this effect is still visible?
> >
> > John Ogness