linux-kernel - Re: [PATCH POC] printk_ringbuffer: Alternative implementation of lockless printk ringbuffer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <87r275j15h.fsf@linutronix.de>
Date:   Thu, 04 Jul 2019 16:59:54 +0200
From:   John Ogness <john.ogness@...utronix.de>
To:     Petr Mladek <pmladek@...e.com>
Cc:     linux-kernel@...r.kernel.org,
        Peter Zijlstra <peterz@...radead.org>,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Andrea Parri <andrea.parri@...rulasolutions.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>
Subject: Re: [PATCH POC] printk_ringbuffer: Alternative implementation of lockless printk ringbuffer

Hi Petr,

On 2019-07-04, Petr Mladek <pmladek@...e.com> wrote:
> This is POC that implements the lockless printk ringbuffer slightly
> different way. I believe that it is worth considering because it looks
> much easier to deal with. The reasons are:
>
> 	+ The state of each entry is always clear.
>
> 	+ The write access rights and validity of the data
> 	  are clear from the state of the entry.
>
> 	+ It seems that three barriers are enough to synchronize
> 	  readers vs. writers. The rest is done implicitely
> 	  using atomic operations.
>
> Of course, I might have missed some critical race that can't get
> solved by the new design easily. But I do not see it at the moment.

Two things jump out at me when looking at the implementation:

1. The code claims that the cmpxchg(seq_newest) in prb_reserve_desc()
guarantees that "The descriptor is ours until the COMMITTED bit is set."
This is not true if in that wind seq_newest wraps, allowing another
writer to gain ownership of the same descriptor. For small descriptor
arrays (such as in my test module), this situation is quite easy to
reproduce.

This was one of the reasons I chose to use a linked list. When the
descriptor is atomically removed from the linked list, it can _never_ be
used (or even seen) by any other writer until the owning writer is done
with it.

I'm not yet sure how that could be fixed with this implementation. The
state information is in a separate variable than the head pointer for
the descriptor array (seq_newest). This means you cannot atomically
reserve descriptors.

2. Another issue is when prb_reserve() fails and sets the descriptor as
unused. As it is now, no reader can read beyond that descriptor until it
is recycled. Readers need to know that the descriptor is bad and can be
skipped over. It might be better to handle this the way I did: go ahead
and set the state to committed, but have invalid lpos/lpos_next values
(for example: lpos_next=lpos) so the reader knows it can skip over the
descriptor.

> The code compiles but it is not really tested. I wanted to send it
> ASAP in a good enough state before we spend more time on cleaning
> up either of the proposals.

I am glad to see you put together your implementation. If anything, it
shows you understand the design! If after seeing my next version (v3)
you are still convinced that using a linked list for the descriptors is
too complex, then I can help support your idea to move to an array.

Thank you for taking so much time for this!

John Ogness