Date:   Wed, 10 Jun 2020 15:55:00 +0200
From:   John Ogness <john.ogness@...utronix.de>
To:     Petr Mladek <pmladek@...e.com>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Andrea Parri <parri.andrea@...il.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        kexec@...ts.infradead.org, linux-kernel@...r.kernel.org,
        Paul McKenney <paulmck@...nel.org>
Subject: Re: blk->id read race: was: [PATCH v2 2/3] printk: add lockless buffer

On 2020-06-10, Petr Mladek <pmladek@...e.com> wrote:
>>>> --- /dev/null
>>>> +++ b/kernel/printk/printk_ringbuffer.c
>>>> +/*
>>>> + * Given a data ring (text or dict), put the associated descriptor of each
>>>> + * data block from @lpos_begin until @lpos_end into the reusable state.
>>>> + *
>>>> + * If there is any problem making the associated descriptor reusable, either
>>>> + * the descriptor has not yet been committed or another writer task has
>>>> + * already pushed the tail lpos past the problematic data block. Regardless,
>>>> + * on error the caller can re-load the tail lpos to determine the situation.
>>>> + */
>>>> +static bool data_make_reusable(struct printk_ringbuffer *rb,
>>>> +			       struct prb_data_ring *data_ring,
>>>> +			       unsigned long lpos_begin,
>>>> +			       unsigned long lpos_end,
>>>> +			       unsigned long *lpos_out)
>>>> +{
>>>> +	struct prb_desc_ring *desc_ring = &rb->desc_ring;
>>>> +	struct prb_data_blk_lpos *blk_lpos;
>>>> +	struct prb_data_block *blk;
>>>> +	unsigned long tail_lpos;
>>>> +	enum desc_state d_state;
>>>> +	struct prb_desc desc;
>>>> +	unsigned long id;
>>>> +
>>>> +	/*
>>>> +	 * Using the provided @data_ring, point @blk_lpos to the correct
>>>> +	 * blk_lpos within the local copy of the descriptor.
>>>> +	 */
>>>> +	if (data_ring == &rb->text_data_ring)
>>>> +		blk_lpos = &desc.text_blk_lpos;
>>>> +	else
>>>> +		blk_lpos = &desc.dict_blk_lpos;
>>>> +
>>>> +	/* Loop until @lpos_begin has advanced to or beyond @lpos_end. */
>>>> +	while ((lpos_end - lpos_begin) - 1 < DATA_SIZE(data_ring)) {
>>>> +		blk = to_block(data_ring, lpos_begin);
>>>> +		id = READ_ONCE(blk->id); /* LMM(data_make_reusable:A) */
>>>
>>> This would deserve some comment:
>
> I wonder if the comment might look like:
>
> /*
>  * No barrier is needed between reading tail_lpos and the related
>  * blk->id. Only the CPU that modifies tail_lpos via cmpxchg is
>  * allowed to modify the related blk->id. CPUs that see the moved
>  * tail_lpos are looking at another block related to the new
>  * tail_lpos. It does not matter when the previous winner modifies
>  * the previous block.
>  */

Sorry, but this comment does not make sense to me. The tail is pushed
_before_ the block ID is modified. So any kind of barrier here (where we
read the tail, then the block ID, i.e. in the same order) would be
inappropriate anyway.

Also, this comment only talks about when the new value is seen, but not
about when the old value is seen. IMO it is seeing the old value that
deserves a comment, since that is the only case with a data race.
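
To make that concrete, here is a minimal C11-atomics sketch of the
writer-side ordering (a hypothetical model, not the kernel code: the
names tail_lpos, blk->id and DATA_SIZE mirror the patch, but
writer_recycle() and the fixed DATA_SIZE value are invented for
illustration):

        #include <stdatomic.h>
        #include <stdbool.h>

        #define DATA_SIZE 4096UL /* stand-in for DATA_SIZE(data_ring) */

        struct blk {
                atomic_ulong id;
        };

        static atomic_ulong tail_lpos;

        /*
         * Only the CPU that wins the tail cmpxchg may recycle the
         * block, and it pushes the tail _before_ overwriting blk->id.
         */
        static bool writer_recycle(struct blk *b, unsigned long old_tail,
                                   unsigned long new_tail,
                                   unsigned long new_id)
        {
                unsigned long expected = old_tail;

                /* 1. Push the tail past the block being recycled. */
                if (!atomic_compare_exchange_strong(&tail_lpos, &expected,
                                                    new_tail))
                        return false; /* another CPU won; it recycles */

                /* 2. Only afterwards overwrite the block ID. */
                atomic_store(&b->id, new_id);
                return true;
        }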

In preparation for my next version I have added the following comment:

        blk = to_block(data_ring, lpos_begin);

        /*
         * When going from lpos to block pointer, the wrap around
         * feature of the lpos value is lost. Since another CPU could
         * invalidate this data area at any time, the data tail must
         * be re-checked after the block ID has been read.
         */

        id = READ_ONCE(blk->id); /* LMM(data_make_reusable:A) */

I think this comment also helps to further clarify "why" the following
data tail check occurs.
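
And for completeness, the matching reader side of the same hypothetical
model (continuing the sketch above; read_blk_id() is an invented helper,
and the seq_cst C11 loads are stronger than what the kernel code
strictly needs):

        /*
         * @lpos was computed from an earlier tail_lpos read. Converting
         * it to a block pointer loses the wrap information, so the id
         * read below may be stale: re-check the tail after reading it.
         */
        static bool read_blk_id(struct blk *b, unsigned long lpos,
                                unsigned long *id_out)
        {
                unsigned long id = atomic_load(&b->id); /* may be stale */

                /* Tail pushed past @lpos? Then the block was recycled. */
                if (lpos - atomic_load(&tail_lpos) >= DATA_SIZE)
                        return false; /* @id is unusable */

                *id_out = id;
                return true;
        }

On failure the caller re-loads the tail lpos to determine the
situation, exactly as the function comment in the patch describes.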

John Ogness
