[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87v9urdq05.fsf@linutronix.de>
Date: Wed, 21 Aug 2019 07:52:26 +0200
From: John Ogness <john.ogness@...utronix.de>
To: Petr Mladek <pmladek@...e.com>
Cc: linux-kernel@...r.kernel.org,
Andrea Parri <andrea.parri@...rulasolutions.com>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
Steven Rostedt <rostedt@...dmis.org>,
Brendan Higgins <brendanhiggins@...gle.com>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: assign_desc() barriers: Re: [RFC PATCH v4 1/9] printk-rb: add a new printk ringbuffer implementation
On 2019-08-20, Petr Mladek <pmladek@...e.com> wrote:
>> > --- /dev/null
>> > +++ b/kernel/printk/ringbuffer.c
>> > +/**
>> > + * assign_desc() - Assign a descriptor to the caller.
>> > + *
>> > + * @e: The entry structure to store the assigned descriptor to.
>> > + *
>> > + * Find an available descriptor to assign to the caller. First it is checked
>> > + * if the tail descriptor from the committed list can be recycled. If not,
>> > + * perhaps a never-used descriptor is available. Otherwise, data blocks will
>> > + * be invalidated until the tail descriptor from the committed list can be
>> > + * recycled.
>> > + *
>> > + * Assigned descriptors are invalid until data has been reserved for them.
>> > + *
>> > + * Return: true if a descriptor was assigned, otherwise false.
>> > + *
>> > + * This will only fail if it was not possible to invalidate data blocks in
>> > + * order to recycle a descriptor. This can happen if a writer has reserved but
>> > + * not yet committed data and that reserved data is currently the oldest data.
>> > + */
>> > +static bool assign_desc(struct prb_reserved_entry *e)
>> > +{
>> > + struct printk_ringbuffer *rb = e->rb;
>> > + struct prb_desc *d;
>> > + struct nl_node *n;
>> > + unsigned long i;
>> > +
>> > + for (;;) {
>> > + /*
>> > + * jA:
>> > + *
>> > + * Try to recycle a descriptor on the committed list.
>> > + */
>> > + n = numlist_pop(&rb->nl);
>> > + if (n) {
>> > + d = container_of(n, struct prb_desc, list);
>> > + break;
>> > + }
>> > +
>> > + /* Fallback to static never-used descriptors. */
>> > + if (atomic_read(&rb->desc_next_unused) < DESCS_COUNT(rb)) {
>> > + i = atomic_fetch_inc(&rb->desc_next_unused);
>> > + if (i < DESCS_COUNT(rb)) {
>> > + d = &rb->descs[i];
>> > + atomic_long_set(&d->id, i);
>> > + break;
>> > + }
>> > + }
>> > +
>> > + /*
>> > + * No descriptor available. Make one available for recycling
>> > + * by invalidating data (which some descriptor will be
>> > + * referencing).
>> > + */
>> > + if (!dataring_pop(&rb->dr))
>> > + return false;
>> > + }
>> > +
>> > + /*
>> > + * jB:
>> > + *
>> > + * Modify the descriptor ID so that users of the descriptor see that
>> > + * it has been recycled. A _release() is used so that prb_getdesc()
>> > + * callers can see all data ringbuffer updates after issuing a
>> > + * pairing smb_rmb(). See iA for details.
>> > + *
>> > + * Memory barrier involvement:
>> > + *
>> > + * If dB->iA reads from jB, then dI reads the same value as
>> > + * jA->cD->hA.
>> > + *
>> > + * Relies on:
>> > + *
>> > + * RELEASE from jA->cD->hA to jB
>> > + * matching
>> > + * RMB between dB->iA and dI
>> > + */
>> > + atomic_long_set_release(&d->id, atomic_long_read(&d->id) +
>> > + DESCS_COUNT(rb));
>>
>> atomic_long_set_release() might be a bit confusing here.
>> There is no related acquire.
As the comment states, this release is for prb_getdesc() users. The only
prb_getdesc() user is _dataring_pop(). If getdesc() returns NULL
(i.e. the descriptor's ID is not what _dataring_pop() was expecting),
then the tail must have moved and _dataring_pop() needs to see
that. Since there are no data dependencies between descriptor ID and
tail_pos, an explicit memory barrier is used. More on this below.
>> In fact, d->id manipulation has barriers from both sides:
>>
>> + smp_rmb() before so that all reads are finished before
>> the id is updated (release)
>
> Uh, this statement does not make sense. The read barrier is not
> needed here. Instead the readers need it.
>
> Well, we might need a write barrier before d->id manipulation.
> It should be in numlist_pop() after successfully updating nl->tail_id.
> It will allow readers to detect that the desriptor is being reused
> (not in valid tail_id..head_id range) before we start manipulating it.
>
>> + smp_wmb() after so that the new ID is written before other
>> related values are modified (acquire).
>>
>> The smp_wmb() barrier is in prb_reserve(). I would move it here.
>
> This still makes sense. I would move the write barrier from
> prb_reserve() here.
The issue is that _dataring_pop() needs to see a moved dataring tail if
prb_getdesc() fails. Just because numlist_pop() succeeded, doesn't mean
that this was the task that changed the dataring tail. I.e. another CPU
could observe that this task changed the ID but _not_ yet see that
another task changed the dataring tail.
Issuing an smp_mb() before setting the the new ID would also suffice,
but that is a pretty big hammer for something that a set_release can
take care of.
> Sigh, I have to admit that I am not familiar with the _acquire(),
> _release(), and _relaxed() variants of the atomic operations.
>
> They probably make it easier to implement some locking API.
> I am not sure how to use it here. This code implements a complex
> interlock between several variables. I mean that several variables
> lock each other in a cycle, like a state machine? In each case,
> it is not a simple locking where we check state of a single
> variable.
Keep in mind that dataring and numlist were written independent of the
ringbuffer. They are structures with very specific purposes and their
own set of variables (and memory barriers to order _those_
variables). The high-level ringbuffer also has its own variables and
memory barriers. Sometimes there is overlap, which is implemented in the
callbacks (as is here), which is why the dataring callback getdesc() has
the implementation requirement that a following smp_rmb() by the caller
will guarantee seeing an updated dataring tail. But these overlaps are
the exception, not the rule.
I think trying to see "everything at once" with a top-down view is going
to seem too complex and hurt your brain. I think it would be easier to
verify the internal consistency of the individual dataring and numlist
structures first. Once you have faith in the integrity of those
structures, moving to the high-level ringbuffer is a much smaller step.
John Ogness
Powered by blists - more mailing lists