[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170403112301.GB7867@jagdpanzerIV.localdomain>
Date: Mon, 3 Apr 2017 20:23:01 +0900
From: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To: Petr Mladek <pmladek@...e.com>,
Peter Zijlstra <peterz@...radead.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
Steven Rostedt <rostedt@...dmis.org>, Jan Kara <jack@...e.cz>,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
"Rafael J . Wysocki" <rjw@...ysocki.net>,
Eric Biederman <ebiederm@...ssion.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Jiri Slaby <jslaby@...e.com>, Pavel Machek <pavel@....cz>,
Len Brown <len.brown@...el.com>, linux-kernel@...r.kernel.org,
Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Subject: Re: [RFC][PATCHv2 1/8] printk: move printk_pending out of per-cpu
On (03/31/17 15:33), Peter Zijlstra wrote:
> On Fri, Mar 31, 2017 at 03:09:50PM +0200, Petr Mladek wrote:
> > On Wed 2017-03-29 18:25:04, Sergey Senozhatsky wrote:
>
> > > if (waitqueue_active(&log_wait)) {
> > > - this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
> > > + set_bit(PRINTK_PENDING_WAKEUP, &printk_pending);
> >
> > We should add here a write barrier:
> >
> > /*
> > * irq_work_queue() uses cmpxchg() and implies the memory
> > * barrier only when the work is queued. An explicit barrier
> > * is needed here to make sure that wake_up_klogd_work_func()
> > * sees printk_pending set even when the work was already queued
> > * because of an other pending event.
> > */
> > smp_wmb();
> >
> > > irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > > }
> > > preempt_enable();
>
> smp_mb__after_atomic() is probably better, because if you're not
> ordering with the cmpxchg, you're ordering against a load done by
> cmpxchg to see it doesn't need to do anything.
Petr and Peter, thanks for the review.
can you educate me, what exactly is broken there?
when called from console_unlock(), we have something as follows
console_unlock()
{
for (;;) {
spin_lock_irqsave();
...
spin_unlock_irqrestore();
...
}
spin_unlock_irqrestore();
<<IRQs enabled>>
if (wake_klogd)
wake_up_klogd()
{
set_bit(PRINTK_PENDING_WAKEUP, &printk_pending);
irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
}
}
we queue a per-CPU irq_work. given that by the time we execute wake_up_klogd()
we have local IRQs enabled on that CPU. is it possible that we will have that
CPU's irq_work still being queued?
when called from printk_deferred().
I'm still trying to understand what scenario can cause the problem. so
basically on that CPU we have a call into the scheduler/timer which ends
up in printk_deferred()... and then we have console_unlock()->wake_up_klogd()
//* local IRQs enabled but the irq_work is still queued *// and atop of it
we have IRQ that executes that CPU's run_list and fails to see updated
PRINTK_PENDING_WAKEUP bit, because wake_up_klogd() was called on already
queued wake_up_klogd_work. is this the case? if so, can this race happen on
the CPU?
I don't object the barrier, I'm just trying to have a better understanding
what's broken. sorry if I'm missing something very obvious.
-ss
Powered by blists - more mailing lists