linux-kernel - Re: [RFC][PATCHv2 1/8] printk: move printk

Message-ID: <20170403112301.GB7867@jagdpanzerIV.localdomain>
Date:   Mon, 3 Apr 2017 20:23:01 +0900
From:   Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To:     Petr Mladek <pmladek@...e.com>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>, Jan Kara <jack@...e.cz>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        "Rafael J . Wysocki" <rjw@...ysocki.net>,
        Eric Biederman <ebiederm@...ssion.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Jiri Slaby <jslaby@...e.com>, Pavel Machek <pavel@....cz>,
        Len Brown <len.brown@...el.com>, linux-kernel@...r.kernel.org,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Subject: Re: [RFC][PATCHv2 1/8] printk: move printk_pending out of per-cpu

On (03/31/17 15:33), Peter Zijlstra wrote:
> On Fri, Mar 31, 2017 at 03:09:50PM +0200, Petr Mladek wrote:
> > On Wed 2017-03-29 18:25:04, Sergey Senozhatsky wrote:
> 
> > >  	if (waitqueue_active(&log_wait)) {
> > > -		this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
> > > +		set_bit(PRINTK_PENDING_WAKEUP, &printk_pending);
> > 
> > We should add here a write barrier:
> > 
> > 	/*
> > 	 * irq_work_queue() uses cmpxchg() and implies the memory
> > 	 * barrier only when the work is queued. An explicit barrier
> > 	 * is needed here to make sure that wake_up_klogd_work_func()
> > 	 * sees printk_pending set even when the work was already queued
> > 	 * because of an other pending event.
> > 	 */
> > 	 smp_wmb();
> > 
> > >  		irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
> > >  	}
> > >  	preempt_enable();
> 
> smp_mb__after_atomic() is probably better, because if you're not
> ordering with the cmpxchg, you're ordering against a load done by
> cmpxchg to see it doesn't need to do anything.

Petr and Peter, thanks for the review.

could you help me understand what exactly is broken there?

when called from console_unlock(), we have something like the following

	console_unlock()
	{
		for (;;) {
			spin_lock_irqsave();
			...
			spin_unlock_irqrestore();
			...
		}

		spin_unlock_irqrestore();

<<IRQs enabled>>

		if (wake_klogd)
			wake_up_klogd()
			{
				set_bit(PRINTK_PENDING_WAKEUP, &printk_pending);
				irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
			}
	}


we queue a per-CPU irq_work. given that local IRQs are already enabled on
that CPU by the time we execute wake_up_klogd(), is it possible for that
CPU's irq_work to still be queued?


when called from printk_deferred():

I'm still trying to understand which scenario can cause the problem. so
basically, on that CPU we have a call into the scheduler/timer code which
ends up in printk_deferred()... and then we have
console_unlock()->wake_up_klogd() /* local IRQs enabled, but the irq_work is
still queued */, and on top of that we have an IRQ that executes that CPU's
run_list and fails to see the updated PRINTK_PENDING_WAKEUP bit, because
wake_up_klogd() was called while wake_up_klogd_work was already queued. is
this the case? if so, can this race happen on that same CPU?
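
to make the question concrete, here is a rough user-space sketch of the
pattern as I understand it. it uses C11 atomics, not the kernel primitives,
and every name in it (pending, work_queued, queue_work(), request_wakeup(),
run_work()) is a hypothetical stand-in, so please treat it as a simplified
model under those assumptions and not as the actual irq_work code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* hypothetical stand-in for the PRINTK_PENDING_WAKEUP bit */
#define PENDING_WAKEUP 0x1u

static atomic_uint pending;     /* models the global printk_pending */
static atomic_bool work_queued; /* models the "already queued" state of the irq_work */

/*
 * Try to claim the work item. Returns false when it was already queued.
 * The relaxed ordering is deliberate: it models the assumption that the
 * queueing step gives no ordering guarantee when it does nothing.
 */
static bool queue_work(void)
{
	bool expected = false;

	return atomic_compare_exchange_strong_explicit(&work_queued, &expected,
			true, memory_order_relaxed, memory_order_relaxed);
}

/* Producer side, modelled on wake_up_klogd(). */
static bool request_wakeup(void)
{
	atomic_fetch_or_explicit(&pending, PENDING_WAKEUP, memory_order_relaxed);
	/*
	 * Full fence between the bit update and the queue attempt; it stands
	 * in for the smp_mb__after_atomic() suggested above.
	 */
	atomic_thread_fence(memory_order_seq_cst);
	return queue_work();
}

/* Consumer side, modelled on the irq_work handler. */
static unsigned int run_work(void)
{
	/* Allow re-queueing first, then fetch and clear the pending bits. */
	atomic_store_explicit(&work_queued, false, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_exchange_explicit(&pending, 0, memory_order_relaxed);
}
```

in this model the second request_wakeup() in a row returns false (the work is
already queued), and the only thing that makes its bit update visible to
run_work() is the explicit fence. whether the real code can actually hit that
window is exactly what I'm asking about.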

I don't object to the barrier, I'm just trying to get a better understanding
of what's broken. sorry if I'm missing something very obvious.

	-ss