linux-kernel - Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20170407051244.GB487@jagdpanzerIV.localdomain>
Date:   Fri, 7 Apr 2017 14:12:44 +0900
From:   Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To:     Pavel Machek <pavel@....cz>
Cc:     Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Petr Mladek <pmladek@...e.com>, Jan Kara <jack@...e.cz>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Peter Zijlstra <peterz@...radead.org>,
        "Rafael J . Wysocki" <rjw@...ysocki.net>,
        Eric Biederman <ebiederm@...ssion.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Jiri Slaby <jslaby@...e.com>, Len Brown <len.brown@...el.com>,
        linux-kernel@...r.kernel.org,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Subject: Re: [RFC][PATCHv2 2/8] printk: introduce printing kernel thread

Hello,

On (04/06/17 19:14), Pavel Machek wrote:
[..]
> > @@ -1765,17 +1803,40 @@ asmlinkage int vprintk_emit(int facility, int level,
> >  
> >  	printed_len += log_output(facility, level, lflags, dict, dictlen, text, text_len);
> >  
> > +	/*
> > +	 * Emergency level indicates that the system is unstable and, thus,
> > +	 * we better stop relying on wake_up(printk_kthread) and try to do
> > +	 * a direct printing.
> > +	 */
> > +	if (level == LOGLEVEL_EMERG)
> > +		printk_kthread_disabled = true;
> > +
> > +	set_bit(PRINTK_PENDING_OUTPUT, &printk_pending);
> 
> Messages lower then _EMERG may be important, too.. and usually are,
> for debugging.
> 
> And you keep both code paths, anyway, so they have to work. So you did
> not really "fix" issues you are pointing out -- they still remain
> there for _EMERG and above.

we don't drop messages of lower levels. we just print then from a
schedulable context. once the things go off the rails, and EMERG
is a good hint, I think, we stop being optimismitcs and switch to
a "best effort" mode. that is sort of reasonable. if there is a
flood of EMERG messages that are not actually important and,
basically, are result of a coding error, then, I think, the we
must fix that coding error. I mean, there should be some common
sense, and doing
		while (1)
			printk(KERN_EMERG "hello\n");
is probably not.


> I agree that printing too much is a problem. Could you just print
> "(messages delayed)" in that case, then wake a kernel thread to [rint
> the rest?

sorry, but what difference would it make?

it's really unclear at what point we should offload printing if we begin
that "we will offload sometimes". for example, I've seen many spin-lock
lockups where printk was involved.

	CPU0					CPU1		CPU2					CPU3

	spin_lock(&lock)					spin_lock(&lock)			spin_lock(&lock)
	printk("foo") // grabs the console_sem
	printk("bar")				printk("a")
						printk("b")
						printk("c")
						...
						printk("z")
								spin_dump()				spin_dump()
	  call_console_drivers()				 printk()				 printk()
	   serial_driver_write()				 printk()				 printk()
	    spin_lock_irqsave(port->lock)			 ...					 ...
	     uart_console_write(...)				 trigger_all_cpu_backtrace()		 trigger_all_cpu_backtrace()
	      serial_driver_putchar()
	       while (!txrdy(...))
	         cpu_relax()


spin_dump() and trigger_all_cpu_backtrace() result in a bunch of
additional printk()-s so CPU0 has even more job to do in console_unlock(),
while it still holds the contended spin_lock. and so on; there are
many other examples.

so should we declare a "we can spend only 2 seconds in direct printk()
and then must offload printing" rule? I don't think it's much better
than a simpler "we always offload, as long as we think it's safe".

	-ss