linux-kernel - Re: printk meeting at LPC

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190918051933.GA220683@jagdpanzerIV>
Date:   Wed, 18 Sep 2019 14:19:33 +0900
From:   Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     John Ogness <john.ogness@...utronix.de>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Petr Mladek <pmladek@...e.com>,
        Andrea Parri <parri.andrea@...il.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Brendan Higgins <brendanhiggins@...gle.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Theodore Ts'o <tytso@....edu>, Paul Turner <pjt@...gle.com>,
        Daniel Vetter <daniel.vetter@...ll.ch>,
        Prarit Bhargava <prarit@...hat.com>
Subject: Re: printk meeting at LPC

On (09/18/19 11:36), Sergey Senozhatsky wrote:
[..]
> For instance, tty/sysrq must be able to switch printk emergency on/off.
> That already means that printk emergency knob should be visible to the
> rest of the kernel. A long time ago, we had printk_emergency_begin_sync()
> and printk_emergency_end_sync(), which would define reentrable
> printk_emergency blocks [1]:
>
> 	printk_emergency_begin_sync();
> 	handle_sysrq();
> 	printk_emergency_end_sync();

Some explanations.

How did we come up to that _sync() printk() emergency mode (when we
make sure that there is no active printing kthread)? We had a number
of cases (complaints) of lost kernel messages. There are scenarios
in which we cannot offload to async preemptible printing kthread,
because current control path is, for instance, going to reboot the
kernel. In sync printk() mode we have some sort (!) of guarantees
that when we do

		pr_emerg("Restarting system\n");
		kmsg_dump(KMSG_DUMP_RESTART);
		machine_restart(cmd);

pr_emerg("Restarting system\n") is going to flush logbuf before the
system will machine_restart().

I can also recall a regression report from 0day bot. 0day uses sysrq over
serial to reboot running qemu instances. The way things currently work
is that we have printk() in sysrq handler, which flushes logbuf before
it reboots the system. With printk_kthread offloading this "flush logbuf
before reboot()" was not there, because printing was offloaded to kthread,
so the system used to immediately reboot with pending (and thus lost)
logbuf messages.

I suspect that emergency flush from sysrq is easier to handle once we
have one global printing kthread.

Suppose:

	logbuf

	id 100
	id 101
	id 102
	...
	id 198   <- printing kthread
	id 199
	id 200
<+> sysrq, a bunch of printk()-s in emergency flush mode
	id 201 -> atomic_write()
	id 202 -> atomic_write()
	...
	id 300 -> atomic_write()
<-> sysrq iret

When we park printing kthread, we make sure that the first sysrq->printk()
will also print pending messages 198,199,200 before it prints message 201.
When we unpark printing kthread it knows that there are no pending messages
(last printed message id is in the logbuf head).

It's going to be a bit harder when we have per-console kthread. If
per-console kthread is simply gogin to continue from the last message id
it printed (e.g. 198) then it will re-print messages which we already
printed via ->atomic_write() path. If all per-console printing kthread
are going to jump to id 300, because this is the last printed id on
consoles, then we can lose some messages on consoles (possibly a different
number of messages on different consoles, depending on console's kthread
position).

Once again, I'm sorry I was not on LPC/KS and maybe you have already
discussed all of those cases and got everything covered.

	-ss