linux-kernel - Re: [PATCH v12 0/3] printk: Make printk() completely async

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160712162809.GE32373@pathway.suse.cz>
Date:	Tue, 12 Jul 2016 18:28:09 +0200
From:	Petr Mladek <pmladek@...e.com>
To:	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Jan Kara <jack@...e.com>, Tejun Heo <tj@...nel.org>,
	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
	linux-kernel@...r.kernel.org,
	Byungchul Park <byungchul.park@....com>,
	Sergey Senozhatsky <sergey.senozhatsky@...il.com>
Subject: Re: [PATCH v12 0/3] printk: Make printk() completely async

On Wed 2016-06-29 14:08:35, Sergey Senozhatsky wrote:
> Hello,
> 
> the patch to fix the async KERN_CONT printk regression I mentioned
> several days ago in another thread.
> 
> as a separate patch for now, to ease the review. will squash with 0001.
> 
> ============== 8< ==============
> 
> >From 70f65ed55eb82a1b8b74fdbd1a7afc9e77e9b380 Mon Sep 17 00:00:00 2001
> From: Sergey Senozhatsky <sergey.senozhatsky@...il.com>
> Subject: [PATCH] printk: print KERN_CONT buffer in sync printk mode
> 
> One of my boxen has revealed that there is a race condition now between
> pr_cot() output and printk_kthread wake_up() time. Schematically, the CPU
> that triggered the race conditions was doing
> something like this:
> 
> 	for (...) {
> 		for (....) {
> 			...
> 			pr_cont(...);
> 			...
> 		}
> 		pr_cont("\n");
> 	}

I have tried the following:

       int a,b;

       for (a = 0; a < 1000; a++) {
               for (b = 0; b < 10; b++) {
                       pr_cont("item%04d", a*10 + b);
               }
               pr_cont("\n");
       }


> the resulting output was:
> 
> $ dmesg
> 
> 00000001 00000002 00000003 00000004 00000005 00000006 00000007
> 00000008
> 00000009 0000000a 0000000b 0000000c 0000000d 0000000e
> 0000000f 00000010 00000011 00000012 00000013 00000014 00000015
> 0000002b 0000002c 0000002d 0000002e 0000002f 00000030 00000031
> ....

This patch helped a lot but I still see the problem from time to time.
For example,

[ 4913.000290] item8720item8721item8722item8723item8724item8725item8726item8727item8728item8729
[ 4913.000294] item8730item8731item8732item8733item8734item8735item8736item8737item8738item8739
[ 4913.000298] item8740item8741item8742item8743item8744item8745item8746item8747item8748item8749
[ 4913.000310] item8750item8751item8752item8753item8754item8755item8756item8757item8758item8759
[ 4913.000327] item8760item8761item8762item8763item8764item8765item8766item8767item8768item8769
[ 4913.000342] item8770item8771item8772item8773item8774item8775item8776item8777item8778item8779
[ 4913.000356] item8780
[ 4913.000357] item8781
[ 4913.000358] item8782
[ 4913.000358] item8783
[ 4913.000359] item8784
[ 4913.000359] item8785
[ 4913.000359] item8786
[ 4913.000360] item8787
[ 4913.000360] item8788
[ 4913.000360] item8789

[ 4913.000361] item8790
[ 4913.000361] item8791
[ 4913.000361] item8792
[...]


> 
> KERN_CONT output is heavily relying on the fact there will be no other
> CPU(-s)/process(-es) doing any sort (KERN_CONT etc.) of printk-s simultaneously.
> This basically means that pr_cont() CPU is expected to be the one that actually
> does the printing and log_store() of the entire cont buffer:

In particular, the partially flushed cont buffer could not be used for new
messages until the rest of it is flushed to the console. It is because
the number of flushed characters is stored in the struct cont.


> 	pr_cont()
> 		console_unlock()
> 			console_cont_flush()
> 			log_store()	/* cont buffer */
> 	pr_cont()
> 		console_unlock()
> 			console_cont_flush()
> 			log_store()	/* cont buffer */
> 	....
> 
> Async printk breaks this contract, because console_cont_flush() is now deferred.
> Thus pr_cont() will see a not fully flushed cont buffer, which will force it to
> append a new KERN_CONT not to cont buffer, but to buffer log as a separate log
> entry via log_store(). Due to the deferred nature of console_cont_flush() the
> very next pr_cont() may see cont buffer being flushed, so it will keep the
> message in the cont buffer and log_store() it later. So the cont line will split
> across several log entries -- printk_kthread can take some time to wkeup,
> during which pr_cont() will keep splitting cont line:

I guess that the problem was there even before async printk. For
example, we do not keep lockbuf_lock between console_cont_flush()
and printing other lines from the ring buffer. It means that
new lines might appear in the buffer in the meantime.

Note that we call consoles in console_cont_flush() without
lockbuf_lock. So there is relatively long time for the race.

In each case, the async printk made it worse because there
might be longer periods between the printk thread is scheduled
and the rest of the cont buffer is flushed.



Anyway, this solution makes the async printk less helpful.
Any message that does not end with a new line will force
the sync mode and the risk of the soft lookup.


Another solution would be to remember cont.cons outside
of struct cont. Then the cont buffer could be reused
immediately.

The saved position will be used either for the actual
cont buffer or for the very first message in the ring
buffer. The situation might be detected by comparing
console_seq, log_next_seq, and log_first_seq.

if (console_seq == log_next_seq)
   cont buffer still includes the very last message

if (console_seq < log_first_seq)
   the message was lost => no partialy flushed message around

if (console_seq < log_next_seq && cont.cons)
   the current message in the log buffer is partially printed


Well, this looks quite complicated. And I am afraid that
I have missed something.

I am still scratching my head around it.

Best Regards,
Petr