lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <20080415162606.50fcfaaa.akpm@linux-foundation.org>
Date:	Tue, 15 Apr 2008 16:26:06 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	David Howells <dhowells@...hat.com>
Cc:	torvalds@...ux-foundation.org, linux-kernel@...r.kernel.org,
	dhowells@...hat.com
Subject: Re: [PATCH] PRINTK: Decouple input locking from output locking and
 move render outside lock

On Tue, 08 Apr 2008 19:01:29 +0100
David Howells <dhowells@...hat.com> wrote:

> Decouple the printk input locking from the output locking using a dual-tailed
> ring buffer, and move the render (vscnprintf) and processing (log level token
> inclusion and timestamp inclusion) outside of the input locked section.
> 
> 	+---------+
> 	|         |
> 	| vprintk |
> 	|         |
> 	+---------+                                   +-----------------+
> 	     |                                        |                 |
> 	     V                                   +--->| Syslog syscall  |
> 	+---------+                              |    |                 |
> 	|         |                              |    +-----------------+
> 	| Render  |                              |
> 	|         |                              |    +-----------------+
> 	+---------+      +-------------------+   |    |                 |
> 	     |           | Ring buffer       |   +--->| /proc/kmsg      |
> 	     V           |   +---------------+   |    |                 |
> 	+---------+      |   | syslog queue  |---+    +-----------------+
> 	|         |      |   +---------------+
> 	| Process |----->|                   |
> 	|         |      |   +---------------+
> 	+---------+      |   | console queue |---+    +-----------------+
> 	                 |   +---------------+   |    |                 |
> 	                 |                   |   +--->| Console drivers |
> 	                 +-------------------+        |                 |
> 	                                              +-----------------+
> 
> 
> New/renamed global variables:
> 
>  (*) printk_insertion_point is the point in the ring buffer at which data is
>      inserted.  It may only be moved if printk_insertion_lock is held.
> 
>  (*) log_extraction_point is the point in the ring buffer from which data is
>      read out by do_syslog().  It may only be moved if log_extraction_lock is
>      held.
> 
>  (*) con_extraction_point is the point in the ring buffer from which data is
>      read out to be fed to the console drivers.  It may only be moved if
>      console_sem is held.  No spinlock needs to be held whilst writing to the
>      console.
> 
>  (*) console_anytime is the number of registered drivers that have CON_ANYTIME
>      flagged.  This avoids the need to walk the console list regularly before
>      the CPU is officially online.
> 
> 
> This patch has the following effects:
> 
>  (1) vprintk() spends much less time within the insertion critical section.
> 
>  (2) vprintk() can be filling the buffer whilst do_syslog() and
>      release_console_sem are emptying it.  Memory barriers are employed to
>      prevent out-of-order access between the buffers and the pointers.
> 
>  (3) Instead of a single render buffer, vprintk() has two render buffers per
>      CPU.  This allows multiple CPUs to be rendering and processing their
>      rendered texts at the same time without the need for this to be within a
>      spinlock.
> 
>  (4) vprintk() first prints the string into the first buffer for that CPU, and
>      then post-processes it into the CPU's second buffer to make sure metadata
>      is present (log level and timestamp)
> 
>      As the rendering is done into a temporary buffer rather than a ring
>      buffer, the printed output is more likely to get truncated as there's less
>      room for metadata.  This can be alleviated by increasing the size of the
>      processing output buffer.
> 
>      These render buffers amount to 2KB * NR_CPUS in total instead of 1KB.
> 
>  (5) The timestamp inserted by vprintk() into a line is the time at which the
>      rendered line is processed, not the time it hits the buffer.  This means:
> 
>      (a) all those calls to do_div() and snprintf() are done outside the
>      	 critical section (except for recursion messages), and
> 
>      (b) lines from different CPUs may appear to be out of order (this can be
>      	 solved by throwing a lock around the processing section, a lock that's
>      	 not printk_insertion_lock).
> 
>  (6) A partial vprintk() on one CPU (ie: not ending in "\n") will have a
>      newline inserted after it if another CPU gets a line in first.
> 
>  (7) release_console_sem() does not need to take any spinlocks; console_sem is
>      probably sufficient locking for talking to console drivers.  Although the
>      function disables IRQs too, this may be unnecessary, provided the IRQ
>      handler doesn't try filling up the ring buffer without being able to empty
>      it because someone further up the stack holds console_sem.
> 
>  (8) vprintk() may have to discard messages under high load:
> 
>      (a) If the console queue can't be emptied fast enough, new messages are
>      	 discarded wholly or in part.  Possibly old messages should be
>      	 discarded instead.
> 
>      (b) If the syslog queue can't be emptied fast enough, then if vprintk()
>      	 can immediately get the appropriate lock, it will advance the
>      	 extraction point by at least 1024 characters, rounded up to the next
>      	 newline, thus discarding some old messages.  Failing that, new
>      	 messages will be discarded wholly or in part instead.
> 
> 
> Why this patch?
> ===============
> 
> We're seeing the kernel spit messages out about CPUs getting stuck on the
> console locks.  This appears to be due to I/O errors in a just-disconnected
> iSCSI block device causing lots and lots of printks.  The system on which this
> is being seen is a ppc64 system with spinlocks that yield to the hypervisor,
> which I suspect is exacerbating the situation.
> 
> This patch attempts to alleviate the situation by shortening the amount of time
> spent in the kernel logging critical sections by breaking up the locking and
> moving out stuff that can be done outside of the critical section.
> 
> 
> Testing
> =======
> 
> I've used my FRV test board (32-bit/BE/UP) firstly to step through the code and
> check the various values with GDB under various cases, and secondly to generate
> a sequence of numbered printks, which I've checked have all been correctly
> displayed, and correctly ordered.
> 
> I've also run this up on my x86_64 box with 2 CPUs (64-bit/LE/SMP), checking
> that it correctly handles output from FS-Cache and CacheFiles with full
> debugging output turned on, both in printing to tty and serial consoles and in
> extraction via syslog().
> 
> ...
> 
>  kernel/printk.c |  820 +++++++++++++++++++++++++++++++++++--------------------

geeze, that didn't do much to simplify the printk code :(

Wouldn't it be better to teach the offending iscsi code about
printk_ratelimit()?  Because I'd expect that even with this change,
undesirable things will still happen with (say) a serial console?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ