[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160331111229.GB1023@pathway.suse.cz>
Date: Thu, 31 Mar 2016 13:12:29 +0200
From: Petr Mladek <pmladek@...e.com>
To: Sergey Senozhatsky <sergey.senozhatsky@...il.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Jan Kara <jack@...e.com>, Tejun Heo <tj@...nel.org>,
Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
linux-kernel@...r.kernel.org,
Byungchul Park <byungchul.park@....com>,
Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
Jan Kara <jack@...e.cz>
Subject: Re: [RFC][PATCH v8 1/2] printk: Make printk() completely async
On Fri 2016-03-25 00:43:22, Sergey Senozhatsky wrote:
> From: Jan Kara <jack@...e.cz>
>
> Currently, printk() sometimes waits for message to be printed to console
> and sometimes it does not (when console_sem is held by some other
> process). In case printk() grabs console_sem and starts printing to
> console, it prints messages from kernel printk buffer until the buffer
> is empty. When serial console is attached, printing is slow and thus
> other CPUs in the system have plenty of time to append new messages to
> the buffer while one CPU is printing. Thus the CPU can spend unbounded
> amount of time doing printing in console_unlock(). This is especially
> serious problem if the printk() calling console_unlock() was called with
> interrupts disabled.
>
> In practice users have observed a CPU can spend tens of seconds printing
> in console_unlock() (usually during boot when hundreds of SCSI devices
> are discovered) resulting in RCU stalls (CPU doing printing doesn't
> reach quiescent state for a long time), softlockup reports (IPIs for the
> printing CPU don't get served and thus other CPUs are spinning waiting
> for the printing CPU to process IPIs), and eventually a machine death
> (as messages from stalls and lockups append to printk buffer faster than
> we are able to print). So these machines are unable to boot with serial
> console attached. Another observed issue is that due to slow printk,
> hardware discovery is slow and udev times out before kernel manages to
> discover all the attached HW. Also during artificial stress testing SATA
> disk disappears from the system because its interrupts aren't served for
> too long.
>
> This patch makes printk() completely asynchronous (similar to what
> printk_deferred() did until now). It appends message to the kernel
> printk buffer and wake_up()s a special dedicated kthread to do the
> printing to console. This has the advantage that printing always happens
> from a schedulable contex and thus we don't lockup any particular CPU or
> even interrupts. Also it has the advantage that printk() is fast and
> thus kernel booting is not slowed down by slow serial console.
> Disadvantage of this method is that in case of crash there is higher
> chance that important messages won't appear in console output (we may
> need working scheduling to print message to console). We somewhat
> mitigate this risk by switching printk to the original method of
> immediate printing to console if oops is in progress. Also for
> debugging purposes we provide printk.synchronous kernel parameter which
> resorts to the original printk behavior.
>
> printk() is expected to work under different conditions and in different
> scenarios, including corner cases of OOM when all of the workers are busy
> (e.g. allocating memory), thus printk() uses its own dedicated printing
> kthread, rather than relying on workqueue (even with WQ_MEM_RECLAIM bit
> set we potentially can receive delays in printing until workqueue
> declares a ->mayday, as noted by Tetsuo Handa).
>
> Signed-off-by: Jan Kara <jack@...e.cz>
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@...il.com>
> ---
> Documentation/kernel-parameters.txt | 10 ++++
> kernel/printk/printk.c | 91 ++++++++++++++++++++++++++++++++++---
> 2 files changed, 94 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index ecc74fa..4745e94 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -3114,6 +3114,16 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> printk.time= Show timing data prefixed to each printk message line
> Format: <bool> (1/Y/y=enable, 0/N/n=disable)
>
> + printk.synchronous=
> + By default kernel messages are printed to console
> + asynchronously (except during early boot or when oops
> + is happening). That avoids kernel stalling behind slow
> + serial console and thus avoids softlockups, interrupt
> + timeouts, or userspace timing out during heavy printing.
> + However for debugging problems, printing messages to
> + console immediately may be desirable. This option
> + enables such behavior.
> +
> processor.max_cstate= [HW,ACPI]
> Limit processor to maximum C-state
> max_cstate=9 overrides any DMI blacklist limit.
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index bfbf284..a984c5a 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -46,6 +46,7 @@
> #include <linux/utsname.h>
> #include <linux/ctype.h>
> #include <linux/uio.h>
> +#include <linux/kthread.h>
>
> #include <asm/uaccess.h>
> #include <asm-generic/sections.h>
> @@ -284,6 +285,19 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
> static char *log_buf = __log_buf;
> static u32 log_buf_len = __LOG_BUF_LEN;
>
> +/*
> + * When true, printing to console will happen synchronously.
> + * The default value on UP systems is 'true'.
> + */
> +static bool __read_mostly printk_sync = !IS_ENABLED(CONFIG_SMP);
> +module_param_named(synchronous, printk_sync, bool, S_IRUGO);
> +MODULE_PARM_DESC(synchronous, "make printing to console synchronous");
> +
> +/* Printing kthread for async printk */
> +static struct task_struct *printk_kthread;
> +/* When `true' - printing thread has messages to print */
> +static bool need_flush_console;
> +
> /* Return log buffer address */
> char *log_buf_addr_get(void)
> {
> @@ -1608,6 +1622,8 @@ asmlinkage int vprintk_emit(int facility, int level,
> const char *dict, size_t dictlen,
> const char *fmt, va_list args)
> {
> + /* cpu currently holding logbuf_lock in this function */
> + static unsigned int logbuf_cpu = UINT_MAX;
> static bool recursion_bug;
> static char textbuf[LOG_LINE_MAX];
> char *text = textbuf;
> @@ -1617,8 +1633,7 @@ asmlinkage int vprintk_emit(int facility, int level,
> int this_cpu;
> int printed_len = 0;
> bool in_sched = false;
> - /* cpu currently holding logbuf_lock in this function */
> - static unsigned int logbuf_cpu = UINT_MAX;
> + bool in_panic = console_loglevel == CONSOLE_LOGLEVEL_MOTORMOUTH;
>
> if (level == LOGLEVEL_SCHED) {
> level = LOGLEVEL_DEFAULT;
> @@ -1655,6 +1670,14 @@ asmlinkage int vprintk_emit(int facility, int level,
> raw_spin_lock(&logbuf_lock);
> logbuf_cpu = this_cpu;
>
> + /*
> + * Set printing kthread sleep condition early, under the
> + * logbuf_lock, so it (if RUNNING) will go to console_lock()
> + * and spin on logbuf_lock.
> + */
> + if (!in_panic && printk_kthread && !need_flush_console)
> + need_flush_console = true;
I would remove the if-statement and always set it:
+ It does not matter if we set it in panic. It will not affect
anything.
+ The check for printk_kthread is racy. It might be false here
and it might be true later when check whether to wakeup
the kthread or try to get console directly.
+ We might set it true even when it was true before.
I think that this was an attempt to avoid a spurious wake up.
But we solve it better way now.
> if (unlikely(recursion_bug)) {
> static const char recursion_msg[] =
> "BUG: recent printk recursion!";
> @@ -1757,12 +1780,28 @@ asmlinkage int vprintk_emit(int facility, int level,
> if (!in_sched) {
> lockdep_off();
> /*
> - * Try to acquire and then immediately release the console
> - * semaphore. The release will print out buffers and wake up
> - * /dev/kmsg and syslog() users.
> + * By default we print message to console asynchronously so
> + * that kernel doesn't get stalled due to slow serial console.
> + * That can lead to softlockups, lost interrupts, or userspace
> + * timing out under heavy printing load.
> + *
> + * However we resort to synchronous printing of messages during
> + * early boot, when synchronous printing was explicitly
> + * requested by kernel parameter, or when console_verbose() was
> + * called to print everything during panic / oops.
> */
> - if (console_trylock())
> - console_unlock();
> + if (!in_panic && printk_kthread) {
> + /* Offload printing to a schedulable context. */
> + wake_up_process(printk_kthread);
> + } else {
> + /*
> + * Try to acquire and then immediately release the
> + * console semaphore. The release will print out
> + * buffers and wake up /dev/kmsg and syslog() users.
> + */
> + if (console_trylock())
> + console_unlock();
> + }
> lockdep_on();
> }
>
> @@ -1899,6 +1938,8 @@ static size_t cont_print_text(char *text, size_t size) { return 0; }
> /* Still needs to be defined for users */
> DEFINE_PER_CPU(printk_func_t, printk_func);
>
> +static struct task_struct *printk_kthread;
> +static bool need_flush_console;
> #endif /* CONFIG_PRINTK */
>
> #ifdef CONFIG_EARLY_PRINTK
> @@ -2337,6 +2378,8 @@ skip:
> */
> raw_spin_lock(&logbuf_lock);
> retry = console_seq != log_next_seq;
> + if (!retry && printk_kthread)
> + need_flush_console = false;
Similar here. I remove the if-statement and always set it. We will
either retry or it should be false anyway.
> raw_spin_unlock_irqrestore(&logbuf_lock, flags);
>
> if (retry && console_trylock())
> @@ -2722,6 +2765,40 @@ static int __init printk_late_init(void)
> late_initcall(printk_late_init);
>
> #if defined CONFIG_PRINTK
> +static int printk_kthread_func(void *data)
> +{
> + while (1) {
> + set_current_state(TASK_INTERRUPTIBLE);
> + if (!need_flush_console)
> + schedule();
> +
> + __set_current_state(TASK_RUNNING);
We still must do here:
need_flush_console = false;
Othrerwise, we might start "busy" cycling. cosole_unlock()
sometimes returns earlly, e.g. when console_suspended is set
or !can_use_console() returns true.
Sigh, the handling of "need_flush_console" is a bit strange.
Part of the logic depends on logbuf_lock and the other part
must be lockless.
I think that this lockless assigment deserve a comment, e.g.
/*
* Avoid an infinite loop when console_unlock() cannot
* access consoles, e.g. because of a suspend. We
* could get asleep here. Someone else will call
* consoles if conditions change.
*/
Also another name might help. If we set it false here, the value
does describe a global state. The variable describes if this
kthread needs to flush the console. So, more precise name would be
printk_kthread_need_flush_console
I think that we are close. I really like the current state of
the patch and how minimalistic it is.
Best Regards,
Petr
Powered by blists - more mailing lists