[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <1454963703-20433-1-git-send-email-dvlasenk@redhat.com>
Date: Mon, 8 Feb 2016 21:35:03 +0100
From: Denys Vlasenko <dvlasenk@...hat.com>
To: linux-kernel@...r.kernel.org
Cc: Denys Vlasenko <dvlasenk@...hat.com>, srostedt@...hat.com,
Steven Rostedt <rostedt@...dmis.org>,
Tejun Heo <tj@...nel.org>,
Peter Hurley <peter@...leysoftware.com>
Subject: [PATCH] printk: avoid livelock if another CPU printks continuously
At the end of each printk(), kernel attempts to take console_sem.
If this succeeds, it feeds buffered message data to console devices
until there is nothing left, and releases console_sem:
if (console_trylock_for_printk(this_cpu))
console_unlock();
The livelock exists because code in console_unlock() has no
limit on the amount of buffered data it would process under
console_sem. This is bad if printk() was called with IRQs disabled.
This patch makes console_unlock() release console_sem after 5
iterations, which usually amounts to 5 lines of printk messages,
and give other printk'ing CPUs a chance to acquire console_sem.
If some CPU grabs it, console_unlock() finishes.
If no one takes the semaphore, console_unlock() re-acquires it
and loops back for another cycle of console output.
This seems to be a hard-to-trigger, but long-existing problem:
I did not observe the bug myself. I have two separate user reports,
from two different kernels, who report hangs on boot
when a RAID driver produces a voluminous printk flood of disk
detection errors. Some of them are reported by a printk from
IRQ-off region. If that IRQ-off printk gets to print all the
other printks from all other CPUs, the machine hangs.
This patch is reported to make affected user's machine survive.
Signed-off-by: Denys Vlasenko <dvlasenk@...hat.com>
CC: linux-kernel@...r.kernel.org
CC: srostedt@...hat.com
CC: Steven Rostedt <rostedt@...dmis.org>
CC: Tejun Heo <tj@...nel.org>
CC: Peter Hurley <peter@...leysoftware.com>
---
kernel/printk/printk.c | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c963ba5..ca4f9d55 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2235,6 +2235,7 @@ void console_unlock(void)
unsigned long flags;
bool wake_klogd = false;
bool do_cond_resched, retry;
+ unsigned cnt;
if (console_suspended) {
up_console_sem();
@@ -2257,6 +2258,7 @@ void console_unlock(void)
/* flush buffered message fragment immediately to console */
console_cont_flush(text, sizeof(text));
again:
+ cnt = 5;
for (;;) {
struct printk_log *msg;
size_t ext_len = 0;
@@ -2284,6 +2286,9 @@ skip:
if (console_seq == log_next_seq)
break;
+ if (--cnt == 0)
+ break; /* Someone else printk's like crazy */
+
msg = log_from_idx(console_idx);
if (msg->flags & LOG_NOCONS) {
/*
@@ -2350,6 +2355,26 @@ skip:
if (retry && console_trylock())
goto again;
+ if (cnt == 0) {
+ /*
+ * Other CPU(s) printk like crazy, filling log_buf[].
+ * Try to get rid of the "honor" of servicing their data:
+ * give _them_ time to grab console_sem and start working.
+ */
+ cnt = 9999;
+ while (--cnt != 0) {
+ cpu_relax();
+ if (console_seq == log_next_seq) {
+ /* Good, other CPU entered "for(;;)" loop */
+ goto out;
+ }
+ }
+ /* No one seems to be willing to take it... */
+ if (console_trylock())
+ goto again; /* we took it */
+ /* Nope, someone else holds console_sem! Good */
+ }
+out:
if (wake_klogd)
wake_up_klogd();
}
--
1.8.1.4
Powered by blists - more mailing lists