linux-kernel - Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20180117121251.7283a56e@gandalf.local.home>
Date:   Wed, 17 Jan 2018 12:12:51 -0500
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Tejun Heo <tj@...nel.org>
Cc:     Petr Mladek <pmladek@...e.com>,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        akpm@...ux-foundation.org, linux-mm@...ck.org,
        Cong Wang <xiyou.wangcong@...il.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Mel Gorman <mgorman@...e.de>, Michal Hocko <mhocko@...nel.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Peter Zijlstra <peterz@...radead.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Jan Kara <jack@...e.cz>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        rostedt@...e.goodmis.org, Byungchul Park <byungchul.park@....com>,
        Pavel Machek <pavel@....cz>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup

On Wed, 17 Jan 2018 07:15:09 -0800
Tejun Heo <tj@...nel.org> wrote:

> It's great that Steven's patches solve a good number of problems.  It
> is also true that there's a class of problems that it doesn't solve,
> which other approaches do.  The productive thing to do here is trying
> to solve the unsolved one too, especially given that it doesn't seem
> too difficuilt to do so on top of what's proposed.

OK, let's talk about the other problems, as this is no longer related
to my patch.

>From your previous email:

> 1. Console is IPMI emulated serial console.  Super slow.  Also
>    netconsole is in use.
> 2. System runs out of memory, OOM triggers.
> 3. OOM handler is printing out OOM debug info.
> 4. While trying to emit the messages for netconsole, the network stack
>    / driver tries to allocate memory and then fail, which in turn
>    triggers allocation failure or other warning messages.  printk was
>    already flushing, so the messages are queued on the ring.
> 5. OOM handler keeps flushing but 4 repeats and the queue is never
>    shrinking.  Because OOM handler is trapped in printk flushing, it
>    never manages to free memory and no one else can enter OOM path
>    either, so the system is trapped in this state.

>From what I gathered, you said an OOM would trigger, and then the
network console would not be able to allocate memory and it would
trigger a printk too, and cause an infinite amount of printks.

This could very well be a great place to force offloading. If a printk
is called from within a printk, at the same context (normal, softirq,
irq or NMI), then we should trigger the offloading.

My ftrace ring buffer has a context level recursion check, we could use
that, and even tie it into my previous patch:

With something like this (not compiled tested or anything, and
kick_offload_thread() would need to be implemented).

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9cb943c90d98..b80b23a0ca13 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2261,6 +2261,63 @@ static int have_callable_console(void)
 
 	return 0;
 }
+/*
+ * Used for which context the printk is in.
+ *  NMI     = 0
+ *  IRQ     = 1
+ *  SOFTIRQ = 2
+ *  NORMAL  = 3
+ *
+ * Stack ordered, where the lower number can preempt
+ * the higher number: mask &= mask - 1, will only clear
+ * the lowerest set bit.
+ */
+enum {
+	CTX_NMI,
+	CTX_IRQ,
+	CTX_SOFTIRQ,
+	CTX_NORMAL,
+};
+
+static DEFINE_PER_CPU(int, recursion_bits);
+
+static bool recursion_check_start(void)
+{
+	unsigned long pc = preempt_count();
+	int val = this_cpu_read(recursion_bits);
+
+	if (!(pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)))
+		bit = CTX_NORMAL;
+	else
+		bit = pc & NMI_MASK ? CTX_NMI :
+			pc & HARDIRQ_MASK ? CTX_IRQ : CTX_SOFTIRQ;
+
+	if (unlikely(val & (1 << bit)))
+		return true;
+
+	val |= (1 << bit);
+	this_cpu_write(recursion_bits, val);
+	return false;
+}
+
+static void recursion_check_finish(bool offload)
+{
+	int val = this_cpu_read(recursion_bits);
+
+	if (offload)
+		return;
+
+	val &= val - 1;
+	this_cpu_write(recursion_bits, val);
+}
+
+static void kick_offload_thread(void)
+{
+	/*
+	 * Consoles are triggering printks, offload the printks
+	 * to another CPU to hopefully avoid a lockup.
+	 */
+}
 
 /*
  * Can we actually use the console at this time on this cpu?
@@ -2333,6 +2390,7 @@ void console_unlock(void)
 
 	for (;;) {
 		struct printk_log *msg;
+		bool offload;
 		size_t ext_len = 0;
 		size_t len;
 
@@ -2393,15 +2451,20 @@ void console_unlock(void)
 		 * waiter waiting to take over.
 		 */
 		console_lock_spinning_enable();
+		offload = recursion_check_start();
 
 		stop_critical_timings();	/* don't trace print latency */
 		call_console_drivers(ext_text, ext_len, text, len);
 		start_critical_timings();
 
+		recursion_check_finish(offload);
+
 		if (console_lock_spinning_disable_and_check()) {
 			printk_safe_exit_irqrestore(flags);
 			return;
 		}
+		if (offload)
+			kick_offload_thread();
 
 		printk_safe_exit_irqrestore(flags);
 

-- Steve