Message-ID: <20171221231932.27727fab@vmware.local.home>
Date:   Thu, 21 Dec 2017 23:19:32 -0500
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Tejun Heo <tj@...nel.org>
Cc:     Petr Mladek <pmladek@...e.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Jan Kara <jack@...e.cz>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Rafael Wysocki <rjw@...ysocki.net>,
        Pavel Machek <pavel@....cz>,
        Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        linux-kernel@...r.kernel.org,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Subject: Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

On Thu, 21 Dec 2017 16:09:32 -0800
Tejun Heo <tj@...nel.org> wrote:
> 
> I tried your v4 patch and ran the test module and could easily
> reproduce RCU stall and other issues stemming from a CPU getting
> pegged down by printk flushing.  I'm attaching the test module code at
> the end.

Thanks, I'll take a look.

> 
> I wrote this before, but this isn't a theoretical problem.  We see
> these stalls a lot.  Preemption isn't enabled to begin with.  Memory
> pressure is high, OOM triggers, and printk starts printing out OOM
> warnings; then, a network packet comes in which triggers allocations
> in the network layer, which fail due to memory pressure, which then
> generates memory allocation failure messages, which then generate
> netconsole packets, which then try to allocate more memory, and so on.

It doesn't matter if preemption is enabled or not. The hand-off should
happen either way.
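
To be clear about what I mean by the hand-off (this is only an
illustrative sketch with made-up helper names, not the actual printk
internals): a CPU that calls printk() while another CPU is flushing the
console registers itself as a waiter, and the flushing CPU hands the
console lock over after the current record instead of flushing forever:

#include <linux/atomic.h>

/* Hypothetical flag, set by a printk() caller spinning for the lock. */
static atomic_t console_waiter;

static void flush_console_sketch(void)
{
	while (have_pending_records()) {	/* assumed helper */
		print_one_record();		/* assumed helper */

		/*
		 * Another CPU called printk() while we were printing:
		 * hand the console lock (and the flushing duty) over
		 * to it instead of staying here forever.
		 */
		if (atomic_read(&console_waiter)) {
			hand_over_console_lock();	/* assumed helper */
			return;
		}
	}
}

The flushing duty then bounces between the CPUs that are actually
generating messages, so no single CPU gets pegged, whether or not
preemption is enabled.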


> 
> It's just that there's no one else to give that flushing duty to, so
> the ping-ponging that your patch implements can't really help
> anything.
> 
> You argue that it isn't made worse by your patch, which may be true,
> but your patch doesn't solve the actual problems and is most likely an
> unnecessary complication which gets in the way of the actual
> solution.  It's a weird argument to push an approach which is
> fundamentally broken.  Let's please stop that.

BULLSHIT!

It's not a complex solution, and coming from the cgroup and workqueue
maintainer, that's a pretty ironic comment.

It has already been proven that it can solve problems:

  http://lkml.kernel.org/r/20171219143147.GB15210@dhcp22.suse.cz

Do you really think that handing off printks to an offloaded thread
isn't more complex, or that it can't cause even more issues (like being
more likely to lose relevant information on kernel crashes)?

> 
> Thanks.
> 
> #include <linux/module.h>
> #include <linux/delay.h>
> #include <linux/sched.h>
> #include <linux/mutex.h>
> #include <linux/workqueue.h>
> #include <linux/hrtimer.h>
> 
> static bool in_printk;
> static bool stop_testing;
> static struct hrtimer printk_timer;
> static ktime_t timer_interval;
> 
> static enum hrtimer_restart printk_timerfn(struct hrtimer *timer)
> {
> 	int i;
> 
> 	if (READ_ONCE(in_printk))
> 		for (i = 0; i < 10000; i++)
> 			printk("%-80s\n", "XXX TIMER");

WTF!

You are printing 10,000 printk messages from interrupt context???
And to top it off, I ran this on my box, switching printk() to
trace_printk() (which is extremely low overhead). And it is triggered
on the same CPU that did the printk() itself. Yeah, there is no
hand-off, because you are doing a shitload of printks on one CPU and
nothing on any of the other CPUs. This isn't the problem that my patch
set out to solve, nor is it a very realistic problem. I added a counter
to the printk as well, to keep track of how many printks there were
(the modified module is at the bottom of this email):

 # trace-cmd record -e printk -e irq
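
(For reference, trace-cmd record writes the events to a trace.dat
file; listings like the ones below can be read back from such a
recording with trace-cmd report.)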

kworker/-1603    0...1  1571.783182: print:                (nil)s: start 0
kworker/-1603    0d..1  1571.783189: console:              [ 1571.540656] XXX PREEMPT                                                                     
kworker/-1603    0d.h1  1571.791953: softirq_raise:        vec=1 [action=TIMER]
kworker/-1603    0d.h1  1571.791953: softirq_raise:        vec=9 [action=RCU]
kworker/-1603    0d.h1  1571.791957: softirq_raise:        vec=7 [action=SCHED]
kworker/-1603    0d.h1  1571.791959: print:                (nil)s: XXX TIMER                                                                        (0)
kworker/-1603    0d.h1  1571.791960: print:                (nil)s: XXX TIMER                                                                        (1)


Let's look at the above. My trace_printk() is the "start 0", which
marks the first iteration of the 100 printk()s you are doing. The
"console" event is the printk() tracepoint of your worker thread's
printk.

The other "print:" events are my trace_printk()s that replaced the
printk() from your interrupt handler. I added a counter to see which
iteration it is. Note, trace_printk() is much faster than printk()
writing to the log buffer (let alone writing to the console).


kworker/-1603    0d.h1  1571.791960: print:                (nil)s: XXX TIMER                                                                        (2)
kworker/-1603    0d.h1  1571.791961: print:                (nil)s: XXX TIMER                                                                        (3)
kworker/-1603    0d.h1  1571.791961: print:                (nil)s: XXX TIMER                                                                        (4)
kworker/-1603    0d.h1  1571.791962: print:                (nil)s: XXX TIMER                                                                        (5)

[..]

kworker/-1603    0d.h1  1571.794473: print:                (nil)s: XXX TIMER                                                                        (9992)
kworker/-1603    0d.h1  1571.794474: print:                (nil)s: XXX TIMER                                                                        (9993)
kworker/-1603    0d.h1  1571.794474: print:                (nil)s: XXX TIMER                                                                        (9994)
kworker/-1603    0d.h1  1571.794474: print:                (nil)s: XXX TIMER                                                                        (9995)
kworker/-1603    0d.h1  1571.794474: print:                (nil)s: XXX TIMER                                                                        (9996)
kworker/-1603    0d.h1  1571.794475: print:                (nil)s: XXX TIMER                                                                        (9997)
kworker/-1603    0d.h1  1571.794475: print:                (nil)s: XXX TIMER                                                                        (9998)
kworker/-1603    0d.h1  1571.794475: print:                (nil)s: XXX TIMER                                                                        (9999)
kworker/-1603    0d.h1  1571.794477: softirq_raise:        vec=1 [action=TIMER]
kworker/-1603    0d.h1  1571.794477: softirq_raise:        vec=9 [action=RCU]
kworker/-1603    0d.h1  1571.794478: softirq_raise:        vec=7 [action=SCHED]
kworker/-1603    0..s1  1571.794481: softirq_entry:        vec=1 [action=TIMER]
kworker/-1603    0..s1  1571.794482: softirq_exit:         vec=1 [action=TIMER]
kworker/-1603    0..s1  1571.794482: softirq_entry:        vec=7 [action=SCHED]
kworker/-1603    0..s1  1571.794484: softirq_exit:         vec=7 [action=SCHED]
kworker/-1603    0..s1  1571.794484: softirq_entry:        vec=9 [action=RCU]
kworker/-1603    0..s1  1571.794494: softirq_exit:         vec=9 [action=RCU]
kworker/-1603    0...1  1571.794495: print:                (nil)s: end 0

Here we finish the first printk() from the worker thread. The print
started at 1571.783182 and finished at 1571.794495. That's over 11
milliseconds! And with preemption disabled. No amount of offloading
this to another printk thread will fix that. And this printing isn't
even done yet. This is the first of 100 loops!

Let's look at the interrupt too. Remember, trace_printk() is faster
than printk() writing to its log buffers. No lock is needed, as
trace_printk() is per-CPU and lockless. That said, your example started
at 1571.791953 and finished at 1571.794494 from interrupt context.
That's 2.5 milliseconds! Worse yet, this is repeated 99 more times!!!


kworker/-1603    0...1  1571.794495: print:                (nil)s: start 1
kworker/-1603    0d..1  1571.794496: console:              [ 1571.551967] XXX PREEMPT                                                                     
kworker/-1603    0d.h1  1571.803089: softirq_raise:        vec=1 [action=TIMER]
kworker/-1603    0d.h1  1571.803089: softirq_raise:        vec=9 [action=RCU]
kworker/-1603    0d.h1  1571.803094: print:                (nil)s: XXX TIMER                                                                        (0)
kworker/-1603    0d.h1  1571.803095: print:                (nil)s: XXX TIMER                                                                        (1)
kworker/-1603    0d.h1  1571.803095: print:                (nil)s: XXX TIMER                                                                        (2)
kworker/-1603    0d.h1  1571.803095: print:                (nil)s: XXX TIMER                                                                        (3)
kworker/-1603    0d.h1  1571.803095: print:                (nil)s: XXX TIMER                                                                        (4)

[...]

kworker/-1603    0dNh1  1572.860286: print:                (nil)s: XXX TIMER                                                                        (9996)
kworker/-1603    0dNh1  1572.860286: print:                (nil)s: XXX TIMER                                                                        (9997)
kworker/-1603    0dNh1  1572.860287: print:                (nil)s: XXX TIMER                                                                        (9998)
kworker/-1603    0dNh1  1572.860287: print:                (nil)s: XXX TIMER                                                                        (9999)
kworker/-1603    0dNh1  1572.860288: softirq_raise:        vec=1 [action=TIMER]
kworker/-1603    0dNh1  1572.860288: softirq_raise:        vec=9 [action=RCU]
kworker/-1603    0dNh1  1572.860289: softirq_raise:        vec=7 [action=SCHED]
kworker/-1603    0.N.1  1572.860291: print:                (nil)s: end 99

Now I find the last loop, and here we are at 1572.860291, which is over
1 second from the time it started! I did have a serial console set up
for this test, so the printks were slow. Let me try this with no serial
console...

It does fare better...

It got to 19 prints before the interrupt triggered:

kworker/-196     0...1    70.474589: print:                (nil)s: end 18
kworker/-196     0...1    70.474597: print:                (nil)s: start 19
kworker/-196     0d..1    70.474599: console:              [   70.472301] XXX PREEMPT                                                                     
kworker/-196     0d.h1    70.474646: print:                (nil)s: XXX TIMER                                                                        (0)
kworker/-196     0d.h1    70.474646: print:                (nil)s: XXX TIMER                                                                        (1)

But doing 10,000 printks in interrupt context still takes a very long
time. That doesn't improve.

kworker/-196     0d.h1    70.479854: print:                (nil)s: XXX TIMER                                                                        (9997)
kworker/-196     0d.h1    70.479855: print:                (nil)s: XXX TIMER                                                                        (9998)
kworker/-196     0d.h1    70.479855: print:                (nil)s: XXX TIMER                                                                        (9999)
kworker/-196     0d.h1    70.479858: softirq_raise:        vec=1 [action=TIMER]
kworker/-196     0d.h1    70.479859: softirq_raise:        vec=9 [action=RCU]
kworker/-196     0d.H1    70.479867: irq_handler_entry:    irq=27 name=em1
kworker/-196     0d.H1    70.479871: softirq_raise:        vec=3 [action=NET_RX]
kworker/-196     0d.H1    70.479871: irq_handler_exit:     irq=27 ret=handled
kworker/-196     0..s1    70.479872: softirq_entry:        vec=1 [action=TIMER]
kworker/-196     0..s1    70.479874: softirq_exit:         vec=1 [action=TIMER]
kworker/-196     0..s1    70.479874: softirq_entry:        vec=9 [action=RCU]
kworker/-196     0..s1    70.479875: softirq_exit:         vec=9 [action=RCU]
kworker/-196     0..s1    70.479876: softirq_entry:        vec=3 [action=NET_RX]
kworker/-196     0..s1    70.479970: softirq_exit:         vec=3 [action=NET_RX]
kworker/-196     0...1    70.479972: print:                (nil)s: end 19

Interrupt context started at 70.474646 and ended at 70.479970. For some
reason, this one took over 5 milliseconds in interrupt context!

Looking at the next interrupt, which happened at iteration 29:

kworker/-196     0...1    70.480300: print:                (nil)s: end 28
kworker/-196     0...1    70.480300: print:                (nil)s: start 29
kworker/-196     0d..1    70.480301: console:              [   70.478004] XXX PREEMPT                                                                     
kworker/-196     0d.h1    70.480830: print:                (nil)s: XXX TIMER                                                                        (0)
[..]
kworker/-196     0d.h1    70.484455: print:                (nil)s: XXX TIMER                                                                        (9998)
kworker/-196     0d.h1    70.484455: print:                (nil)s: XXX TIMER                                                                        (9999)
kworker/-196     0d.h1    70.484456: softirq_raise:        vec=1 [action=TIMER]
kworker/-196     0d.h1    70.484456: softirq_raise:        vec=9 [action=RCU]
kworker/-196     0d.h1    70.484457: softirq_raise:        vec=7 [action=SCHED]
kworker/-196     0d.H1    70.484458: irq_handler_entry:    irq=25 name=ahci[0000:00:1f.2]
kworker/-196     0d.H2    70.484466: softirq_raise:        vec=4 [action=BLOCK]
kworker/-196     0d.H1    70.484466: irq_handler_exit:     irq=25 ret=handled
kworker/-196     0..s1    70.484466: softirq_entry:        vec=1 [action=TIMER]
kworker/-196     0..s1    70.484467: softirq_exit:         vec=1 [action=TIMER]
kworker/-196     0..s1    70.484467: softirq_entry:        vec=7 [action=SCHED]
kworker/-196     0..s1    70.484469: softirq_exit:         vec=7 [action=SCHED]
kworker/-196     0..s1    70.484469: softirq_entry:        vec=9 [action=RCU]
kworker/-196     0d.s1    70.484484: softirq_raise:        vec=9 [action=RCU]
kworker/-196     0..s1    70.484484: softirq_exit:         vec=9 [action=RCU]
kworker/-196     0..s1    70.484484: softirq_entry:        vec=4 [action=BLOCK]
kworker/-196     0.Ns1    70.484518: softirq_exit:         vec=4 [action=BLOCK]
kworker/-196     0.Ns1    70.484519: softirq_entry:        vec=9 [action=RCU]
kworker/-196     0dNs1    70.484532: softirq_raise:        vec=9 [action=RCU]
kworker/-196     0.Ns1    70.484532: softirq_exit:         vec=9 [action=RCU]
kworker/-196     0.N.1    70.484534: print:                (nil)s: end 29

It took 3 milliseconds. Of course I'm also counting the softirqs that
need to catch up.

Finally, it ends at:

kworker/-196     0dNh1    70.525877: print:                (nil)s: XXX TIMER                                                                        (9997)
kworker/-196     0dNh1    70.525877: print:                (nil)s: XXX TIMER                                                                        (9998)
kworker/-196     0dNh1    70.525878: print:                (nil)s: XXX TIMER                                                                        (9999)
kworker/-196     0dNh1    70.525879: softirq_raise:        vec=1 [action=TIMER]
kworker/-196     0dNh1    70.525879: softirq_raise:        vec=9 [action=RCU]
kworker/-196     0.N.1    70.525880: print:                (nil)s: end 98
kworker/-196     0.N.1    70.525880: print:                (nil)s: start 99
kworker/-196     0dN.1    70.525881: console:              [   70.523584] XXX PREEMPT                                                                     
kworker/-196     0.N.1    70.525998: print:                (nil)s: end 99

With the start at:

kworker/-196     0...1    70.473685: print:                (nil)s: start 1

So, without a slow console, we only had 52 milliseconds with preemption
disabled. This is all happening on a single CPU, and I haven't seen
issues where one CPU caused problems except when doing sysrq-t and
other debugging operations. I highly doubt any OOM could trigger this.
If OOM started to spam the machine, it would most certainly do so on
multiple CPUs, where my patch would indeed have an effect.

This is a very extreme example (of how not to write code), and I believe
it is totally unrealistic. PLEASE STOP THAT.

My version of the module I used is at the bottom of this email.

-- Steve

> 
> 	hrtimer_forward_now(&printk_timer, timer_interval);
> 	return READ_ONCE(stop_testing) ? HRTIMER_NORESTART : HRTIMER_RESTART;
> }	
> 
> static void preempt_printk_workfn(struct work_struct *work)
> {
> 	int i;
> 
> 	hrtimer_init(&printk_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> 	printk_timer.function = printk_timerfn;
> 	timer_interval = ktime_set(0, NSEC_PER_MSEC);
> 	hrtimer_start(&printk_timer, timer_interval, HRTIMER_MODE_REL);
> 
> 	while (!READ_ONCE(stop_testing)) {
> 		preempt_disable();
> 		WRITE_ONCE(in_printk, true);
> 		for (i = 0; i < 100; i++)
> 			printk("%-80s\n", "XXX PREEMPT");
> 		WRITE_ONCE(in_printk, false);
> 		preempt_enable();
> 		msleep(1);
> 	}
> }
> static DECLARE_WORK(preempt_printk_work, preempt_printk_workfn);
> 
> static int __init test_init(void)
> {
> 	queue_work_on(0, system_wq, &preempt_printk_work);
> 	return 0;
> }
> 
> static void __exit test_exit(void)
> {
> 	WRITE_ONCE(stop_testing, true);
> 	flush_work(&preempt_printk_work);
> 	hrtimer_cancel(&printk_timer);
> }
> 
> module_init(test_init);
> module_exit(test_exit);
> MODULE_LICENSE("GPL");

#include <linux/module.h>
#include <linux/delay.h>
#include <linux/sched.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>
#include <linux/hrtimer.h>

static bool in_printk;
static bool stop_testing;
static struct hrtimer printk_timer;
static ktime_t timer_interval;

static enum hrtimer_restart printk_timerfn(struct hrtimer *timer)
{
	int i;

	if (READ_ONCE(in_printk))
		for (i = 0; i < 10000; i++)
			__trace_printk(0, "%-80s (%d)\n", "XXX TIMER", i);

	hrtimer_forward_now(&printk_timer, timer_interval);
	return READ_ONCE(stop_testing) ? HRTIMER_NORESTART : HRTIMER_RESTART;
}

static void preempt_printk_workfn(struct work_struct *work)
{
	int i;

	hrtimer_init(&printk_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
	printk_timer.function = printk_timerfn;
	timer_interval = ktime_set(0, NSEC_PER_MSEC);
	hrtimer_start(&printk_timer, timer_interval, HRTIMER_MODE_REL);

	while (!READ_ONCE(stop_testing)) {
		preempt_disable();
		WRITE_ONCE(in_printk, true);
		for (i = 0; i < 100; i++) {
			__trace_printk(0, "start %i\n", i);
			printk("%-80s\n", "XXX PREEMPT");
			__trace_printk(0, "end %i\n", i);
		}
		WRITE_ONCE(in_printk, false);
		preempt_enable();
		msleep(1);
	}
}
static DECLARE_WORK(preempt_printk_work, preempt_printk_workfn);

static void finish(void)
{
	WRITE_ONCE(stop_testing, true);
	flush_work(&preempt_printk_work);
	hrtimer_cancel(&printk_timer);
}
static int __init test_init(void)
{
	queue_work_on(0, system_wq, &preempt_printk_work);

	return 0;
}

static void __exit test_exit(void)
{
	finish();
}

module_init(test_init);
module_exit(test_exit);
MODULE_LICENSE("GPL");
