[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20180509085822.GB353@jagdpanzerIV>
Date: Wed, 9 May 2018 17:58:22 +0900
From: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To: Tejun Heo <tj@...nel.org>, Petr Mladek <pmladek@...e.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Steven Rostedt <rostedt@...dmis.org>,
Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>,
Vlastimil Babka <vbabka@...e.cz>
Cc: Petr Mladek <pmladek@...e.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
linux-mm@...ck.org, Cong Wang <xiyou.wangcong@...il.com>,
Dave Hansen <dave.hansen@...el.com>,
Mel Gorman <mgorman@...e.de>,
Peter Zijlstra <peterz@...radead.org>, Jan Kara <jack@...e.cz>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
rostedt@...e.goodmis.org, Byungchul Park <byungchul.park@....com>,
Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
Pavel Machek <pavel@....cz>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Hi,
Move printk and (some of) MM people to the recipients list.
On (01/10/18 09:02), Tejun Heo wrote:
[..]
> The particular case that we've been seeing regularly in the fleet was
> the following scenario.
>
> 1. Console is IPMI emulated serial console. Super slow. Also
> netconsole is in use.
> 2. System runs out of memory, OOM triggers.
> 3. OOM handler is printing out OOM debug info.
> 4. While trying to emit the messages for netconsole, the network stack
> / driver tries to allocate memory and then fail, which in turn
> triggers allocation failure or other warning messages. printk was
> already flushing, so the messages are queued on the ring.
> 5. OOM handler keeps flushing but 4 repeats and the queue is never
> shrinking. Because OOM handler is trapped in printk flushing, it
> never manages to free memory and no one else can enter OOM path
> either, so the system is trapped in this state.
Tejun, we have a theory [since there are no logs available] that what
you are looking at is something as follows:
console_unlock()
{
for (;;) {
call_console_drivers()
kmalloc()/etc /* netconsole, skb kmalloc(), for instance */
__alloc_pages_slowpath()
warn_alloc() /* a bunch of printk() -> log_store() */
}
}
Now, warn_alloc() is rate limited to
DEFAULT_RATELIMIT_INTERVAL / DEFAULT_RATELIMIT_BURST
so net console driver can add 10 warn_alloc() reports every 5 seconds to
the logbuf.
You have a "super slow" IPMI console and net console. So for every
logbuf entry we do:
console_unlock()
{
for (;;) {
call_console_drivers(msg) -> IPMI_write()
call_console_drivers(msg) -> netconsole_write() -> skb kmalloc() -> warn_alloc() -> ratelimit
}
}
IPMI_write() is very slow, as you have noted, so it consumes time
printing messages, simultaneously warn_alloc() rate limit depends on
time. *Probably*, slow IPMI_write() is unable to flush 10 warn_alloc()
reports under 5 seconds, which gives net console a chance to add another
10 warn_alloc()-s, while the previous 10 warn_alloc()-s have not been
flushed yet.
It seems that DEFAULT_RATELIMIT_INTERVAL / DEFAULT_RATELIMIT_BURST
warn_alloc() rate limit is too permissive for your setup.
Can you confirm that the theory is actually correct?
If it is correct, then can we simply tweak warn_alloc() rate limit?
Say, make it x2 / x4 / etc. times less verbose? E.g. "up to 5 warn_alloc()-s
every 10 seconds"? What do MM folks think?
-ss
Powered by blists - more mailing lists