[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0417c888-d74e-b6ae-a8f0-234cbde03d38@i-love.sakura.ne.jp>
Date: Fri, 12 Oct 2018 21:58:19 +0900
From: Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Michal Hocko <mhocko@...nel.org>, linux-mm@...ck.org,
syzkaller-bugs@...glegroups.com, guro@...com,
kirill.shutemov@...ux.intel.com, linux-kernel@...r.kernel.org,
rientjes@...gle.com, yang.s@...baba-inc.com,
Andrew Morton <akpm@...ux-foundation.org>,
Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
Petr Mladek <pmladek@...e.com>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [RFC PATCH] memcg, oom: throttle dump_header for memcg ooms
without eligible tasks
Calling printk() people. ;-)
On 2018/10/12 21:41, Johannes Weiner wrote:
> On Fri, Oct 12, 2018 at 09:10:40PM +0900, Tetsuo Handa wrote:
>> On 2018/10/12 21:08, Michal Hocko wrote:
>>>> So not more than 10 dumps in each 5s interval. That looks reasonable
>>>> to me. By the time it starts dropping data you have more than enough
>>>> information to go on already.
>>>
>>> Yeah. Unless we have a storm coming from many different cgroups in
>>> parallel. But even then we have the allocation context for each OOM so
>>> we are not losing everything. Should we ever tune this, it can be done
>>> later with some explicit examples.
>>>
>>>> Acked-by: Johannes Weiner <hannes@...xchg.org>
>>>
>>> Thanks! I will post the patch to Andrew early next week.
>>>
>>
>> How do you handle environments where one dump takes e.g. 3 seconds?
>> Counting delay between first message in previous dump and first message
>> in next dump is not safe. Unless we count delay between last message
>> in previous dump and first message in next dump, we cannot guarantee
>> that the system won't lockup due to printk() flooding.
>
> How is that different from any other printk ratelimiting? If a dump
> takes 3 seconds you need to fix your console. It doesn't make sense to
> design KERN_INFO messages for the slowest serial consoles out there.
You can't fix the console. It is a hardware limitation.
>
> That's what we did, btw. We used to patch out the OOM header because
> our serial console was so bad, but obviously that's not a generic
> upstream solution. We've since changed the loglevel on the serial and
> use netconsole[1] for the chattier loglevels.
>
> [1] https://github.com/facebook/fbkutils/tree/master/netconsd
>
Powered by blists - more mailing lists