Message-ID: <20181012124137.GA29330@cmpxchg.org>
Date:   Fri, 12 Oct 2018 08:41:37 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc:     Michal Hocko <mhocko@...nel.org>, linux-mm@...ck.org,
        syzkaller-bugs@...glegroups.com, guro@...com,
        kirill.shutemov@...ux.intel.com, linux-kernel@...r.kernel.org,
        rientjes@...gle.com, yang.s@...baba-inc.com,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [RFC PATCH] memcg, oom: throttle dump_header for memcg ooms
 without eligible tasks

On Fri, Oct 12, 2018 at 09:10:40PM +0900, Tetsuo Handa wrote:
> On 2018/10/12 21:08, Michal Hocko wrote:
> >> So not more than 10 dumps in each 5s interval. That looks reasonable
> >> to me. By the time it starts dropping data you have more than enough
> >> information to go on already.
> > 
> > Yeah. Unless we have a storm coming from many different cgroups in
> > parallel. But even then we have the allocation context for each OOM so
> > we are not losing everything. Should we ever tune this, it can be done
> > later with some explicit examples.
> > 
> >> Acked-by: Johannes Weiner <hannes@...xchg.org>
> > 
> > Thanks! I will post the patch to Andrew early next week.
> > 
> 
> How do you handle environments where one dump takes e.g. 3 seconds?
> Counting delay between first message in previous dump and first message
> in next dump is not safe. Unless we count delay between last message
> in previous dump and first message in next dump, we cannot guarantee
> that the system won't lock up due to printk() flooding.

How is that different from any other printk ratelimiting? If a dump
takes 3 seconds you need to fix your console. It doesn't make sense to
design KERN_INFO messages for the slowest serial consoles out there.

That's what we did, btw. We used to patch out the OOM header because
our serial console was so bad, but obviously that's not a generic
upstream solution. We've since changed the loglevel on the serial and
use netconsole[1] for the chattier loglevels.

[1] https://github.com/facebook/fbkutils/tree/master/netconsd
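For readers following the thread, the interval/burst scheme under discussion ("not more than 10 dumps in each 5s interval") can be modelled roughly as below. This is a userspace sketch for illustration only, not the patch itself and not the kernel's ratelimit code: the struct, the function name dump_allowed(), and the millisecond timestamps passed in by the caller are all assumptions. Note that the window is measured from the first event admitted in it, which is the property Tetsuo's question is about: the limiter knows nothing about how long each dump takes to print.

```c
#include <stdbool.h>

/* Hypothetical model of an interval/burst limiter (names are illustrative). */
struct dump_ratelimit {
    long interval_ms;   /* window length, e.g. 5000 for 5s */
    int  burst;         /* dumps allowed per window, e.g. 10 */
    long window_start;  /* time the current window opened, in ms */
    int  emitted;       /* dumps allowed in the current window */
    int  suppressed;    /* dumps dropped so far (never reset here) */
    bool started;       /* false until the first call */
};

/*
 * Return true if a dump starting at now_ms may be printed.
 * The window opens at the first call and is reopened once
 * interval_ms has elapsed since it opened; the time a dump
 * takes to finish printing is not taken into account.
 */
static bool dump_allowed(struct dump_ratelimit *rl, long now_ms)
{
    if (!rl->started || now_ms - rl->window_start >= rl->interval_ms) {
        rl->window_start = now_ms;
        rl->emitted = 0;
        rl->started = true;
    }
    if (rl->emitted < rl->burst) {
        rl->emitted++;
        return true;
    }
    rl->suppressed++;
    return false;
}
```

With interval_ms = 5000 and burst = 10, fifteen OOM events arriving 100ms apart would see the first ten dumped and the remaining five dropped; the next event at or after the 5s mark is dumped again.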
