Date:   Thu, 18 Oct 2018 15:10:18 +0900
From:   Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
        Michal Hocko <mhocko@...nel.org>
Cc:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Johannes Weiner <hannes@...xchg.org>, linux-mm@...ck.org,
        syzkaller-bugs@...glegroups.com, guro@...com,
        kirill.shutemov@...ux.intel.com, linux-kernel@...r.kernel.org,
        rientjes@...gle.com, yang.s@...baba-inc.com,
        Andrew Morton <akpm@...ux-foundation.org>,
        Petr Mladek <pmladek@...e.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        syzbot <syzbot+77e6b28a7a7106ad0def@...kaller.appspotmail.com>
Subject: Re: [PATCH v3] mm: memcontrol: Don't flood OOM messages with no
 eligible task.

On (10/18/18 14:26), Tetsuo Handa wrote:
> Sergey Senozhatsky wrote:
> > To my personal taste, "baud rate of registered and enabled consoles"
> > approach is drastically more relevant than hard coded 10 * HZ or
> > 60 * HZ magic numbers... But not in the form of that "min baud rate"
> > brain fart, which I have posted.
> 
> I'm saying that my 60 * HZ is "duration which the OOM killer keeps refraining
>  from calling printk()". Such period is required for allowing console users
> to do their operations without being disturbed by the OOM killer.
> 

Got you. I'm probably not paying too much attention to this discussion.
You start your commit message with "RCU stalls" and end with a completely
different problem "admin interaction". I skipped the last part of the
commit message.

OK. That makes sense if any user intervention/interaction actually happens.
I'm not sure that someone at Facebook or Google logs in to every server
that is under OOM to do something critically important there. Net console
logs and postmortem analysis, *perhaps*, would be their choice. I believe
it was Johannes who said that his net console is capable of keeping up
with the traffic and that 60 * HZ is too long for him. So I can see why
people might not be happy with your patch. I don't think that 60 * HZ
enforcement will go anywhere.

Now, if your problem is
     "I'm actually logged in, and want to do something
      sane, how do I stop this OOM report flood because
      it wipes out everything I have on my console?"

then let's formulate it as
     "I'm actually logged in, and want to do something
      sane, how do I stop this OOM report flood because
      it wipes out everything I have on my console?"

and let's hear from MM people what they can suggest.

Michal, Andrew, Johannes, any thoughts?

For instance,
   change /proc/sys/kernel/printk and suppress most of the warnings?

   // not only OOM but possibly other printk()-s that can come from
   // different CPUs
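
To illustrate the /proc/sys/kernel/printk suggestion: that file holds four
integers (console_loglevel, default_message_loglevel,
minimum_console_loglevel, default_console_loglevel), and a message reaches
the console only when its level is numerically lower than console_loglevel.
The sketch below is a userspace illustration, not kernel code; the struct
and function names are hypothetical and chosen only for this example.

```c
#include <stdio.h>

/* Hypothetical container for the four values in /proc/sys/kernel/printk. */
struct printk_levels {
	int console;          /* console_loglevel */
	int default_message;  /* default_message_loglevel */
	int minimum_console;  /* minimum_console_loglevel */
	int default_console;  /* default_console_loglevel */
};

/* Parse the whitespace-separated contents of the file. Returns 0 on success. */
static int parse_printk_levels(const char *text, struct printk_levels *out)
{
	int n = sscanf(text, "%d %d %d %d",
		       &out->console, &out->default_message,
		       &out->minimum_console, &out->default_console);
	return n == 4 ? 0 : -1;
}

/*
 * A message is shown on the console only if its level is numerically
 * lower than console_loglevel (0 = KERN_EMERG ... 7 = KERN_DEBUG).
 */
static int console_would_print(int msg_level, int console_loglevel)
{
	return msg_level < console_loglevel;
}
```

With console_loglevel lowered to 4, KERN_WARNING (4) and less severe
messages stay off the console while KERN_ERR (3) and more severe ones still
appear. As noted above, this applies to every printk() caller, not only to
the OOM killer.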

If your problem is "syzbot hits RCU stalls" then let's have a baud rate
based ratelimiting; I think we can get more or less reasonable timeout
values.
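
A rough sketch of what "baud rate based" could mean in numbers: estimate how
long the slowest console needs to drain a report of a given size and use
that as the ratelimit interval. This is not the patch from this thread. The
helper names are hypothetical, and the 10 bits per character figure assumes
8N1 serial framing.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 8N1 framing, i.e. 1 start + 8 data + 1 stop bit per byte. */
#define BITS_PER_CHAR 10

/*
 * Hypothetical helper: milliseconds needed to push `bytes` through a
 * serial console running at `baud` bits per second, rounded up.
 * Returns 0 when the baud rate is unknown (baud == 0).
 */
static uint64_t console_flush_ms(uint64_t bytes, unsigned int baud)
{
	uint64_t bits = bytes * BITS_PER_CHAR;

	if (baud == 0)
		return 0;
	return (bits * 1000 + baud - 1) / baud;
}

/*
 * Hypothetical helper: pick the slowest non-zero baud rate among the
 * registered and enabled consoles. Returns 0 if none reports a rate.
 */
static unsigned int slowest_baud(const unsigned int *bauds, size_t n)
{
	unsigned int min = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (bauds[i] != 0 && (min == 0 || bauds[i] < min))
			min = bauds[i];
	}
	return min;
}
```

For example, a 4096-byte report on a 9600 baud console takes about 4.3
seconds to drain, while the same report at 115200 baud takes well under half
a second, so a timeout derived this way tracks the actual hardware instead
of a fixed 10 * HZ or 60 * HZ.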

	-ss
