linux-kernel - Re: [PATCH] fs, mm: account filp and names caches to kmemcg

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20171031185039.wno4cfzgwoser4wo@dhcp22.suse.cz>
Date:   Tue, 31 Oct 2017 19:50:39 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     Shakeel Butt <shakeelb@...gle.com>,
        Greg Thelen <gthelen@...gle.com>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux MM <linux-mm@...ck.org>, linux-fsdevel@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] fs, mm: account filp and names caches to kmemcg

On Tue 31-10-17 12:49:59, Johannes Weiner wrote:
> On Tue, Oct 31, 2017 at 09:00:48AM +0100, Michal Hocko wrote:
> > On Mon 30-10-17 12:28:13, Shakeel Butt wrote:
> > > On Mon, Oct 30, 2017 at 1:29 AM, Michal Hocko <mhocko@...nel.org> wrote:
> > > > On Fri 27-10-17 13:50:47, Shakeel Butt wrote:
> > > >> > Why is OOM-disabling a thing? Why isn't this simply a "kill everything
> > > >> > else before you kill me"? It's crashing the kernel in trying to
> > > >> > protect a userspace application. How is that not insane?
> > > >>
> > > >> In parallel to other discussion, I think we should definitely move
> > > >> from "completely oom-disabled" semantics to something similar to "kill
> > > >> me last" semantics. Is there any objection to this idea?
> > > >
> > > > Could you be more specific what you mean?
> > > 
> > > I get the impression that the main reason behind the complexity of
> > > oom-killer is allowing processes to be protected from the oom-killer
> > > i.e. disabling oom-killing a process by setting
> > > /proc/[pid]/oom_score_adj to -1000. So, instead of oom-disabling, add
> > > an interface which will let users/admins to set a process to be
> > > oom-killed as a last resort.
> > 
> > If a process opts in to be oom disabled it needs CAP_SYS_RESOURCE and it
> > probably has a strong reason to do that. E.g. no unexpected SIGKILL
> > which could leave inconsistent data behind. We cannot simply break that
> > contract. Yes, it is a PITA configuration to support but it has its
> > reasons to exit.
> 
> I don't think that's true. The most prominent users are things like X
> and sshd, and all they wanted to say was "kill me last."

This might be the case for the desktop environment and I would tend to
agree that those can handle restart easily. I was considering
applications which need an explicit shut down and manual intervention
when not done so. Think of a database or similar.

> If sshd were to have a bug and swell up, currently the system would
> kill everything and then panic. It'd be much better to kill sshd at
> the end and let the init system restart it.
> 
> Can you describe a scenario in which the NEVERKILL semantics actually
> make sense? You're still OOM-killing the task anyway, it's not like it
> can run without the kernel. So why kill the kernel?

Yes but you start with a clean state after reboot which is rather a
different thing than restarting from an inconsistant state.

In any case I am not trying to defend this configuration! I really
dislike it and it shouldn't have ever been introduced. But it is an
established behavior for many years and I am not really willing to break
it without having a _really strong_ reason.
-- 
Michal Hocko
SUSE Labs