Date:	Mon, 26 Nov 2012 14:29:41 -0500
From:	Johannes Weiner <hannes@...xchg.org>
To:	Michal Hocko <mhocko@...e.cz>
Cc:	azurIt <azurit@...ox.sk>, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, cgroups mailinglist <cgroups@...r.kernel.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Subject: Re: [PATCH -mm] memcg: do not trigger OOM from
 add_to_page_cache_locked

On Mon, Nov 26, 2012 at 08:03:29PM +0100, Michal Hocko wrote:
> On Mon 26-11-12 13:24:21, Johannes Weiner wrote:
> > On Mon, Nov 26, 2012 at 07:04:44PM +0100, Michal Hocko wrote:
> > > On Mon 26-11-12 12:46:22, Johannes Weiner wrote:
> [...]
> > > > I think global oom already handles this in a much better way: invoke
> > > > the OOM killer, sleep for a second, then return to userspace to
> > > > relinquish all kernel resources and locks.  The only reason we
> > > > can't simply switch away from the endless retry loop is that we
> > > > don't want to return VM_FAULT_OOM and invoke the global OOM killer.
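
(Purely for illustration, roughly the global path being referred to here;
this is a simplified sketch, not the mainline code verbatim, and the
out_of_memory() arguments and locking details are abbreviated:)

/*
 * Sketch of the global VM_FAULT_OOM handling described above
 * (simplified): by the time we get here the task has unwound out of
 * the fault path, so no kernel locks are held.
 */
void pagefault_out_of_memory(void)
{
	out_of_memory(NULL, 0, 0, NULL, false);	/* pick and kill a victim */
	schedule_timeout_killable(1);		/* back off briefly... */
	/*
	 * ...then simply return: the task re-executes the faulting
	 * instruction with no locks held and retries the allocation
	 * against whatever the victim freed.
	 */
}
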
> > > 
> > > Exactly.
> > > 
> > > > But maybe we can return a new VM_FAULT_OOM_HANDLED for memcg OOM and
> > > > just restart the pagefault.  For the buffered IO syscalls, return
> > > > -ENOMEM instead.  This way, the memcg OOM killer is invoked as it
> > > > should be, but nobody gets stuck anywhere livelocking with the
> > > > exiting task.
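
(A sketch of how such a code could be consumed in an arch fault handler;
VM_FAULT_OOM_HANDLED does not exist today and everything around it is
simplified.  In the buffered IO paths the charge failure would simply
surface as -ENOMEM to the caller:)

/*
 * Hypothetical arch fault handler snippet: VM_FAULT_OOM_HANDLED is the
 * proposed new flag, the rest is simplified.
 */
int fault = handle_mm_fault(mm, vma, address, flags);

if (fault & VM_FAULT_OOM_HANDLED) {
	/*
	 * The memcg OOM killer already ran; just return so the faulting
	 * instruction is restarted and the charge is retried against
	 * whatever the victim freed.
	 */
	return;
}
if (fault & VM_FAULT_OOM) {
	/* unchanged: global OOM killer via pagefault_out_of_memory() */
	goto out_of_memory;
}
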
> > > 
> > > Hmm, we would still have a problem with oom disabled (aka user space OOM
> > > killer), right? All processes but those in mem_cgroup_handle_oom are
> > > risky to be killed.
> > 
> > Could we still let everybody get stuck in there when the OOM killer is
> > disabled and let userspace take care of it?
> 
> I am not sure what exactly you mean by "userspace take care of it" but
> if those processes are stuck and holding the lock then it is usually
> hard to find that out.  Well, if somebody is familiar with the
> internals then it is doable, but this makes the interface really
> unusable for regular usage.

If oom_kill_disable is set, then all processes get stuck all the way
down in the charge stack.  You may deadlock on whatever resource they
pin if you try to touch it while handling the problem from userspace.
I don't see how this is a new problem...?  Or do you mean something
else?

> > > Another POV might be: why should we trigger the OOM killer from those
> > > paths in the first place?  Write or read (or even readahead) are all
> > > calls that should rather fail than trigger the OOM killer, in my
> > > opinion.
> > 
> > Readahead is arguable, but we kill globally for read() and write() and
> > I think we should do the same for memcg.
> 
> Fair point, but the global case is a little bit easier than memcg here,
> because nobody can hook into the global OOM killer and provide a
> userspace implementation for it, which is one of the cooler features
> of memcg...  I am all open to any suggestions, but we should somehow
> fix this (and backport it to stable trees, as this has been there for
> quite some time.  The current report shows that the problem is not
> that hard to trigger).

As per above, the userspace OOM handling is risky as hell anyway.
What happens when an anonymous fault waits in memcg userspace OOM
while holding the mmap_sem, and a writer lines up behind it?  Your
userspace OOM handler had better not look at any of the /proc files of
the stuck task that require the mmap_sem.

By the same token, it probably shouldn't touch the same files a memcg
task is stuck trying to read/write.
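
For concreteness, this is the kind of handler we are talking about: a
minimal cgroup v1 memory.oom_control + eventfd loop (the cgroup path is
assumed and error handling is dropped).  The dangerous part is what it
is tempted to do once it wakes up:

/*
 * Minimal sketch of a userspace memcg OOM handler (cgroup v1 style;
 * the /sys/fs/cgroup/memory/foo path is assumed, error handling
 * omitted).  Setting oom_kill_disable itself is done separately, e.g.
 * "echo 1 > memory.oom_control".
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/eventfd.h>

int main(void)
{
	int efd = eventfd(0, 0);
	int ofd = open("/sys/fs/cgroup/memory/foo/memory.oom_control", O_RDONLY);
	int cfd = open("/sys/fs/cgroup/memory/foo/cgroup.event_control", O_WRONLY);
	char reg[32];
	uint64_t cnt;

	/* register efd for OOM notifications on this group */
	snprintf(reg, sizeof(reg), "%d %d", efd, ofd);
	write(cfd, reg, strlen(reg));

	for (;;) {
		read(efd, &cnt, sizeof(cnt));	/* blocks until the group OOMs */
		/*
		 * Everything in the group may now be stuck in the charge
		 * path.  Raising the limit or killing by pid is fine;
		 * reading /proc/<pid>/maps or smaps of a task stuck with
		 * mmap_sem held, or touching the files those tasks are
		 * blocked reading/writing, deadlocks this handler as well.
		 */
	}
}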
