linux-kernel - Re: [merged] mm-memcg-handle-non-error-oom-situations-more-gracefully.patch removed from -mm tree

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20131128031313.GK3556@cmpxchg.org>
Date:	Wed, 27 Nov 2013 22:13:13 -0500
From:	Johannes Weiner <hannes@...xchg.org>
To:	David Rientjes <rientjes@...gle.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>, stable@...nel.org,
	Michal Hocko <mhocko@...e.cz>, azurit@...ox.sk,
	mm-commits@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org
Subject: Re: [merged]
 mm-memcg-handle-non-error-oom-situations-more-gracefully.patch removed from
 -mm tree

On Wed, Nov 27, 2013 at 06:38:31PM -0800, David Rientjes wrote:
> On Wed, 27 Nov 2013, Johannes Weiner wrote:
> 
> > > The task that is bypassing the memcg charge to the root memcg may not be 
> > > the process that is chosen by the oom killer, and it's possible the amount 
> > > of memory freed by killing the victim is less than the amount of memory 
> > > bypassed.
> > 
> > That's true, though unlikely.
> > 
> 
> Well, the "goto bypass" allows it and it's trivial to cause by 
> manipulating /proc/pid/oom_score_adj values to prefer processes with very 
> little rss.  It will just continue looping and killing processes as they 
> are forked and never cause the memcg to free memory below its limit.  At 
> least the "goto nomem" allows us to free some memory instead of leaking to 
> the root memcg.

Yes, that's the better way of doing it, I'll send the patch.  Thanks.

> > > Were you targeting these to 3.13 instead?  If so, it would have already 
> > > appeared in 3.13-rc1 anyway.  Is it still a work in progress?
> > 
> > I don't know how to answer this question.
> > 
> 
> It appears as though this work is being developed in Linus's tree rather 
> than -mm, so I'm asking if we should consider backing some of it out for 
> 3.14 instead.

The changes fix a deadlock problem.  Are they creating problems that
are worse than deadlocks, that would justify their revert?

> > > Should we be checking mem_cgroup_margin() here to ensure 
> > > task_in_memcg_oom() is still accurate and we haven't raced by freeing 
> > > memory?
> > 
> > We would have invoked the OOM killer long before this point prior to
> > my patches.  There is a line we draw and from that point on we start
> > killing things.  I tried to explain multiple times now that there is
> > no race-free OOM killing and I'm tired of it.  Convince me otherwise
> > or stop repeating this non-sense.
> > 
> 
> In our internal kernel we call mem_cgroup_margin() with the order of the 
> charge immediately prior to sending the SIGKILL to see if it's still 
> needed even after selecting the victim.  It makes the race smaller.
> 
> It's obvious that after the SIGKILL is sent, either from the kernel or 
> from userspace, that memory might subsequently be freed or another process 
> might exit before the process killed could even wake up.  There's nothing 
> we can do about that since we don't have psychic abilities.  I think we 
> should try to reduce the chance for unnecessary oom killing as much as 
> possible, however.

Since we can't physically draw a perfect line, we should strive for a
reasonable and intuitive line.  After that it's rapidly diminishing
returns.  Killing something after that much reclaim effort without
success is a completely reasonable and intuitive line to draw.  It's
also the line that has been drawn a long time ago and we're not
breaking this because of a micro optmimization.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/