linux-kernel - Re: [patch 2/2] memcg: do not sleep on OOM waitqueue with full charge context

Message-ID: <20130613134826.GE23070@dhcp22.suse.cz>
Date:	Thu, 13 Jun 2013 15:48:26 +0200
From:	Michal Hocko <mhocko@...e.cz>
To:	David Rientjes <rientjes@...gle.com>
Cc:	Johannes Weiner <hannes@...xchg.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	linux-mm@...ck.org, cgroups@...r.kernel.org,
	linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [patch 2/2] memcg: do not sleep on OOM waitqueue with full
 charge context

On Wed 12-06-13 13:49:47, David Rientjes wrote:
> On Wed, 12 Jun 2013, Michal Hocko wrote:
> 
> > The patch is a big improvement with a minimum code overhead. Blocking
> > any task which sits on top of an unpredictable amount of locks is just
> > broken. So regardless how many users are affected we should merge it and
> > backport to stable trees. The problem is there since ever. We seem to
> > be surprisingly lucky to not hit this more often.
> > 
> 
> Right now it appears that that number of users is 0 and we're talking 
> about a problem that was reported in 3.2 that was released a year and a 
> half ago.  The rules of inclusion in stable also prohibit such a change 
> from being backported, specifically "It must fix a real bug that bothers 
> people (not a, "This could be a problem..." type thing)".

As you can see, there is a user seeing this in 3.2. The bug is _real_ and
I do not see what you are objecting to. Do you really think that
sitting on a time bomb is preferable?

> We have deployed memcg on a very large number of machines and I can run a 
> query over all software watchdog timeouts that have occurred by 
> deadlocking on i_mutex during memcg oom.  It returns 0 results.

Do you capture /proc/<pid>/stack for each of them to find out whether your
deadlocks (and you have reported that they happen) were in fact caused by
a locking issue? This kind of deadlock might go unnoticed, especially
when the oom is handled by userspace by increasing the limit (my memcg
is stuck and increasing the limit a bit always helped).
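
For illustration only, a minimal sketch of the kind of capture meant
here: walk /proc, pick the tasks in uninterruptible sleep (state D) and
save their kernel stacks. The function name and the proc_root parameter
are hypothetical. It assumes /proc/<pid>/stack is available (it needs
CONFIG_STACKTRACE and usually root), and it only looks at thread group
leaders, not at the threads under /proc/<pid>/task/.

```python
import os


def blocked_task_stacks(proc_root="/proc"):
    """Return {pid: kernel stack text} for tasks in state D.

    proc_root is a hypothetical parameter so the walk can be pointed
    at a copy of /proc; on a live system leave it at the default.
    """
    stacks = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-task entries such as "self" or "meminfo"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                stat = f.read()
            # The state letter is the first field after the parenthesised
            # comm, and comm may itself contain spaces or parentheses,
            # so split on the last ")".
            state = stat.rsplit(")", 1)[1].split()[0]
            if state != "D":
                continue
            with open(os.path.join(base, "stack")) as f:
                stacks[int(entry)] = f.read()
        except (OSError, IndexError):
            # The task exited in the meantime, the stack file is not
            # readable, or the stat line was not in the expected form.
            continue
    return stacks
```

Calling blocked_task_stacks() a few seconds apart and comparing the
results would show which tasks stay blocked and where, e.g. on i_mutex
under a memcg oom path.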

> > I am not quite sure I understand your reservation about the patch to be
> > honest. Andrew still hasn't merged this one although 1/2 is in.
> 
> Perhaps he is as unconvinced?  The patch adds 100 lines of code, including 
> fields to task_struct for memcg, for a problem that nobody can reproduce.  
> My question still stands: can anybody, even with an instrumented kernel to 
> make it more probable, reproduce the issue this is addressing?

So the referenced discussion is not sufficient?

-- 
Michal Hocko
SUSE Labs