linux-kernel - Re: [patch 2/2] memcg: do not sleep on OOM waitqueue with full charge context


Message-ID: <alpine.DEB.2.02.1306111454030.4803@chino.kir.corp.google.com>
Date:	Tue, 11 Jun 2013 14:57:08 -0700 (PDT)
From:	David Rientjes <rientjes@...gle.com>
To:	Johannes Weiner <hannes@...xchg.org>
cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Michal Hocko <mhocko@...e.cz>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	linux-mm@...ck.org, cgroups@...r.kernel.org,
	linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [patch 2/2] memcg: do not sleep on OOM waitqueue with full charge
 context

On Thu, 6 Jun 2013, Johannes Weiner wrote:

> > Could you point me to those bug reports?  As far as I know, we have never 
> > encountered them so it would be surprising to me that we're running with a 
> > potential landmine and have seemingly never hit it.
> 
> Sure thing: https://lkml.org/lkml/2012/11/21/497
> 

Ok, I think I read most of it, although the lkml.org interface makes it 
easy to miss some.

> During that thread Michal pinned down the problem to i_mutex being
> held by the OOM invoking task, which the selected victim is trying to
> acquire.
> 
> > > > > Reported-by: azurIt <azurit@...ox.sk>

Ok, so the key here is that azurIt was able to reliably reproduce this 
issue and now it has been resurrected after seven months of silence since 
that thread.  I also notice that azurIt isn't cc'd on this thread.  Do we 
know if this is still a problem?

We certainly haven't run into any memcg deadlocks like this.

> > It certainly would, but it's not the point that memory.oom_delay_millisecs 
> > was intended to address.  memory.oom_delay_millisecs would simply delay 
> > calling mem_cgroup_out_of_memory() unless userspace can't free memory or 
> > increase the memory limit in time.  Obviously that delay isn't going to 
> > magically address any lock dependency issues.
> 
> The delayed fallback would certainly resolve the issue of the
> userspace handler getting stuck, be it due to memory shortage or due
> to locks.
> 
> However, it would not solve the part of the problem where the OOM
> killing kernel task is holding locks that the victim requires to exit.
> 

Right.

> We are definitely looking at multiple related issues, that's why I'm
> trying to fix them step by step.
> 

I guess my question is why this is being addressed now, when nobody has 
reported it lately on any recent kernel, and why the person who originally 
reported it isn't cc'd on the fix?

Can anybody, even with an instrumented kernel to make it more probable, 
reproduce the issue this is addressing?