Message-ID: <20130905095331.GA9702@dhcp22.suse.cz>
Date:	Thu, 5 Sep 2013 11:53:31 +0200
From:	Michal Hocko <mhocko@...e.cz>
To:	azurIt <azurit@...ox.sk>
Cc:	Johannes Weiner <hannes@...xchg.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	David Rientjes <rientjes@...gle.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	linux-mm@...ck.org, cgroups@...r.kernel.org, x86@...nel.org,
	linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [patch 0/7] improve memcg oom killer robustness v2

On Thu 05-09-13 11:14:30, azurIt wrote:
[...]
> My script detected another frozen cgroup today, sending stacks. Is
> there anything interesting?

3 tasks are sleeping and waiting for somebody to take an action to
resolve the memcg OOM. Is the memcg oom killer enabled for that group?
If yes, which task has been selected to be killed? You can find that in
the oom report in dmesg.
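
Just for context, the waiting side roughly looks like this (a simplified
sketch, not the code from your kernel; only the memcg_oom_waitq name is
taken from this thread, everything else here is hypothetical):

#include <linux/wait.h>
#include <linux/sched.h>

struct mem_cgroup;			/* opaque here */

static DECLARE_WAIT_QUEUE_HEAD(memcg_oom_waitq);

/* Reached from pagefault_out_of_memory() when a memcg charge failed. */
static void memcg_oom_wait_sketch(struct mem_cgroup *memcg)
{
	DEFINE_WAIT(wait);

	/*
	 * Park on the waitqueue until an uncharge (normally triggered by
	 * the oom victim exiting) makes progress possible again.
	 */
	prepare_to_wait(&memcg_oom_waitq, &wait, TASK_KILLABLE);
	schedule();
	finish_wait(&memcg_oom_waitq, &wait);
}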

I can see one way this might happen. If the killed task happened to
allocate memory while it was exiting, it would hit the oom condition
again without freeing any memory, so nobody waiting on memcg_oom_waitq
gets woken. We have a report like that:
https://lkml.org/lkml/2013/7/31/94
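
The wake-up normally comes from the uncharge path once the victim dies
and releases its memory, roughly like this (again only a hypothetical
sketch, paired with the one above):

/*
 * If the victim itself faults while exiting, fails the charge again and
 * goes back to sleep in the oom path, this is never reached and the
 * whole group stays stuck on memcg_oom_waitq.
 */
static void memcg_uncharge_wake_sketch(struct mem_cgroup *memcg)
{
	/* ... return pages to the memcg counter ... */
	wake_up_all(&memcg_oom_waitq);
}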

The issue went silent in the meantime, so it is time to wake it up.
It would definitely be good to see what happened in your case, though.
If any of the below tasks was the oom victim then it is very likely
that this is the same issue.

> pid: 1031
[...]
> stack:
> [<ffffffff8110f255>] mem_cgroup_oom_synchronize+0x165/0x190
> [<ffffffff810d269e>] pagefault_out_of_memory+0xe/0x120
> [<ffffffff81026f5e>] mm_fault_error+0x9e/0x150
> [<ffffffff81027414>] do_page_fault+0x404/0x490
> [<ffffffff815cb7bf>] page_fault+0x1f/0x30
> [<ffffffffffffffff>] 0xffffffffffffffff
[...]
> pid: 1036
> stack:
> [<ffffffff8110f255>] mem_cgroup_oom_synchronize+0x165/0x190
> [<ffffffff810d269e>] pagefault_out_of_memory+0xe/0x120
> [<ffffffff81026f5e>] mm_fault_error+0x9e/0x150
> [<ffffffff81027414>] do_page_fault+0x404/0x490
> [<ffffffff815cb7bf>] page_fault+0x1f/0x30
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> pid: 1038
> stack:
> [<ffffffff8110f255>] mem_cgroup_oom_synchronize+0x165/0x190
> [<ffffffff810d269e>] pagefault_out_of_memory+0xe/0x120
> [<ffffffff81026f5e>] mm_fault_error+0x9e/0x150
> [<ffffffff81027414>] do_page_fault+0x404/0x490
> [<ffffffff815cb7bf>] page_fault+0x1f/0x30
> [<ffffffffffffffff>] 0xffffffffffffffff

-- 
Michal Hocko
SUSE Labs
