Message-ID: <20130705191854.GR17812@cmpxchg.org>
Date:	Fri, 5 Jul 2013 15:18:54 -0400
From:	Johannes Weiner <hannes@...xchg.org>
To:	azurIt <azurit@...ox.sk>
Cc:	Michal Hocko <mhocko@...e.cz>, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, cgroups mailinglist <cgroups@...r.kernel.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Subject: Re: [PATCH for 3.2] memcg: do not trap chargers with full callstack
 on OOM

On Fri, Jul 05, 2013 at 09:02:46PM +0200, azurIt wrote:
> >I looked at your debug messages but could not find anything that would
> >hint at a deadlock.  All tasks are stuck in the refrigerator, so I
> >assume you use the freezer cgroup and enabled it somehow?
> 
> 
> Yes, I really am using the freezer cgroup, BUT I was checking whether
> it was causing problems. Unfortunately, several days have passed since
> then and I no longer fully remember whether I checked it for both cases
> (unremovable cgroups and the frozen processes holding the web
> server port). I'm 100% sure I checked it for the unremovable
> cgroups, but not so sure about the other problem (I had to act quickly
> in that case). Are you sure (from the stacks) that the freezer cgroup
> was enabled there?

Yeah, all the traces without exception look like this:

1372089762/23433/stack:[<ffffffff81080925>] refrigerator+0x95/0x160
1372089762/23433/stack:[<ffffffff8106ab7b>] get_signal_to_deliver+0x1cb/0x540
1372089762/23433/stack:[<ffffffff8100188b>] do_signal+0x6b/0x750
1372089762/23433/stack:[<ffffffff81001fc5>] do_notify_resume+0x55/0x80
1372089762/23433/stack:[<ffffffff815cac77>] int_signal+0x12/0x17
1372089762/23433/stack:[<ffffffffffffffff>] 0xffffffffffffffff

so the freezer was already enabled when you took the backtraces.
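
A check like "every trace starts in refrigerator()" can be scripted rather
than eyeballed. The following is only a rough sketch, not something that was
run as part of this thread. It assumes the dump lines have the
<snapshot>/<pid>/stack:[<addr>] symbol+offset/size shape shown above; the
helper names parse_frames and tasks_not_frozen are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the traces quoted above:
#   1372089762/23433/stack:[<ffffffff81080925>] refrigerator+0x95/0x160
FRAME_RE = re.compile(
    r"^(?P<snap>\d+)/(?P<pid>\d+)/stack:"
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<sym>\S+)$"
)


def parse_frames(lines):
    """Group frame symbols by (snapshot, pid), innermost frame first."""
    stacks = defaultdict(list)
    for line in lines:
        m = FRAME_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a stack frame line
        # Drop the "+offset/size" suffix to keep only the symbol name.
        symbol = m.group("sym").split("+", 1)[0]
        stacks[(m.group("snap"), m.group("pid"))].append(symbol)
    return stacks


def tasks_not_frozen(stacks):
    """Return the (snapshot, pid) keys whose innermost frame is not refrigerator."""
    return sorted(
        key for key, frames in stacks.items()
        if not frames or frames[0] != "refrigerator"
    )


SAMPLE = """\
1372089762/23433/stack:[<ffffffff81080925>] refrigerator+0x95/0x160
1372089762/23433/stack:[<ffffffff8106ab7b>] get_signal_to_deliver+0x1cb/0x540
1372089762/23433/stack:[<ffffffff8100188b>] do_signal+0x6b/0x750
1372089762/23433/stack:[<ffffffff81001fc5>] do_notify_resume+0x55/0x80
1372089762/23433/stack:[<ffffffff815cac77>] int_signal+0x12/0x17
1372089762/23433/stack:[<ffffffffffffffff>] 0xffffffffffffffff
"""

stacks = parse_frames(SAMPLE.splitlines())
print(tasks_not_frozen(stacks))
```

An empty list means every task in the dump was sitting in the refrigerator
when its stack was sampled; any key that is printed points at a task worth
looking at by hand.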

> Btw, what about the other stacks? I mean this file:
> http://watchdog.sk/lkml/memcg-bug-7.tar.gz
> 
> It was taken while running the kernel with your patch, from a
> cgroup which was under unresolvable OOM (just like my very original
> problem).

I looked at these traces too, but none of the tasks are stuck in rmdir
or the OOM path.  Some /are/ in the page fault path, but they are
happily doing reclaim and don't appear to be stuck.  So I'm having a
hard time matching this data to what you otherwise observed.

However, based on what you reported, the most likely explanation for
the continued hangs is the unfinished OOM handling, for which I sent
the follow-up patch for arch/x86/mm/fault.c.