Message-Id: <20130916122259.F60857D4@pobox.sk>
Date: Mon, 16 Sep 2013 12:22:59 +0200
From: "azurIt" <azurit@...ox.sk>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...e.cz>,
David Rientjes <rientjes@...gle.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
<linux-mm@...ck.org>, <cgroups@...r.kernel.org>, <x86@...nel.org>,
<linux-arch@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [patch 0/7] improve memcg oom killer robustness v2
> CC: "Andrew Morton" <akpm@...ux-foundation.org>, "Michal Hocko" <mhocko@...e.cz>, "David Rientjes" <rientjes@...gle.com>, "KAMEZAWA Hiroyuki" <kamezawa.hiroyu@...fujitsu.com>, "KOSAKI Motohiro" <kosaki.motohiro@...fujitsu.com>, linux-mm@...ck.org, cgroups@...r.kernel.org, x86@...nel.org, linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org
>On Wed, Sep 11, 2013 at 09:41:18PM +0200, azurIt wrote:
>> >On Wed, Sep 11, 2013 at 08:54:48PM +0200, azurIt wrote:
>> >> >On Wed, Sep 11, 2013 at 02:33:05PM +0200, azurIt wrote:
>> >> >> >On Tue, Sep 10, 2013 at 11:32:47PM +0200, azurIt wrote:
>> >> >> >> >On Tue, Sep 10, 2013 at 11:08:53PM +0200, azurIt wrote:
>> >> >> >> >> >On Tue, Sep 10, 2013 at 09:32:53PM +0200, azurIt wrote:
>> >> >> >> >> >> Here is full kernel log between 6:00 and 7:59:
>> >> >> >> >> >> http://watchdog.sk/lkml/kern6.log
>> >> >> >> >> >
>> >> >> >> >> >Wow, your apaches are like the hydra. Whenever one is OOM killed,
>> >> >> >> >> >more show up!
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> Yeah, it's supposed to do this ;)
>> >> >> >
>> >> >> >How are you expecting the machine to recover from an OOM situation,
>> >> >> >though? I guess I don't really understand what these machines are
>> >> >> >doing. But if you are overloading them like crazy, isn't that the
>> >> >> >expected outcome?
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> There's no global OOM, the server has enough memory. OOM is occurring only in cgroups (customers who simply don't want to pay for more memory).
>> >> >
>> >> >Yes, sure, but when the cgroups are thrashing, they use the disk and
>> >> >CPU to the point where the overall system is affected.
>> >>
>> >>
>> >>
>> >>
>> >> Didn't know that there was disk usage because of this; I never noticed anything yet.
>> >
>> >You said there was heavy IO going on...?
>>
>>
>>
>> Yes, there usually was big IO, but it was related to that
>> deadlocking bug in the kernel (or so I assume). I never saw big IO
>> under normal conditions, even when there were lots of OOMs in
>> cgroups. I'm not even using swap because of this, so I was assuming
>> that lack of memory doesn't cause any additional IO (or am I
>> wrong?). And if you mean that last IO problem from Monday, I
>> don't know exactly what happened, but it has been a really long time
>> since we had an IO problem so big that it also disabled root login
>> on the console.
>
>The deadlocking problem should be separate from this.
>
>Even without swap, the binaries and libraries of the running tasks can
>get reclaimed (and immediately faulted back from disk, i.e. thrashing).
>
>Usually the OOM killer should kick in before tasks cannibalize each
>other like that.
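
[Editor's note: this thrashing is visible per cgroup as a stream of major page faults. A minimal sketch of how one might watch for it, assuming the cgroup-v1 memory controller is mounted at /sys/fs/cgroup/memory; the group path "customer1" is a hypothetical example, not one from this thread.]

```python
#!/usr/bin/env python3
# Minimal sketch: watch a memcg's major-fault rate to spot thrashing
# (file pages being reclaimed and immediately faulted back from disk).
# Assumes the cgroup-v1 memory controller at /sys/fs/cgroup/memory;
# the group name "customer1" is a hypothetical example.
import time

STAT = "/sys/fs/cgroup/memory/customer1/memory.stat"

def read_stat(path):
    """Parse memory.stat into a dict of counter name -> integer value."""
    stats = {}
    with open(path) as f:
        for line in f:
            key, value = line.split()
            stats[key] = int(value)
    return stats

prev = read_stat(STAT)
while True:
    time.sleep(5)
    cur = read_stat(STAT)
    # pgmajfault counts faults that had to read the page back from disk;
    # a high steady rate with usage pinned at the limit suggests thrashing.
    rate = (cur["pgmajfault"] - prev["pgmajfault"]) / 5.0
    print(f"major faults/s: {rate:.1f}")
    prev = cur
```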
>
>The patch you were using did in fact have the side effect of widening
>the window between tasks entering heavy reclaim and the OOM killer
>kicking in, so it could explain the IO worsening while fixing the
>deadlock problem.
>
>That follow-up patch tries to narrow this window by quite a bit and
>tries to stop concurrent reclaim when the group is already OOM.
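
[Editor's note: the kernel-side change itself is not reproduced here, but the "group is already OOM" condition it keys on is also visible from userspace in cgroup v1 via the under_oom flag in memory.oom_control. A conceptual operator-side check, not the patch; the group path "customer1" is again a hypothetical example.]

```python
#!/usr/bin/env python3
# Userspace-visible analogue of the condition discussed above: is the
# memcg already under OOM? This is NOT the kernel patch, only an
# operator-side check against cgroup-v1 control files.
BASE = "/sys/fs/cgroup/memory/customer1"

def read_int(name):
    with open(f"{BASE}/{name}") as f:
        return int(f.read())

def under_oom():
    """memory.oom_control reports 'under_oom 1' while the memcg OOM path is active."""
    with open(f"{BASE}/memory.oom_control") as f:
        fields = dict(line.split() for line in f)
    return fields.get("under_oom") == "1"

usage = read_int("memory.usage_in_bytes")
limit = read_int("memory.limit_in_bytes")
print(f"usage {usage} / limit {limit}, under_oom={under_oom()}")
```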
>
Johannes,
it's, unfortunately, happening several times per day and we cannot work like this :( I will boot the previous kernel tonight. If you have any patches which could help me or you, please send them so I can install them with this reboot. Thank you.
azur