linux-kernel - Re: [PATCH for 3.2.34] memcg: do not trigger OOM if PF_NO_MEMCG


Message-Id: <20130208120249.FD733220@pobox.sk>
Date:	Fri, 08 Feb 2013 12:02:49 +0100
From:	"azurIt" <azurit@...ox.sk>
To:	Michal Hocko <mhocko@...e.cz>
Cc:	<linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
	cgroups mailinglist <cgroups@...r.kernel.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Johannes Weiner <hannes@...xchg.org>
Subject: Re: [PATCH for 3.2.34] memcg: do not trigger OOM if PF_NO_MEMCG_OOM is set

>
>Do you have logs from that time period?
>
>I have only glanced through the stacks and most of the threads are
>waiting in the mem_cgroup_handle_oom (mostly from the page fault path
>where we do not have other options than waiting) which suggests that
>your memory limit is seriously underestimated. If you look at the number
>of charging failures (memory.failcnt per-group file) then you will get
>9332083 failures in _average_ per group. This is a lot!
>Not all those failures end with OOM, of course. But it clearly signals
>that the workload needs much more memory than the limit allows.
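
For reference, the per-group average mentioned above could be recomputed from a snapshot roughly like this. It is only a sketch: it assumes a cgroup v1 style directory tree in which each group directory contains a `memory.failcnt` file holding a single integer, and the function name `average_failcnt` and the `root` argument are made up for illustration.

```python
import os


def average_failcnt(root):
    """Average the memory.failcnt values found under root.

    Assumes each group directory holds a memory.failcnt file with one
    integer in it (cgroup v1 layout). Returns 0.0 if no file is found.
    """
    counts = []
    for dirpath, _dirs, files in os.walk(root):
        if "memory.failcnt" in files:
            with open(os.path.join(dirpath, "memory.failcnt")) as f:
                counts.append(int(f.read().strip()))
    return sum(counts) / len(counts) if counts else 0.0
```

Calling `average_failcnt(".")` from the top of an unpacked snapshot would walk every group below it, including nested ones.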


What type of logs? I have all of them.

Memory usage graph:
http://www.watchdog.sk/lkml/memory2.png

The new kernel was booted at about 1:15. The data in memcg-bug-4.tar.gz were taken at about 2:35 and the data in memcg-bug-5.tar.gz at about 5:25. There was always lots of free memory. The higher memory consumption between 3:39 and 5:33 was caused by a data backup, which completed a few minutes before I restarted the server (this was just a coincidence).



>There are only 5 groups in this one and all of them have no memory
>charged (so no OOM going on). All tasks are somewhere in the ptrace
>code.


It's all from the same cgroup, but from different times.



>grep cache -r .
>./1360297489/memory.stat:cache 0
>./1360297489/memory.stat:total_cache 65642496
>./1360297491/memory.stat:cache 0
>./1360297491/memory.stat:total_cache 65642496
>./1360297492/memory.stat:cache 0
>./1360297492/memory.stat:total_cache 65642496
>./1360297490/memory.stat:cache 0
>./1360297490/memory.stat:total_cache 65642496
>./1360297488/memory.stat:cache 0
>./1360297488/memory.stat:total_cache 65642496
>
>which suggests that this is a parent group and the memory is charged in
>a child group. I guess that all those are under OOM as the number seems
>like they have limit at 62M.


The cgroup has a limit of 330M (346030080 bytes). As I said, these two processes were stuck and it was impossible to kill them. They may have been the processes I was trying to 'strace' earlier: 'strace' froze, as it always does when the cgroup has this problem, and I killed it (I was only checking whether this was the original cgroup problem).
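
To make the two figures easy to compare, here is a small conversion sketch. It assumes "M" means MiB (1024 * 1024 bytes); the helper name `to_mib` is only illustrative.

```python
MIB = 1024 * 1024  # assumption: "M" in this thread means MiB


def to_mib(nbytes):
    """Convert a byte count to MiB."""
    return nbytes / MIB


print(to_mib(65642496))   # total_cache from memory.stat -> 62.6015625
print(to_mib(346030080))  # configured limit             -> 330.0
```

So the total_cache value is about 62.6M, while the configured limit is exactly 330M.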