lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 19 May 2014 16:02:48 +0200
From:	Michal Hocko <mhocko@...e.cz>
To:	Greg Thelen <gthelen@...gle.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Johannes Weiner <hannes@...xchg.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Tejun Heo <tj@...nel.org>, Hugh Dickins <hughd@...gle.com>,
	LKML <linux-kernel@...r.kernel.org>, linux-mm@...ck.org
Subject: Re: [PATCH] memcg: deprecate memory.force_empty knob

On Fri 16-05-14 15:00:16, Greg Thelen wrote:
> On Tue, May 13 2014, Michal Hocko <mhocko@...e.cz> wrote:
[...]
> > If somebody really cares because reparented pages, which would be
> > dropped otherwise, push out more important ones then we should fix the
> > reparenting code and put pages to the tail.
> 
> I should mention a case where I've needed to use memory.force_empty: to
> synchronously flush stats from child to parent.  Without force_empty
> memory.stat is temporarily inconsistent until async css_offline
> reparents charges.  Here is an example on v3.14 showing that
> parent/memory.stat contents are in-flux immediately after rmdir of
> parent/child.

OK, it is true that the delayed offlining makes this little bit
complicated because there is no direct user visible relation between
rmdir and css_offline.

> $ cat /test
> #!/bin/bash
> 
> # Create parent and child.  Add some non-reclaimable anon rss to child,
> # then move running task to parent.
> mkdir p p/c
> (echo $BASHPID > p/c/cgroup.procs && exec sleep 1d) &
> pid=$!
> sleep 1
> echo $pid > p/cgroup.procs 
> 
> grep 'rss ' {p,p/c}/memory.stat
> if [[ $1 == force ]]; then
>   echo 1 > p/c/memory.force_empty
> fi
> rmdir p/c
> 
> echo 'For a small time the p/c memory has not been reparented to p.'
> grep 'rss ' {p,p/c}/memory.stat
> 
> sleep 1
> echo 'After waiting all memory has been reparented'
> grep 'rss ' {p,p/c}/memory.stat
> 
> kill $pid
> rmdir p
> 
> 
> -- First, demonstrate that just rmdir, without memory.force_empty,
>    temporarily hides reparented child memory stats.
> 
> $ /test
> p/memory.stat:rss 0
> p/memory.stat:total_rss 69632
> p/c/memory.stat:rss 69632
> p/c/memory.stat:total_rss 69632
> For a small time the p/c memory has not been reparented to p.
> p/memory.stat:rss 0
> p/memory.stat:total_rss 0

OK, this is a bug. Our iterators skip the children because css_tryget
fails on it but css_offline still not done. This is fixable, though,
and force_empty is just a workaround so I wouldn't see this as a proper
justification to keep it alive.

One possible way to fix this is to iterate children even when css_tryget
fails for them if they haven't finished css_offline yet.
There are some changes in the cgroups core which should make this easier
and Johannes claimed he has some work in that area.

Anyway this is a useful testcase. Thanks Greg!

> grep: p/c/memory.stat: No such file or directory
> After waiting all memory has been reparented
> p/memory.stat:rss 69632
> p/memory.stat:total_rss 69632
> grep: p/c/memory.stat: No such file or directory
> /test: Terminated              ( echo $BASHPID > p/c/cgroup.procs && exec sleep 1d )
> 
> -- Demonstrate that using memory.force_empty before rmdir, behaves more
>    sensibly.  Stats for reparented child memory are not hidden.
> 
> $ /test force
> p/memory.stat:rss 0
> p/memory.stat:total_rss 69632
> p/c/memory.stat:rss 69632
> p/c/memory.stat:total_rss 69632
> For a small time the p/c memory has not been reparented to p.
> p/memory.stat:rss 69632
> p/memory.stat:total_rss 69632
> grep: p/c/memory.stat: No such file or directory
> After waiting all memory has been reparented
> p/memory.stat:rss 69632
> p/memory.stat:total_rss 69632
> grep: p/c/memory.stat: No such file or directory
> /test: Terminated              ( echo $BASHPID > p/c/cgroup.procs && exec sleep 1d )

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ