Message-ID: <20110318152532.GB18450@tiehlicka.suse.cz>
Date:	Fri, 18 Mar 2011 16:25:32 +0100
From:	Michal Hocko <mhocko@...e.cz>
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Cc:	Daisuke Nishimura <nishimura@....nes.nec.co.jp>,
	Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
	LKML <linux-kernel@...r.kernel.org>
Subject: cgroup: real meaning of memory.usage_in_bytes

Hi Kame,

I have received a report that our SLE11-SP1 (based on 2.6.32) kernel
doesn't pass an LTP cgroup test case[*]. The test case basically creates a
cgroup (with 100M), runs a simple allocator which dirties a certain
amount of anonymous memory (under the limit) and finally checks whether
memory.usage_in_bytes == memory.stat (rss value).
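
The comparison the test relies on can be sketched roughly as follows. This
is only an illustration in Python, not the LTP code; the helper names
parse_memory_stat and usage_matches_rss are made up here, and the inputs are
assumed to be the text contents of memory.usage_in_bytes and memory.stat.

```python
def parse_memory_stat(text):
    """Parse the 'key value' lines of memory.stat into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not a plain "name number" pair.
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def usage_matches_rss(usage_text, stat_text):
    """Return True when usage_in_bytes equals the rss value of memory.stat.

    This mirrors the strict equality described above and, like the test,
    ignores the cache value.
    """
    usage = int(usage_text.strip())
    stats = parse_memory_stat(stat_text)
    return usage == stats.get("rss", 0)
```

A strict equality like this only holds as long as every charged byte shows
up in rss, which is exactly the assumption in question.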

This is obviously not 100% correct, as the test should also consider the
cache size, but this test case doesn't end up using any cache pages, so it
used to work when it was developed.

According to our documentation this is a reasonable test case:
Documentation/cgroups/memory.txt:
memory.usage_in_bytes           # show current memory(RSS+Cache) usage.

This however doesn't work after your commit:
cdec2e4265d (memcg: coalesce charging via percpu storage)

because since then we have been charging in bulk, so we can end up with
rss+cache <= usage_in_bytes. A simple (attached) program will
show this as well:
# mkdir /dev/memctl; mount -t cgroup -omemory cgroup /dev/memctl; cd /dev/memctl
# mkdir group_1; cd group_1; echo 100M > memory.limit_in_bytes
# cat memory.{usage_in_bytes,stat} 
0
cache 0
rss 0
[...]

[run the program - it will print its pid and wait for enter]
echo pid > tasks

[hit enter to make the program mmap and dirty pages]
# cat memory.{usage_in_bytes,stat} 
131072
cache 0
rss 4096
[...]

[hit enter again to let it finish]
# cat memory.{usage_in_bytes,stat} 
126976
cache 0
rss 0
[...]
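
The numbers above are what one would expect from batched charging. The
following toy model in Python is only a sketch, not kernel code: it assumes
a single CPU, a 4096-byte page and a batch of 32 pages (which matches the
131072 seen above), and it assumes that an uncharge goes straight back to
the counter rather than to the stock. The class and method names are
hypothetical.

```python
PAGE_SIZE = 4096               # assumed page size, matches "rss 4096" above
CHARGE_BATCH = 32 * PAGE_SIZE  # assumed batch size, matches 131072 above


class ToyMemcg:
    """Toy, single-CPU model of batched charging (not kernel code)."""

    def __init__(self):
        self.res_count = 0  # what usage_in_bytes would report
        self.stock = 0      # pre-charged bytes cached for later charges
        self.rss = 0        # bytes actually used by anonymous pages

    def charge_page(self):
        # Charge a whole batch to the counter when the stock is empty.
        if self.stock < PAGE_SIZE:
            self.res_count += CHARGE_BATCH
            self.stock += CHARGE_BATCH
        # Hand one page out of the stock to the caller.
        self.stock -= PAGE_SIZE
        self.rss += PAGE_SIZE

    def uncharge_page(self):
        # Assumption: the page is returned to the counter, not the stock.
        self.res_count -= PAGE_SIZE
        self.rss -= PAGE_SIZE


if __name__ == "__main__":
    cg = ToyMemcg()
    cg.charge_page()
    print(cg.res_count, cg.rss)  # 131072 4096
    cg.uncharge_page()
    print(cg.res_count, cg.rss)  # 126976 0
```

In this model the difference between the counter and rss is always the
content of the stock, 126976 bytes in both steps.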

I think we have several options here:
	1) document that the value is actually >= rss+cache and it shows
	   the guaranteed charges for the group
	2) use rss+cache rather than res->count
	3) remove the file
	4) call drain_all_stock_sync before asking for the value in
	   mem_cgroup_read
	5) collect the current amount of stock charges and subtract it
	   from the current res->count value

1) and 2) would suggest that the file is actually not very useful.
3) is basically an interface change as well.
4) sounds a little bit invasive, as we basically lose the advantage of the
pool whenever somebody reads the file. Btw., who is this file intended
for?
5) sounds like a compromise.
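
To make the difference between 4) and 5) concrete, here is a small sketch in
Python. It is not the patch mentioned below and not kernel code; the function
names are hypothetical, and it assumes that every entry of per_cpu_stock
holds bytes pre-charged to this one group.

```python
def usage_after_drain(res_count, per_cpu_stock):
    """Option 4: give all cached charges back to the counter, then read it.

    Returns the new counter value and the (now empty) per-CPU stocks, so
    the next charge on every CPU has to go to the counter again.
    """
    res_count -= sum(per_cpu_stock)
    drained = [0] * len(per_cpu_stock)
    return res_count, drained


def usage_minus_stock(res_count, per_cpu_stock):
    """Option 5: leave the stocks alone and subtract them when reading."""
    return res_count - sum(per_cpu_stock)
```

With a counter of 131072 and stocks of [126976, 0], both variants report
4096, but only the second one keeps the stock around for later charges.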

As I do not see the point of the file, I would like to get rid of it
completely rather than play games around it, but I am not sure why we
have it in the first place.

What do you (and others) think? I have a patch for 4 ready here but I
would like to understand the purpose of the file more before I post it.

Thanks
--- 
[*] You can get the source at http://sourceforge.net/projects/ltp/
./testcases/kernel/controllers/memctl/memctl_test01.c and
./testcases/kernel/controllers/memctl/run_memctl_test.sh

The test should be executed with 4 as the parameter
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic