Message-ID: <20090421163537.GI19186@mit.edu>
Date:	Tue, 21 Apr 2009 12:35:37 -0400
From:	Theodore Tso <tytso@....edu>
To:	Andrea Righi <righi.andrea@...il.com>
Cc:	Jens Axboe <jens.axboe@...cle.com>,
	Paul Menage <menage@...gle.com>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Gui Jianfeng <guijianfeng@...fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	agk@...rceware.org, akpm@...ux-foundation.org,
	baramsori72@...il.com, Carl Henrik Lunde <chlunde@...g.uio.no>,
	dave@...ux.vnet.ibm.com, Divyesh Shah <dpshah@...gle.com>,
	eric.rannaud@...il.com, fernando@....ntt.co.jp,
	Hirokazu Takahashi <taka@...inux.co.jp>,
	Li Zefan <lizf@...fujitsu.com>, matt@...ehost.com,
	dradford@...ehost.com, ngupta@...gle.com, randy.dunlap@...cle.com,
	roberto@...it.it, Ryo Tsuruta <ryov@...inux.co.jp>,
	Satoshi UCHIDA <s-uchida@...jp.nec.com>,
	subrata@...ux.vnet.ibm.com, yoshikawa.takuya@....ntt.co.jp,
	containers@...ts.linux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 9/9] ext3: do not throttle metadata and journal IO

On Tue, Apr 21, 2009 at 04:31:31PM +0200, Andrea Righi wrote:
> 
> Some months ago I posted a proposal to account, track and limit per
> cgroup dirty pages in the memory cgroup subsystem:
> 
> https://lists.linux-foundation.org/pipermail/containers/2008-September/013140.html
> 
> At the moment I'm working on a similar, updated version.  I know
> that Kamezawa is also implementing something to account per-cgroup
> dirty pages in the memory cgroup.
> 
> Moreover, io-throttle v14 already uses the page_cgroup structure to
> encode into page_cgroup->flags the cgroup ID (io-throttle css_id()
> actually) that originally dirtied the page.
> 
> This should be enough to track dirty pages and charge the right cgroup.
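
For concreteness, the kind of encoding being described might look
roughly like the sketch below.  The bit layout and helper names are
illustrative assumptions, not the actual io-throttle v14 code:

	/*
	 * Stash the owning cgroup's css_id() in the upper bits of
	 * page_cgroup->flags, above the bits used for state flags.
	 */
	#define IOTHROTTLE_ID_SHIFT	16
	#define IOTHROTTLE_ID_MASK	(~0UL << IOTHROTTLE_ID_SHIFT)

	static inline void pc_set_owner_id(struct page_cgroup *pc,
					   unsigned short id)
	{
		pc->flags = (pc->flags & ~IOTHROTTLE_ID_MASK) |
			    ((unsigned long)id << IOTHROTTLE_ID_SHIFT);
	}

	static inline unsigned short pc_owner_id(struct page_cgroup *pc)
	{
		return (pc->flags & IOTHROTTLE_ID_MASK) >>
			IOTHROTTLE_ID_SHIFT;
	}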

I'm not convinced this will work that well.  Right now, associating a
page with a cgroup is done on a very rough basis --- essentially,
whichever cgroup touches a page first "owns" it.  That means if a
process from one cgroup is the first to read a page, that cgroup will
"own" the page.  This can get quite arbitrary for shared libraries,
for example.  And while that may be the best you can do for RSS
accounting, it works even less well for tracking dirty pages.

Now if you have processes from one cgroup that are always reading
from some data file, and a process from another cgroup that is
updating that same file, the writes won't be charged to the correct
cgroup.
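
To make the failure mode concrete: suppose the dirty accounting simply
charges whatever cgroup the page_cgroup already points at.  The
function names below are hypothetical, purely for illustration:

	/* Called when a task dirties a page. */
	static void iothrottle_account_dirty(struct page *page)
	{
		struct page_cgroup *pc = lookup_page_cgroup(page);

		/*
		 * pc was set up when the *reading* cgroup first touched
		 * the page, so the writer's dirtying is charged to the
		 * readers' cgroup here, not to the cgroup actually doing
		 * the writes.
		 */
		charge_dirty_page(pc_owner_id(pc));
	}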

So using the same data structure to assign page ownership for both
RSS accounting and dirty-page accounting might not be such a great
idea.  On the other hand, using a completely separate set of data
structures increases your overhead.

That being said, it's not obvious to me that trying to track RSS
ownership on a per-page basis makes sense in the first place.  It may
not be worth the overhead, particularly on a machine with a truly
large amount of memory.  So, for example, tracking ownership per
vm_area_struct and splitting the cost across the cgroups that map it
might be a better way of doing RSS accounting; a sketch of that idea
follows below.  But for dirty pages, where there will be far fewer
pages to track, a per-page scheme may make more sense.  The take-home
here is that using different mechanisms for RSS accounting and for
dirty-page accounting on a per-cgroup basis, with the understanding
that all of this will be horribly rough and non-exact, may make a lot
of sense.
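
As a sketch of the per-vm_area_struct idea (the helper below is
hypothetical, just to make the shape of it concrete): instead of
tagging every page, charge the pages covered by a VMA evenly to the
cgroups that map it.

	/* Hypothetical: split a VMA's RSS cost evenly across cgroups. */
	static void split_vma_rss(struct vm_area_struct *vma,
				  struct mem_cgroup **cgroups, int nr)
	{
		unsigned long pages =
			(vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
		int i;

		for (i = 0; i < nr; i++)
			charge_rss(cgroups[i], pages / nr); /* hypothetical */
	}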

Best,

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
