[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090422033032.GR19637@balbir.in.ibm.com>
Date: Wed, 22 Apr 2009 09:00:32 +0530
From: Balbir Singh <balbir@...ux.vnet.ibm.com>
To: Theodore Tso <tytso@....edu>,
Andrea Righi <righi.andrea@...il.com>,
Jens Axboe <jens.axboe@...cle.com>,
Paul Menage <menage@...gle.com>,
Gui Jianfeng <guijianfeng@...fujitsu.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
agk@...rceware.org, akpm@...ux-foundation.org,
baramsori72@...il.com, Carl Henrik Lunde <chlunde@...g.uio.no>,
dave@...ux.vnet.ibm.com, Divyesh Shah <dpshah@...gle.com>,
eric.rannaud@...il.com, fernando@....ntt.co.jp,
Hirokazu Takahashi <taka@...inux.co.jp>,
Li Zefan <lizf@...fujitsu.com>, matt@...ehost.com,
dradford@...ehost.com, ngupta@...gle.com, randy.dunlap@...cle.com,
roberto@...it.it, Ryo Tsuruta <ryov@...inux.co.jp>,
Satoshi UCHIDA <s-uchida@...jp.nec.com>,
subrata@...ux.vnet.ibm.com, yoshikawa.takuya@....ntt.co.jp,
containers@...ts.linux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 9/9] ext3: do not throttle metadata and journal IO
* Theodore Tso <tytso@....edu> [2009-04-21 15:14:01]:
> On Tue, Apr 21, 2009 at 11:44:29PM +0530, Balbir Singh wrote:
> >
> > That would be true in general, but only the process writing to the
> > file will dirty it. So dirty already accounts for the read/write
> > split. I'd assume that the cost is only for the dirty page, since we
> > do IO only on write in this case, unless I am missing something very
> > obvious.
>
> Maybe I'm missing something, but the (in development) patches I saw
> seemed to use the existing infrastructure designed for RSS cost
> tracking (which is also not yet in mainline, unless I'm mistaken ---
> but I didn't see page_get_page_cgroup() in the mainline tree yet).
>
> Right? So if process A in cgroup A reads touches the file first by
> reading from it, then the pages read by process A will be assigned as
> being "owned" by cgroup A. Then when the patch described at
>
> http://lkml.org/lkml/2008/9/9/245
That is correct, but on reclaim (hitting the limit) a page that is frequently
used by B and not A, can get reclaimed from A and move to B if B is
heavily using it.
>
> ... tries to charge a write done by process B in cgroup B, the code
> will call page_get_page_cgroup(), see that it is "owned" by cgroup A,
> and charge the dirty page to cgroup A. If process A and all of the
> other processes in cgroup A only access this file read-only, and
> process B is updating this file very heavily --- and it is a large
> file --- then cgroup B will get a completely free pass as far as
> dirtying pages to this file, since it will be all charged 100% to
> cgroup A, incorrectly.
>
> So what am I missing?
You are right. As long as A is not exceeding its limit, B will get a
free pass at the page. The page will be inactive on A's LRU and active
on the global LRU though from the memory controller perspective. We'll
need to find a way to fix this, if this is a very common scenario for
the IO controller.
--
Balbir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists