Message-Id: <20090901.160004.226800357.ryov@valinux.co.jp>
Date: Tue, 01 Sep 2009 16:00:04 +0900 (JST)
From: Ryo Tsuruta <ryov@...inux.co.jp>
To: nauman@...gle.com
Cc: vgoyal@...hat.com, riel@...hat.com, linux-kernel@...r.kernel.org,
jens.axboe@...cle.com, containers@...ts.linux-foundation.org,
dm-devel@...hat.com, dpshah@...gle.com, lizf@...fujitsu.com,
mikew@...gle.com, fchecconi@...il.com, paolo.valente@...more.it,
fernando@....ntt.co.jp, s-uchida@...jp.nec.com, taka@...inux.co.jp,
guijianfeng@...fujitsu.com, jmoyer@...hat.com,
dhaval@...ux.vnet.ibm.com, balbir@...ux.vnet.ibm.com,
righi.andrea@...il.com, m-ikeda@...jp.nec.com, agk@...hat.com,
akpm@...ux-foundation.org, peterz@...radead.org,
jmarchan@...hat.com, torvalds@...ux-foundation.org, mingo@...e.hu
Subject: Re: [PATCH 18/23] io-controller: blkio_cgroup patches from Ryo to
track async bios.
Hi,
> > Hi Rik,
> >
> > Thanks for reviewing the patches. I wanted to better understand where
> > it helps to associate a bio with the group of the process that
> > created/owned the page. Hence a few thoughts.
> >
> > When a bio is submitted to IO scheduler, it needs to determine the group
> > bio belongs to and group which should be charged to. There seem to be two
> > methods.
> >
> > - Attribute the bio to cgroup submitting process belongs to.
> > - For async requests, track the original owner hence cgroup of the page
> > and charge that group for the bio.
> >
> > One can think of pros/cons of both the approaches.
> >
> > - The primary use case of tracking async context seems to be that if a
> > process T1 in group G1 mmaps a big file and then another process T2 in
> > group G2, asks for memory and triggers reclaim and generates writes of
> > the file pages mapped by T1, then these writes should not be charged to
> > T2, hence blkio_cgroup pages.
> >
> > But the flip side of this might be that group G2 is a low weight group
> > and probably too busy also right now, which will delay the write out
> > and possibly T2 will wait longer for memory to be allocated.
To avoid this wait, dm-ioband issues a bio as early as possible when it
contains a page with the PG_reclaim flag set.
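As a rough illustration of that ordering (a toy model only, not
dm-ioband's actual code; `Bio`, `ToyDispatcher`, and the `reclaim` field
are hypothetical names standing in for a bio whose page is flagged for
reclaim), a dispatcher can keep charging each bio to its owner group
while letting reclaim-tagged bios jump ahead of the throttled queue:

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Bio:
    owner_group: str       # cgroup charged for this IO (hypothetical field)
    reclaim: bool = False  # True if a page in the bio is under reclaim


class ToyDispatcher:
    """Toy model: reclaim bios bypass the normal (throttled) queue."""

    def __init__(self):
        self.urgent = deque()  # bios carrying pages under reclaim
        self.normal = deque()  # everything else
        self.charged = {}      # owner group -> number of bios charged

    def submit(self, bio):
        # The bio is charged to its owner group whichever queue it takes.
        self.charged[bio.owner_group] = self.charged.get(bio.owner_group, 0) + 1
        if bio.reclaim:
            self.urgent.append(bio)
        else:
            self.normal.append(bio)

    def next_bio(self):
        # Reclaim bios are always served before any normal bio.
        if self.urgent:
            return self.urgent.popleft()
        if self.normal:
            return self.normal.popleft()
        return None
```

In this sketch the accounting and the ordering are decoupled: the owner
group still pays for the write, but the task waiting on reclaim is not
held up by that group's weight.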
> > - At one point of time Andrew mentioned that buffered writes are generally a
> > big problem and one needs to map these to owner's group. Though I am not
> > very sure what specific problem he was referring to. Can we attribute
> > buffered writes to pdflush threads and move all pdflush threads in a
> > cgroup to limit system wide write out activity?
I think that buffered writes should be controlled per cgroup, just as
synchronous writes are.
> > - Somebody also gave an example where there is a memory hogging process and
> > possibly pushes out some processes to swap. It does not sound fair to
> > charge those processes for that swap writeout. These processes never
> > requested swap IO.
I think that swap writeouts should be charged to the memory-hogging
process: it consumes more resources, so it should pay the penalty.
> > - If there are multiple buffered writers in the system, then those writers
> > can also be forced to writeout some pages to disk before they are
> > allowed to dirty more pages. As per the page cache design, any writer
> > can pick any inode and start writing out pages. So it can happen that a
> > higher weight group task is writing out pages dirtied by a lower weight
> > group task. If an async bio is mapped to the owner's group, it might happen that
> > higher weight group task might be made to sleep on lower weight group
> > task because request descriptors are all consumed up.
As mentioned above, in dm-ioband, the bio is charged to the page owner
and issued immediately.
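The two attribution policies discussed in this thread can be sketched
as follows. This is a simplified model under assumed names (`Bio`,
`group_to_charge`, and `track_owner` are hypothetical and do not appear
in the patches): with owner tracking, an async bio is charged to the
group that owns the page; otherwise, and for all synchronous IO, it is
charged to the submitting task's group.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Bio:
    submitter_group: str                    # cgroup of the task submitting the bio
    page_owner_group: Optional[str] = None  # cgroup that owns the page, if tracked
    is_async: bool = False                  # True for buffered/async writeback


def group_to_charge(bio: Bio, track_owner: bool) -> str:
    """Return the cgroup that should be charged for this bio."""
    if track_owner and bio.is_async and bio.page_owner_group is not None:
        # Async bio with a known owner: charge the page owner.
        return bio.page_owner_group
    # Sync IO, or no tracking available: charge the submitter.
    return bio.submitter_group


# A page dirtied by a task in G1 and written back by a pdflush thread.
writeback = Bio(submitter_group="pdflush_group",
                page_owner_group="G1", is_async=True)
```

Here `group_to_charge(writeback, track_owner=True)` selects G1, while
`track_owner=False` selects the pdflush thread's group.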
> > It looks like there does not seem to be a clean way which covers all the
> > cases without issues. I am just trying to think, what is a simple way
> > which covers most of the cases. Can we just stick to using submitting task
> > context to determine a bio's group (as cfq does). Which can result in
> > following.
> >
> > - Less code and reduced complexity.
> >
> > - Buffered writes will be charged to pdflush and its group. If one wishes to
> > limit buffered write activity for pdflush, one can move all the pdflush
> > threads into a group and assign desired weight. Writes submitted in
> > process context will continue to be charged to that process irrespective
> > of the fact who dirtied that page.
>
> What if we wanted to control buffered write activity per group? If a
> group keeps dirtying pages, we wouldn't want it to dominate the disk
> IO capacity at the expense of other cgroups (by dominating the writes
> sent down by pdflush).
Yes, I think that is true.
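To make the concern concrete, here is a small sketch of weighted
dispatch across groups, assuming bios have already been attributed to
the groups that dirtied the pages. It is not how any of the proposed
controllers is implemented; `dispatch_order` and its arguments are
hypothetical. Each step serves the backlogged group with the smallest
service-to-weight ratio, so a group that dirties many pages cannot
crowd out the others.

```python
def dispatch_order(queues, weights, n):
    """Pick up to n bios, always serving the backlogged group whose
    service/weight ratio is lowest (ties broken by group name)."""
    served = {group: 0 for group in queues}
    pending = {group: list(bios) for group, bios in queues.items()}
    order = []
    for _ in range(n):
        ready = [group for group in pending if pending[group]]
        if not ready:
            break  # nothing left to dispatch
        group = min(ready, key=lambda g: (served[g] / weights[g], g))
        order.append(pending[group].pop(0))
        served[group] += 1
    return order


# G1 has dirtied ten times as many pages as G2; both have equal weight.
queues = {
    "G1": [("G1", i) for i in range(100)],
    "G2": [("G2", i) for i in range(10)],
}
weights = {"G1": 1, "G2": 1}
first_ten = dispatch_order(queues, weights, 10)
```

If all of these writes were instead charged to a single pdflush group,
the dispatcher would see one queue and G1's backlog would dominate it.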
> > - swap activity will be charged to kswapd and its group. If swap writes
> > are coming from process context, it gets charged to process and its
> > group.
> >
> > - If one is worried about the case of one process being charged for write
> > out of file mapped by another process during reclaim, then we can
> > probably make use of memory controller and mount memory controller and
> > io controller together on same hierarchy. I am told that with memory
> > controller, group's memory will be reclaimed by the process requesting
> > more memory. If that's the case, then IO will automatically be charged
> > to right group if we use submitting task context.
> >
> > I just wanted to bring this point forward for more discussions to know
> > what is the right thing to do? Use bio tracking or not.
Thanks for bringing it forward.

Thanks,
Ryo Tsuruta