linux-kernel - Re: [PATCH 18/23] io-controller: blkio_cgroup patches from Ryo to track async bios.

Message-ID: <e98e18940908311651s26de5b70ye6f4a82402956309@mail.gmail.com>
Date:	Mon, 31 Aug 2009 16:51:25 -0700
From:	Nauman Rafique <nauman@...gle.com>
To:	Vivek Goyal <vgoyal@...hat.com>
Cc:	Rik van Riel <riel@...hat.com>, linux-kernel@...r.kernel.org,
	jens.axboe@...cle.com, containers@...ts.linux-foundation.org,
	dm-devel@...hat.com, dpshah@...gle.com, lizf@...fujitsu.com,
	mikew@...gle.com, fchecconi@...il.com, paolo.valente@...more.it,
	ryov@...inux.co.jp, fernando@....ntt.co.jp, s-uchida@...jp.nec.com,
	taka@...inux.co.jp, guijianfeng@...fujitsu.com, jmoyer@...hat.com,
	dhaval@...ux.vnet.ibm.com, balbir@...ux.vnet.ibm.com,
	righi.andrea@...il.com, m-ikeda@...jp.nec.com, agk@...hat.com,
	akpm@...ux-foundation.org, peterz@...radead.org,
	jmarchan@...hat.com, torvalds@...ux-foundation.org, mingo@...e.hu
Subject: Re: [PATCH 18/23] io-controller: blkio_cgroup patches from Ryo to 
	track async bios.

On Mon, Aug 31, 2009 at 11:56 AM, Vivek Goyal<vgoyal@...hat.com> wrote:
> On Mon, Aug 31, 2009 at 01:34:54PM -0400, Rik van Riel wrote:
>> Vivek Goyal wrote:
>>> o blkio_cgroup patches from Ryo to track async bios.
>>>
>>> o This functionality is used to determine the group of async IO from the
>>>   page instead of from the context of the submitting task.
>>>
>>> Signed-off-by: Hirokazu Takahashi <taka@...inux.co.jp>
>>> Signed-off-by: Ryo Tsuruta <ryov@...inux.co.jp>
>>> Signed-off-by: Vivek Goyal <vgoyal@...hat.com>
>>
>> This seems to be the most complex part of the code so far,
>> but I see why this code is necessary.
>>
>
> Hi Rik,
>
> Thanks for reviewing the patches. I wanted to get a better understanding
> of where it helps to associate a bio with the group of the process that
> created/owned the page. Hence a few thoughts.
>
> When a bio is submitted to the IO scheduler, the scheduler needs to
> determine which group the bio belongs to and which group should be
> charged for it. There seem to be two methods.
>
> - Attribute the bio to the cgroup the submitting process belongs to.
> - For async requests, track the original owner (and hence the cgroup) of
>  the page and charge that group for the bio.
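A toy user-space model of the two attribution policies may make the choice concrete. All names here (io_group, bio_group(), CHARGE_PAGE_OWNER and the simplified struct task/page/bio) are hypothetical illustrations, not the kernel's real structures or the blkio_cgroup patches' actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model for illustration only; these are NOT the
 * kernel's struct page/struct bio or the blkio_cgroup data structures. */
struct io_group { int id; unsigned int weight; };
struct task { struct io_group *grp; };        /* cgroup of the task */
struct page { struct io_group *owner; };      /* tracked owner group, or NULL */
struct bio {
	struct page *page;
	bool is_sync;
	struct task *submitter;
};

enum charge_policy {
	CHARGE_SUBMITTER,   /* group of the submitting task (as cfq does) */
	CHARGE_PAGE_OWNER,  /* for async bios, group that owns the page */
};

/* Return the group a bio belongs to and should be charged to. */
static struct io_group *bio_group(const struct bio *bio,
				  enum charge_policy policy)
{
	assert(bio != NULL && bio->submitter != NULL);

	if (policy == CHARGE_PAGE_OWNER && !bio->is_sync &&
	    bio->page != NULL && bio->page->owner != NULL)
		return bio->page->owner;

	/* Sync IO, untracked pages, or CHARGE_SUBMITTER: fall back to
	 * the submitting task's context. */
	return bio->submitter->grp;
}
```

For example, an async bio submitted by a task in G2 for a page owned by G1 is charged to G2 under CHARGE_SUBMITTER and to G1 under CHARGE_PAGE_OWNER; a sync bio goes to the submitter's group under either policy.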
>
> One can think of pros and cons of both approaches.
>
> - The primary use case of tracking async context seems to be this: if a
>  process T1 in group G1 mmaps a big file, and another process T2 in
>  group G2 then asks for memory, triggers reclaim and generates writes of
>  the file pages mapped by T1, then these writes should not be charged to
>  T2. Hence the blkio_cgroup page tracking.
>
>  But the flip side of this might be that group G2 is a low weight group
>  and probably also too busy right now, which will delay the writeout,
>  and T2 may wait longer for memory to be allocated.
>
> - At one point Andrew mentioned that buffered writes are generally a big
>  problem and that one needs to map them to the owner's group, though I am
>  not sure what specific problem he was referring to. Can we attribute
>  buffered writes to the pdflush threads and move all pdflush threads into
>  a cgroup to limit system-wide writeout activity?
>
> - Somebody also gave an example of a memory-hogging process that pushes
>  some other processes out to swap. It does not sound fair to charge those
>  processes for that swap writeout; they never requested swap IO.
>
> - If there are multiple buffered writers in the system, those writers can
>  also be forced to write out some pages to disk before they are allowed
>  to dirty more pages. As per the page cache design, any writer can pick
>  any inode and start writing out its pages. So it can happen that a
>  higher weight group task is writing out pages dirtied by a lower weight
>  group task. If the async bio is mapped to the owner's group, the higher
>  weight group task might be made to sleep on the lower weight group task
>  because all the request descriptors are consumed.
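The request-descriptor problem in the last point can be sketched with a per-group pool of descriptors. This is a hypothetical simplification: io_group, grp_take_request(), grp_put_request() and submit_async_write() are made-up names, and real request allocation in the block layer is more involved. It only shows why charging an async write to the page owner's group can make a higher weight submitter wait on a lower weight group's exhausted pool.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: each group owns a fixed pool of request
 * descriptors. Names are illustrative, not block-layer APIs. */
struct io_group {
	unsigned int weight;        /* informational only in this model */
	unsigned int nr_requests;   /* descriptors currently in use */
	unsigned int max_requests;  /* size of the group's pool */
};

/* Take a descriptor from the charged group's pool. Returns false when
 * the pool is exhausted, i.e. the submitter would have to sleep until
 * that group completes a request. */
static bool grp_take_request(struct io_group *charged)
{
	if (charged->nr_requests >= charged->max_requests)
		return false;
	charged->nr_requests++;
	return true;
}

/* Release a descriptor when a request completes. */
static void grp_put_request(struct io_group *charged)
{
	assert(charged->nr_requests > 0);
	charged->nr_requests--;
}

/* An async writeback bio draws from the page owner's pool when pages
 * are tracked, otherwise from the submitter's pool. */
static bool submit_async_write(struct io_group *submitter_grp,
			       struct io_group *page_owner_grp,
			       bool track_page_owner)
{
	return grp_take_request(track_page_owner ? page_owner_grp
						 : submitter_grp);
}
```

With the low weight group's pool full, a high weight task writing that group's dirty pages is refused a descriptor under owner tracking, but gets one from its own pool when charged by submitter context; it can only proceed under tracking once the low weight group releases a descriptor.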
>
> There does not seem to be a clean way that covers all the cases without
> issues. I am just trying to think of a simple way that covers most of
> the cases. Can we just stick to using the submitting task's context to
> determine a bio's group (as cfq does)? That would result in the
> following.
>
> - Less code and reduced complexity.
>
> - Buffered writes will be charged to pdflush and its group. If one wishes
>  to limit buffered write activity for pdflush, one can move all the
>  pdflush threads into a group and assign it the desired weight. Writes
>  submitted in process context will continue to be charged to that process
>  irrespective of who dirtied the page.

What if we wanted to control buffered write activity per group? If a
group keeps dirtying pages, we wouldn't want it to dominate the disk
IO capacity at the expense of other cgroups (by dominating the writes
sent down by pdflush).

>
> - Swap activity will be charged to kswapd and its group. If swap writes
>  come from process context, they get charged to the process and its
>  group.
>
> - If one is worried about one process being charged for the writeout of a
>  file mapped by another process during reclaim, we can probably make use
>  of the memory controller, mounting the memory controller and the io
>  controller together on the same hierarchy. I am told that with the memory
>  controller, a group's memory will be reclaimed by the process requesting
>  more memory. If that's the case, IO will automatically be charged to the
>  right group if we use the submitting task's context.
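To put a number on the earlier suggestion of moving all pdflush threads into one group with a chosen weight, here is a sketch that assumes an idealized proportional-weight scheduler, in which every backlogged group receives disk time in proportion to its weight. The function name group_share() is hypothetical, and the model ignores seek costs, idling and the rest of the io-controller's actual scheduling logic.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Idealized proportional-weight model (an assumption, not the
 * io-controller's actual algorithm): while all n groups are backlogged,
 * group i receives weight[i] / (sum of all weights) of the disk time.
 */
static double group_share(const unsigned int *weight, size_t n, size_t i)
{
	unsigned long total = 0;

	assert(i < n);
	for (size_t j = 0; j < n; j++)
		total += weight[j];
	assert(total > 0);
	return (double)weight[i] / (double)total;
}
```

For instance, with the pdflush group at weight 100 and two other busy groups at 200 and 700 (made-up example weights), group_share() gives 0.1 for the pdflush group, so system-wide writeout would be held to roughly a tenth of disk time while all three groups have IO pending.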
>
> I just wanted to bring this point up for more discussion, to find out the
> right thing to do: use bio tracking or not?
>
> Ryo, any thoughts on this?
>
> Thanks
> Vivek
>
>> Acked-by: Rik van Riel <riel@...hat.com>
>