linux-kernel - Re: [PATCH 9/9] ext3: do not throttle metadata and journal IO

Message-Id: <20090422093349.1ee9ae82.kamezawa.hiroyu@jp.fujitsu.com>
Date:	Wed, 22 Apr 2009 09:33:49 +0900
From:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	Andrea Righi <righi.andrea@...il.com>
Cc:	Theodore Tso <tytso@....edu>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Jens Axboe <jens.axboe@...cle.com>,
	Paul Menage <menage@...gle.com>,
	Gui Jianfeng <guijianfeng@...fujitsu.com>, agk@...rceware.org,
	akpm@...ux-foundation.org, baramsori72@...il.com,
	Carl Henrik Lunde <chlunde@...g.uio.no>,
	dave@...ux.vnet.ibm.com, Divyesh Shah <dpshah@...gle.com>,
	eric.rannaud@...il.com, fernando@....ntt.co.jp,
	Hirokazu Takahashi <taka@...inux.co.jp>,
	Li Zefan <lizf@...fujitsu.com>, matt@...ehost.com,
	dradford@...ehost.com, ngupta@...gle.com, randy.dunlap@...cle.com,
	roberto@...it.it, Ryo Tsuruta <ryov@...inux.co.jp>,
	Satoshi UCHIDA <s-uchida@...jp.nec.com>,
	subrata@...ux.vnet.ibm.com, yoshikawa.takuya@....ntt.co.jp,
	containers@...ts.linux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 9/9] ext3: do not throttle metadata and journal IO

On Tue, 21 Apr 2009 22:49:06 +0200
Andrea Righi <righi.andrea@...il.com> wrote:
> yep! right. Anyway, it's not completely wrong to account dirty pages in
> this way. The dirty pages actually belong to cgroup A and providing per
> cgroup upper limits of dirty pages could help to equally distribute
> dirty pages, that are hard/slow to reclaim, among cgroups.
> 
> But this is definitely another problem.
> 
Hmm, my motivation for dirty accounting in memcg is to support dirty_ratio,
so that page reclaim is smooth and background write-out gets kicked.


> And it doesn't help for the problem described by Ted, especially for the
> IO controller. The only way I see to correctly handle that case is to
> limit the rate of dirty pages per cgroup, accounting the dirty activity
> to the cgroup that first touched the page (and not the owner as
> intended by the memory controller).
> 
The owner of the page should know the dirty ratio, too.

> And this should be probably strictly connected to the IO controller. If
> we throttle or delay the dispatching/submission of some IO requests
> without throttling the dirty pages rate a cgroup could completely waste
> its own available memory with dirty (hard and slow to reclaim) pages.
> 
> That is in part the approach I used in io-throttle v12, adding a hook in
> balance_dirty_pages_ratelimited_nr() to throttle the current task when
> the cgroup's IO limits are exceeded. Argh!
> 
> So, another proposal could be to re-add in io-throttle v14 the old hook
> also in balance_dirty_pages_ratelimited_nr().
> 
> In this way io-throttle would:
> 
> - use page_cgroup infrastructure and page_cgroup->flags to encode the
>   id of the cgroup that first dirtied a generic page
> - account and appropriately throttle sync and writeback IO requests in
>   submit_bio()
> - at the same time throttle the tasks in
>   balance_dirty_pages_ratelimited_nr() if the cgroup they belong to has
>   exhausted the IO BW (or quota, share, etc. in case of proportional BW
>   limit)
> 
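
For illustration only, the three points quoted above might be sketched in
userspace C roughly as follows. Everything here is an assumption and not
io-throttle code: the struct names, the 16-bit id packed above the flag bits,
and the helper names are all hypothetical. In the kernel the charge would
happen in submit_bio() and the check in balance_dirty_pages_ratelimited_nr().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: low 16 bits are flags, high 16 bits hold the
 * id of the cgroup that first dirtied the page (0 means "none yet"). */
#define PC_ID_SHIFT   16
#define PC_FLAG_MASK  0xFFFFu

struct page_cgroup_sketch {
	uint32_t flags;
};

/* Record the dirtier only if no id is stored yet: first dirtier wins. */
static void pc_set_first_dirtier(struct page_cgroup_sketch *pc, uint16_t id)
{
	assert(id != 0);	/* 0 is reserved for "no dirtier recorded" */
	if ((pc->flags >> PC_ID_SHIFT) == 0)
		pc->flags |= (uint32_t)id << PC_ID_SHIFT;
}

static uint16_t pc_get_dirtier(const struct page_cgroup_sketch *pc)
{
	return (uint16_t)(pc->flags >> PC_ID_SHIFT);
}

/* Hypothetical per-cgroup IO budget for one accounting period. */
struct iot_cgroup_sketch {
	uint64_t bw_limit;	/* bytes allowed in this period */
	uint64_t used;		/* bytes charged so far */
};

/* Would be called from the submit_bio() path: account the request. */
static void iot_charge(struct iot_cgroup_sketch *cg, uint64_t bytes)
{
	cg->used += bytes;
}

/* Would be called from the dirty-page path: should the task sleep? */
static bool iot_should_throttle(const struct iot_cgroup_sketch *cg)
{
	return cg->used >= cg->bw_limit;
}
```

The point of the sketch is only that the page remembers who dirtied it first,
so that writeback IO submitted later by another context can still be charged
to that cgroup, while the dirtying task itself is slowed down once the budget
is used up.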

IMHO, the io-controller should just work as an I/O subsystem, like bdi.
Per-bdi dirty_ratio is now supported and it seems to work well.

Can't we write a function like bdi_writeout_fraction()?
It would be a simple choice.
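
A minimal userspace sketch of what such a helper might look like, assuming a
per-cgroup counter of completed writeouts exists (it does not today). The
struct and function names are hypothetical. The idea mirrors the per-bdi
case, where a numerator/denominator pair scales the global dirty limit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counters; nothing like this exists per cgroup yet. */
struct writeout_stats_sketch {
	uint64_t completed;	/* writeouts completed for this cgroup */
	uint64_t total;		/* writeouts completed system-wide */
};

/* Return this cgroup's share of recent writeout as a fraction.
 * With no history at all, report 0/1 so the caller never divides by 0. */
static void cgroup_writeout_fraction_sketch(const struct writeout_stats_sketch *s,
					    uint64_t *numerator,
					    uint64_t *denominator)
{
	if (s->total == 0) {
		*numerator = 0;
		*denominator = 1;
		return;
	}
	*numerator = s->completed;
	*denominator = s->total;
}

/* Scale a global dirty limit (in pages) by the cgroup's fraction.
 * Overflow of the multiplication is ignored in this sketch. */
static uint64_t cgroup_dirty_limit_sketch(uint64_t global_dirty_limit,
					  const struct writeout_stats_sketch *s)
{
	uint64_t num, den;

	cgroup_writeout_fraction_sketch(s, &num, &den);
	assert(den != 0);
	return global_dirty_limit * num / den;
}
```

For example, a cgroup that did 25 of the last 100 writeouts would get a
quarter of the global limit. A real version would probably also want a
minimum share so that an idle cgroup is not stuck at zero.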

Thanks,
-Kame
