Date:	Wed, 4 Apr 2012 14:42:28 -0700
From:	Fengguang Wu <fengguang.wu@...el.com>
To:	Vivek Goyal <vgoyal@...hat.com>
Cc:	Tejun Heo <tj@...nel.org>, Jan Kara <jack@...e.cz>,
	Jens Axboe <axboe@...nel.dk>, linux-mm@...ck.org,
	sjayaraman@...e.com, andrea@...terlinux.com, jmoyer@...hat.com,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	kamezawa.hiroyu@...fujitsu.com, lizefan@...wei.com,
	containers@...ts.linux-foundation.org, cgroups@...r.kernel.org,
	ctalbott@...gle.com, rni@...gle.com, lsf@...ts.linux-foundation.org
Subject: Re: [RFC] writeback and cgroup

On Wed, Apr 04, 2012 at 02:35:29PM -0400, Vivek Goyal wrote:
> On Wed, Apr 04, 2012 at 10:51:24AM -0700, Fengguang Wu wrote:
> 
> [..]
> > The sweet split point would be for balance_dirty_pages() to do cgroup
> > aware buffered write throttling and leave other IOs to the current
> > blkcg. For this to work well as a total solution for end users, I hope
> > we can cooperate and figure out ways for the two throttling entities
> > to work well with each other.
> 
> Throttling read + direct IO higher up has a few issues too. Users will

Yeah, I have some worries about high-layer throttling, too.
Anyway, here are the ideas.

> not like it that a task gets blocked as it tries to submit a read from a
> throttled group.

That's not quite the issue I'm worried about :) Throttling is about
inserting small sleeps/waits at selected points. For reads, the ideal
sleep point is immediately after the readahead IO is submitted, at the
end of __do_page_cache_readahead(). The same should be applicable to
direct IO.
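
A rough sketch of what I have in mind, purely for illustration
(blkcg_read_ratelimit() is an assumed per-cgroup interface, nothing
that exists today):

/*
 * Sketch only: hook to be called at the end of
 * __do_page_cache_readahead(), after the readahead IO has been
 * submitted, to throttle the reading task.
 */
static void readahead_throttle(unsigned long nr_submitted)
{
	/* assumed: per-cgroup read limit for current, in pages/sec */
	unsigned long limit = blkcg_read_ratelimit(current);
	unsigned long pause;

	if (!limit)
		return;

	/* sleep long enough to keep the task's read rate under the limit */
	pause = nr_submitted * HZ / limit;
	__set_current_state(TASK_KILLABLE);
	io_schedule_timeout(pause);
}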

> Current async behavior works well where we queue up the
> bio from the task in the throttled group and let the task do other things.
> The same is true for AIO, where we would not like to block in bio submission.

For AIO, we'll need to delay the IO completion notification or status
update, which may involve computing some delay time and deferring the
calls to aio_complete() with the help of a delayed work queue. There
may be more issues to deal with, as I haven't looked into aio.c carefully.
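
Something along these lines, strictly as a sketch (blkcg_aio_delay()
is made up here to stand for whatever ends up computing the delay):

/*
 * Sketch: instead of calling aio_complete() directly from the IO
 * completion path, compute how long the cgroup should have taken to
 * do this much IO and defer the completion by that amount.
 */
struct delayed_aio {
	struct delayed_work	work;
	struct kiocb		*iocb;
	long			res;
	long			res2;
};

static void delayed_aio_complete(struct work_struct *work)
{
	struct delayed_aio *da = container_of(to_delayed_work(work),
					      struct delayed_aio, work);

	aio_complete(da->iocb, da->res, da->res2);
	kfree(da);
}

static void throttled_aio_complete(struct kiocb *iocb, long res, long res2)
{
	unsigned long delay = blkcg_aio_delay(iocb, res);	/* assumed */
	struct delayed_aio *da;

	if (!delay) {
		aio_complete(iocb, res, res2);
		return;
	}

	da = kmalloc(sizeof(*da), GFP_ATOMIC);
	if (!da) {
		/* no memory to defer with; complete immediately */
		aio_complete(iocb, res, res2);
		return;
	}
	da->iocb = iocb;
	da->res  = res;
	da->res2 = res2;
	INIT_DELAYED_WORK(&da->work, delayed_aio_complete);
	schedule_delayed_work(&da->work, delay);
}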

The thing that worries me is that in the proportional throttling case,
the high-level throttling works on an *estimated* task_ratelimit =
disk_bandwidth / N, where N is the number of read IO tasks. When N
suddenly drops from 2 to 1, it may take about 1 second for the estimated
task_ratelimit to adapt from disk_bandwidth/2 up to disk_bandwidth,
during which time the disk won't be 100% utilized because of the
temporary over-throttling of the remaining IO task.
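
To put illustrative numbers on it:

	disk_bandwidth = 100 MB/s, N = 2  =>  task_ratelimit ~ 50 MB/s each
	one task exits, so N = 1, but the estimate still says ~50 MB/s
	for the ~1s it takes the estimate to catch up, the surviving task
	stays throttled to ~50 MB/s and the disk runs at roughly half speed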

This is not a problem when throttling at the block/cfq layer, since it
has full information about the pending requests and need not depend on
such estimations.

The workaround I can think of is to put the throttled task into a wait
queue and let the block layer wake up the waiters when the IO queue runs
empty. That should avoid most of the disk idle time.
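
Roughly like this (the throttled_readers waitqueue on the
request_queue and the queue_is_idle() test are both assumptions made
up for the sketch):

/*
 * Sketch: the throttled reader parks on a per-queue waitqueue instead
 * of sleeping for a fixed interval; the block layer wakes it up as
 * soon as the queue has no more pending requests.
 */

/* reader side, e.g. called from the readahead throttle point */
static void wait_for_idle_queue(struct request_queue *q, long timeout)
{
	DEFINE_WAIT(wait);

	prepare_to_wait(&q->throttled_readers, &wait, TASK_KILLABLE);
	if (!queue_is_idle(q))			/* assumed helper */
		io_schedule_timeout(timeout);	/* bounded by the ratelimit pause */
	finish_wait(&q->throttled_readers, &wait);
}

/* block layer side: kick the waiters when the last request completes */
static void queue_ran_empty(struct request_queue *q)
{
	if (waitqueue_active(&q->throttled_readers))
		wake_up(&q->throttled_readers);
}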

Thanks,
Fengguang
