Message-Id: <1240908234-15434-1-git-send-email-righi.andrea@gmail.com>
Date: Tue, 28 Apr 2009 10:43:47 +0200
From: Andrea Righi <righi.andrea@...il.com>
To: Paul Menage <menage@...gle.com>
Cc: Balbir Singh <balbir@...ux.vnet.ibm.com>,
Gui Jianfeng <guijianfeng@...fujitsu.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
agk@...rceware.org, akpm@...ux-foundation.org, axboe@...nel.dk,
tytso@....edu, baramsori72@...il.com,
Carl Henrik Lunde <chlunde@...g.uio.no>,
dave@...ux.vnet.ibm.com, Divyesh Shah <dpshah@...gle.com>,
eric.rannaud@...il.com, fernando@....ntt.co.jp,
Hirokazu Takahashi <taka@...inux.co.jp>,
Li Zefan <lizf@...fujitsu.com>, matt@...ehost.com,
dradford@...ehost.com, ngupta@...gle.com, randy.dunlap@...cle.com,
roberto@...it.it, Ryo Tsuruta <ryov@...inux.co.jp>,
Satoshi UCHIDA <s-uchida@...jp.nec.com>,
subrata@...ux.vnet.ibm.com, yoshikawa.takuya@....ntt.co.jp,
Nauman Rafique <nauman@...gle.com>, fchecconi@...il.com,
paolo.valente@...more.it, m-ikeda@...jp.nec.com,
paulmck@...ux.vnet.ibm.com, containers@...ts.linux-foundation.org,
linux-kernel@...r.kernel.org
Subject: [PATCH v15 0/7] cgroup: io-throttle controller
Objective
~~~~~~~~~
The objective of the io-throttle controller is to improve IO performance
predictability of different cgroups that share the same block devices.
State of the art (quick overview)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Recent work by Vivek proposes a weighted BW solution that introduces
fair queuing support in the elevator layer and modifies the existing IO
schedulers to use that functionality
(https://lists.linux-foundation.org/pipermail/containers/2009-March/016129.html).
For the fair queuing part Vivek's IO controller makes use of the BFQ
code as posted by Paolo and Fabio (http://lkml.org/lkml/2008/11/11/148).
The dm-ioband controller by the valinux guys also proposes a
proportional ticket-based solution, fully implemented at the device
mapper level (http://people.valinux.co.jp/~ryov/dm-ioband/).
The bio-cgroup patch (http://people.valinux.co.jp/~ryov/bio-cgroup/) is
a BIO tracking mechanism for cgroups, implemented in the cgroup memory
subsystem. It is maintained by Ryo and it allows dm-ioband to track
writeback requests issued by kernel threads (pdflush).
Other work by Satoshi implements cgroup awareness in CFQ, mapping
per-cgroup priorities to CFQ IO priorities; this also provides only
proportional BW support (http://lwn.net/Articles/306772/).
Please correct me or add to this list if I missed someone or something. :)
Proposed solution
~~~~~~~~~~~~~~~~~
Unlike other priority/weight-based solutions, this controller
explicitly chokes applications' requests that directly or indirectly
generate IO activity in the system (it addresses both synchronous IO
and writeback/buffered IO).
The bandwidth and iops limiting method has the advantage of improving
the performance predictability at the cost of reducing, in general, the
overall performance of the system in terms of throughput.
IO throttling and accounting are performed during the submission of IO
requests and are independent of the particular IO scheduler.
Detailed information about the design, goals and usage is provided in the
documentation (see [PATCH 1/7]).
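As a rough illustration of the bandwidth limiting idea described above, here
is a minimal userspace sketch of a token-bucket limiter. It is not taken from
the patchset: the names (struct tb_limiter, tb_init, tb_charge), the
millisecond clock and the token-bucket policy itself are assumptions chosen
only to show how a submitter can be told how long to wait before its request
fits within the configured rate.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limiter state; not a structure from the io-throttle patches. */
struct tb_limiter {
	uint64_t rate;     /* allowed bytes per second */
	uint64_t capacity; /* maximum burst size in bytes */
	uint64_t tokens;   /* bytes that may be submitted right now */
	uint64_t last_ms;  /* time of the last refill, in milliseconds */
};

static void tb_init(struct tb_limiter *tb, uint64_t rate,
		    uint64_t capacity, uint64_t now_ms)
{
	assert(rate > 0 && capacity > 0);
	tb->rate = rate;
	tb->capacity = capacity;
	tb->tokens = capacity; /* start with a full bucket */
	tb->last_ms = now_ms;
}

/* Add the tokens earned since the last refill, capped at the capacity. */
static void tb_refill(struct tb_limiter *tb, uint64_t now_ms)
{
	uint64_t elapsed = now_ms - tb->last_ms;
	/* Integer division drops fractions of a byte; fine for a sketch. */
	uint64_t earned = elapsed * tb->rate / 1000;

	tb->tokens += earned;
	if (tb->tokens > tb->capacity)
		tb->tokens = tb->capacity;
	tb->last_ms = now_ms;
}

/*
 * Try to account for a request of `bytes` at time `now_ms`.
 * Returns 0 if the request may proceed (tokens are consumed), otherwise
 * the number of milliseconds the caller should sleep before retrying.
 */
static uint64_t tb_charge(struct tb_limiter *tb, uint64_t bytes,
			  uint64_t now_ms)
{
	uint64_t deficit;

	assert(bytes <= tb->capacity); /* larger requests could never pass */
	tb_refill(tb, now_ms);
	if (tb->tokens >= bytes) {
		tb->tokens -= bytes;
		return 0;
	}
	deficit = bytes - tb->tokens;
	/* Round up so the caller never wakes too early. */
	return (deficit * 1000 + tb->rate - 1) / tb->rate;
}
```

A caller would invoke tb_charge() before submitting a request and, on a
non-zero return value, sleep for that many milliseconds and try again. The
trade-off noted above is visible here: the rate becomes predictable, but a
task may sit idle even when the device has spare capacity.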
Implementation
~~~~~~~~~~~~~~
Patchset against latest Linus' git:
[PATCH v15 0/7] cgroup: block device IO controller
[PATCH v15 1/7] io-throttle documentation
[PATCH v15 2/7] res_counter: introduce ratelimiting attributes
[PATCH v15 3/7] page_cgroup: provide a generic page tracking infrastructure
[PATCH v15 4/7] io-throttle controller infrastructure
[PATCH v15 5/7] kiothrottled: throttle buffered (writeback) IO
[PATCH v15 6/7] io-throttle instrumentation
[PATCH v15 7/7] io-throttle: export per-task statistics to userspace
The v15 all-in-one patch, along with the previous versions, can be found at:
http://download.systemimager.org/~arighi/linux/patches/io-throttle/
Changelog (v14 -> v15)
~~~~~~~~~~~~~~~~~~~~~~
* performance optimization for direct IO (O_DIRECT): in submit_bio(), instead
of checking if the bio has been generated by the current task using the slow
get_iothrottle_from_bio(), use the faster is_in_dio(), which simply checks the
value of task_struct->in_dio, set before submitting O_DIRECT requests and
unset afterwards.
* also block tasks that have exceeded the cgroup limits in
balance_dirty_pages_ratelimited_nr(): when the submission of IO requests is
blocked by io-throttle we also want to throttle the dirty page rate, to reduce
the generation of hard-to-reclaim dirty pages in the system and prevent
potential OOM conditions
* explicitly check if cgroup_lock() is held in the iothrottle block device list
(suggested by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>)
* fixed a build bug in page_cgroup.c when CONFIG_SPARSEMEM was not set
(reported by: Gui Jianfeng <guijianfeng@...fujitsu.com>)
* small style fixes in res_counter
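
To make the first changelog item more concrete, the following userspace
sketch shows the general pattern of a per-task flag that lets the submission
path skip an expensive lookup. Everything here is hypothetical: the struct
layouts, the helper names and the assumption that a flagged bio is charged to
the submitting task's own cgroup are illustrative only and are not copied
from the patches.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for task_struct and struct bio. */
struct task {
	int cgroup_id; /* cgroup the task belongs to */
	bool in_dio;   /* set while the task submits O_DIRECT requests */
};

struct bio {
	int owner_cgroup_id; /* what a slow bio-to-cgroup lookup would find */
};

static int slow_lookups; /* counts how often the slow path is taken */

/* Stand-in for the slow lookup that maps a bio back to its cgroup. */
static int lookup_cgroup_from_bio(const struct bio *bio)
{
	slow_lookups++;
	return bio->owner_cgroup_id;
}

/* Pick the cgroup to charge, using the flag as a fast path. */
static int cgroup_to_charge(const struct task *cur, const struct bio *bio)
{
	if (cur->in_dio)
		return cur->cgroup_id; /* bio is known to come from this task */
	return lookup_cgroup_from_bio(bio);
}

/* Direct IO submission: set the flag before, clear it afterwards. */
static int submit_direct_io(struct task *cur, const struct bio *bio)
{
	int cg;

	cur->in_dio = true;
	cg = cgroup_to_charge(cur, bio);
	cur->in_dio = false;
	return cg;
}
```

In this sketch a direct IO submission never touches the slow lookup, while
any other submission (for example writeback issued on behalf of another
cgroup) still goes through it.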
Overall diffstat
~~~~~~~~~~~~~~~~
 Documentation/cgroups/io-throttle.txt |  417 ++++++++++++++++
 block/Makefile                        |    1 +
 block/blk-core.c                      |    8 +
 block/blk-io-throttle.c               |  851 +++++++++++++++++++++++++++++++++
 block/kiothrottled.c                  |  341 +++++++++++++
 fs/aio.c                              |   12 +
 fs/buffer.c                           |    2 +
 fs/direct-io.c                        |    3 +
 fs/proc/base.c                        |   18 +
 include/linux/blk-io-throttle.h       |  168 +++++++
 include/linux/cgroup.h                |    1 +
 include/linux/cgroup_subsys.h         |    6 +
 include/linux/memcontrol.h            |    6 +
 include/linux/mmzone.h                |    4 +-
 include/linux/page_cgroup.h           |   33 ++-
 include/linux/res_counter.h           |   69 ++-
 include/linux/sched.h                 |    8 +
 init/Kconfig                          |   16 +
 kernel/cgroup.c                       |    9 +
 kernel/fork.c                         |    8 +
 kernel/res_counter.c                  |   73 +++
 mm/Makefile                           |    3 +-
 mm/bounce.c                           |    2 +
 mm/filemap.c                          |    2 +
 mm/memcontrol.c                       |    6 +
 mm/page-writeback.c                   |   13 +
 mm/page_cgroup.c                      |   96 ++++-
 mm/readahead.c                        |    3 +
 28 files changed, 2145 insertions(+), 34 deletions(-)
-Andrea