Message-Id: <1240908234-15434-1-git-send-email-righi.andrea@gmail.com>
Date:	Tue, 28 Apr 2009 10:43:47 +0200
From:	Andrea Righi <righi.andrea@...il.com>
To:	Paul Menage <menage@...gle.com>
Cc:	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Gui Jianfeng <guijianfeng@...fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	agk@...rceware.org, akpm@...ux-foundation.org, axboe@...nel.dk,
	tytso@....edu, baramsori72@...il.com,
	Carl Henrik Lunde <chlunde@...g.uio.no>,
	dave@...ux.vnet.ibm.com, Divyesh Shah <dpshah@...gle.com>,
	eric.rannaud@...il.com, fernando@....ntt.co.jp,
	Hirokazu Takahashi <taka@...inux.co.jp>,
	Li Zefan <lizf@...fujitsu.com>, matt@...ehost.com,
	dradford@...ehost.com, ngupta@...gle.com, randy.dunlap@...cle.com,
	roberto@...it.it, Ryo Tsuruta <ryov@...inux.co.jp>,
	Satoshi UCHIDA <s-uchida@...jp.nec.com>,
	subrata@...ux.vnet.ibm.com, yoshikawa.takuya@....ntt.co.jp,
	Nauman Rafique <nauman@...gle.com>, fchecconi@...il.com,
	paolo.valente@...more.it, m-ikeda@...jp.nec.com,
	paulmck@...ux.vnet.ibm.com, containers@...ts.linux-foundation.org,
	linux-kernel@...r.kernel.org
Subject: [PATCH v15 0/7] cgroup: io-throttle controller

Objective
~~~~~~~~~
The objective of the io-throttle controller is to improve IO performance
predictability of different cgroups that share the same block devices.

State of the art (quick overview)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A recent work by Vivek proposes a weighted BW solution that introduces
fair queuing support in the elevator layer and modifies the existing IO
schedulers to use that functionality
(https://lists.linux-foundation.org/pipermail/containers/2009-March/016129.html).

For the fair queuing part Vivek's IO controller makes use of the BFQ
code as posted by Paolo and Fabio (http://lkml.org/lkml/2008/11/11/148).

The dm-ioband controller by the valinux guys is also proposing a
proportional ticket-based solution fully implemented at the device
mapper level (http://people.valinux.co.jp/~ryov/dm-ioband/).

The bio-cgroup patch (http://people.valinux.co.jp/~ryov/bio-cgroup/) is
a BIO tracking mechanism for cgroups, implemented in the cgroup memory
subsystem. It is maintained by Ryo and it allows dm-ioband to track
writeback requests issued by kernel threads (pdflush).

Another work by Satoshi implements cgroup awareness in CFQ, mapping
per-cgroup priorities to CFQ IO priorities; this approach also provides
only proportional BW support (http://lwn.net/Articles/306772/).

Please correct me or fill in the gaps if I missed someone or something. :)

Proposed solution
~~~~~~~~~~~~~~~~~
Compared to other priority/weight-based solutions, the approach used by
this controller is to explicitly choke applications' requests that
directly or indirectly generate IO activity in the system (this
controller addresses both synchronous IO and writeback/buffered IO).

Limiting bandwidth and iops has the advantage of improving performance
predictability at the cost of reducing, in general, the overall
throughput of the system.

IO throttling and accounting are performed when IO requests are
submitted, independently of the particular IO scheduler.

Detailed information about the design, goals and usage is available in
the documentation (see [PATCH 1/7]).

Implementation
~~~~~~~~~~~~~~
Patchset against the latest Linus' git tree:

  [PATCH v15 0/7] cgroup: block device IO controller
  [PATCH v15 1/7] io-throttle documentation
  [PATCH v15 2/7] res_counter: introduce ratelimiting attributes
  [PATCH v15 3/7] page_cgroup: provide a generic page tracking infrastructure
  [PATCH v15 4/7] io-throttle controller infrastructure
  [PATCH v15 5/7] kiothrottled: throttle buffered (writeback) IO
  [PATCH v15 6/7] io-throttle instrumentation
  [PATCH v15 7/7] io-throttle: export per-task statistics to userspace

The v15 all-in-one patch, along with the previous versions, can be found at:
http://download.systemimager.org/~arighi/linux/patches/io-throttle/

Changelog (v14 -> v15)
~~~~~~~~~~~~~~~~~~~~~~
* performance optimization for direct IO (O_DIRECT): in submit_bio(), instead
  of using the slow get_iothrottle_from_bio() to check whether the bio has been
  generated by the current task, use the faster is_in_dio(), which simply
  checks the value of task_struct->in_dio, set before submitting O_DIRECT
  requests and unset afterwards
* block tasks that have exceeded the cgroup limits also in
  balance_dirty_pages_ratelimited_nr(): when the submission of IO requests is
  blocked by io-throttle, we also want to throttle the dirty page rate, to
  reduce the generation of hard-to-reclaim dirty pages in the system and
  prevent potential OOM conditions
* explicitly check that cgroup_lock() is held when accessing the iothrottle
  block device list
  (suggested by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>)
* fixed a build bug in page_cgroup.c when CONFIG_SPARSEMEM was not set
  (reported by: Gui Jianfeng <guijianfeng@...fujitsu.com>)
* small styling fixes in res_counter

Overall diffstat
~~~~~~~~~~~~~~~~
 Documentation/cgroups/io-throttle.txt |  417 ++++++++++++++++
 block/Makefile                        |    1 +
 block/blk-core.c                      |    8 +
 block/blk-io-throttle.c               |  851 +++++++++++++++++++++++++++++++++
 block/kiothrottled.c                  |  341 +++++++++++++
 fs/aio.c                              |   12 +
 fs/buffer.c                           |    2 +
 fs/direct-io.c                        |    3 +
 fs/proc/base.c                        |   18 +
 include/linux/blk-io-throttle.h       |  168 +++++++
 include/linux/cgroup.h                |    1 +
 include/linux/cgroup_subsys.h         |    6 +
 include/linux/memcontrol.h            |    6 +
 include/linux/mmzone.h                |    4 +-
 include/linux/page_cgroup.h           |   33 ++-
 include/linux/res_counter.h           |   69 ++-
 include/linux/sched.h                 |    8 +
 init/Kconfig                          |   16 +
 kernel/cgroup.c                       |    9 +
 kernel/fork.c                         |    8 +
 kernel/res_counter.c                  |   73 +++
 mm/Makefile                           |    3 +-
 mm/bounce.c                           |    2 +
 mm/filemap.c                          |    2 +
 mm/memcontrol.c                       |    6 +
 mm/page-writeback.c                   |   13 +
 mm/page_cgroup.c                      |   96 ++++-
 mm/readahead.c                        |    3 +
 28 files changed, 2145 insertions(+), 34 deletions(-)

-Andrea