Message-ID: <20081113191546.GL14817@gandalf.sssup.it>
Date:	Thu, 13 Nov 2008 20:15:46 +0100
From:	Fabio Checconi <fchecconi@...il.com>
To:	Vivek Goyal <vgoyal@...hat.com>
Cc:	Nauman Rafique <nauman@...gle.com>,
	Peter Zijlstra <peterz@...radead.org>,
	linux-kernel@...r.kernel.org,
	containers@...ts.linux-foundation.org,
	virtualization@...ts.linux-foundation.org, jens.axboe@...cle.com,
	Hirokazu Takahashi <taka@...inux.co.jp>,
	Ryo Tsuruta <ryov@...inux.co.jp>,
	Andrea Righi <righi.andrea@...il.com>,
	Satoshi UCHIDA <s-uchida@...jp.nec.com>,
	fernando@....ntt.co.jp, balbir@...ux.vnet.ibm.com,
	Andrew Morton <akpm@...ux-foundation.org>, menage@...gle.com,
	ngupta@...gle.com, Rik van Riel <riel@...hat.com>,
	Jeff Moyer <jmoyer@...hat.com>,
	"dpshah@...gle.com" <dpshah@...gle.com>,
	Mike Waychison <mikew@...gle.com>, rohitseth@...gle.com,
	paolo.valente@...more.it
Subject: Re: [patch 0/4] [RFC] Another proportional weight IO controller

> From: Vivek Goyal <vgoyal@...hat.com>
> Date: Thu, Nov 13, 2008 01:08:21PM -0500
>
> On Wed, Nov 12, 2008 at 01:20:13PM -0800, Nauman Rafique wrote:
...
> > I was thinking of a more cfq-like solution for proportional division
> > at the elevator level (i.e. not a token based solution). There are two
> > options for proportional bandwidth division at the elevator level: 1)
> > change the size of the time slice in proportion to the weights, or 2)
> > allocate an equal time slice each time but allocate more slices to the
> > cgroups with more weight. For (2), we can actually keep track of the
> > time taken to serve requests and allocate time slices in such a way
> > that the actual disk time is proportional to the weight. We could
> > adopt a fair-queuing (http://lkml.org/lkml/2008/4/1/234) like
> > approach for this if we want to go that way.
> 
> Hi Nauman,
> 
> I think doing proportional weight division at the elevator level will
> be a little difficult, because if we go for a full hierarchical
> solution then we will be doing proportional weight division among tasks
> as well as groups.
> 
> For example, consider this: assume that at the root level there are
> three tasks A, B, C and two cgroups D and E. For proportional weight
> division we should consider A, B, C, D and E to be at the same level
> and then try to divide the BW (thanks to peterz for clarifying this).
> 
> Another approach could be to consider A, B and C part of the root
> cgroup, then treat root, D and E as competing groups and divide the BW
> among them. But this is not how the cpu controller operates; I think
> this approach was initially implemented for group scheduling in the
> cpu controller and later changed.
> 
> How the proportional weight division is done among tasks is a property
> of the IO scheduler: cfq uses time slices according to priority and
> bfq uses tokens. So we probably can't move this to the common elevator
> layer.
> 

cfq and bfq are pretty similar in the concepts they adopt, and the pure
time-based approach of cfq can be extended to arbitrary hierarchies.

Even in bfq, when dealing with groups that generate only seeky traffic
we don't try to be fair in the service domain, as that would decrease
the aggregate throughput too much; instead we fall back to a time-based
approach.

[ This is a design choice, but it does not depend on the algorithms,
  and of course can be changed... ]

The two approaches can be mixed/unified, for example by using wf2q+ to
schedule cfq's time slices in the time domain; the main remaining
difference would be bfq's ability to provide service-domain guarantees.


> I think Satoshi's cfq controller patches also do not consider A, B, C,
> D and E to be at the same level; instead they treat the cgroup "/", D
> and E as being at the same level and try to do proportional BW
> division among these. Satoshi, please correct me if that's not the
> case.
> 
> The above example raises another question: what to do with IO
> schedulers which do not differentiate between tasks. For example, noop
> simply has got one single linked list, has no notion of io context and
> does not differentiate between IO coming from different tasks. In that
> case we probably have no choice but to group A, B and C's bios in the
> root cgroup and do proportional weight division among the "root", D
> and E groups. I have not looked at deadline and AS yet.
> 

When you talk about grouping tasks into the root cgroup and then
scheduling inside the groups using an existing scheduler, do you mean
creating a ``little as'' or ``little noop'' queue per group, somewhat
like the classless leaf qdiscs in network scheduling, then first
selecting the leaf group to be scheduled and then using the per-leaf
scheduler to select the request from that leaf?

A good thing about this approach would be that idling would still make
sense, and the upper infrastructure would be the same for all the
schedulers (except for cfq and bfq, which in my opinion better fit the
cpu scheduler's hierarchical approach, with hierarchies confined to
scheduling classes).


> So at this point of time I think that probably porting BFQ's hierarchical
> scheduling implementation to other IO schedulers might make sense. Thoughts?
> 

IMO for cfq, given the similarities, this can be done without conceptual
problems.  How to do that for schedulers like as, noop or deadline, and
whether this is the best solution, is an interesting problem :)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
