Message-ID: <20091118153227.GA5796@redhat.com>
Date:	Wed, 18 Nov 2009 10:32:27 -0500
From:	Vivek Goyal <vgoyal@...hat.com>
To:	"Alan D. Brunelle" <Alan.Brunelle@...com>
Cc:	linux-kernel@...r.kernel.org, jens.axboe@...cle.com,
	Corrado Zoccolo <czoccolo@...il.com>
Subject: Re: [RFC] Block IO Controller V2 - some results

On Tue, Nov 17, 2009 at 07:38:47AM -0500, Alan D. Brunelle wrote:
> On Mon, 2009-11-16 at 17:18 -0500, Vivek Goyal wrote:
> > On Mon, Nov 16, 2009 at 03:51:00PM -0500, Alan D. Brunelle wrote:
> > 
> > [..]
> > > ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
> > > 
> > > The next thing to look at is to see what the "penalty" is for the
> > > additional code: see how much bandwidth we lose for the capability
> > > added. Here we see the sum of the system's throughput for the various
> > > tests:
> > > 
> > > ---- ---- - ----------- ----------- ----------- ----------- 
> > > Mode RdWr N    base       ioc off   ioc no idle  ioc idle   
> > > ---- ---- - ----------- ----------- ----------- ----------- 
> > >  rnd   rd 2        17.3        17.1         9.4         9.1 
> > >  rnd   rd 4        27.1        27.1         8.1         8.2 
> > >  rnd   rd 8        37.1        37.1         6.8         7.1 
> > > 
> > 

Hi Alan,

I got hold of a better system where a few disks are striped. On this
system I can also see the performance drop, and it does come from the
fact that we idle on an empty sync-noidle service tree. So with multiple
groups we increase the number of sync-noidle trees, hence more idling
and hence reduced throughput.
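
To put rough numbers on it (assuming CFQ's default slice_idle of 8 ms;
the exact value depends on HZ and tuning): with one group there is at
most one sync-noidle tree to idle on per service cycle, but with e.g. 8
groups doing sparse random IO, each cycle can arm up to 8 separate 8 ms
idle windows, i.e. up to 64 ms per cycle spent waiting on trees that may
never receive another request in time.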

With this patch, I have done a little optimization: don't idle on a
sync-noidle service tree if there are no competing sync-idle or async
queues.

What that means is that if a group is doing only random IO and does not
have sufficient IO to keep the disk busy, we will move on to the next
group and start dispatching from that. So in your tests, with this patch
you should see better results for random reads with group_idle=0. With
group_idle=1, results will remain the same, as we continue to idle on
the sync-noidle service tree.
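
Roughly, the new check in cfq_should_idle() amounts to this (a
simplified standalone sketch, not the literal kernel code; names follow
the patch below):

	struct service_tree { int count; };	/* queues on this tree */
	struct group { int nr_cfqq; };		/* queues in this group */

	/*
	 * Idle on the sync-noidle tree only when this queue is the last
	 * one on the tree; and if other groups could use the disk and
	 * the user set group_idle=0, only when the group also has
	 * competing (sync-idle or async) queues that the idling would
	 * actually protect.
	 */
	static int should_idle_on_noidle_tree(struct service_tree *st,
					      struct group *grp,
					      int nr_groups,
					      int group_idle)
	{
		if (nr_groups > 1 && !group_idle)
			return st->count == 1 && grp->nr_cfqq > 1;
		return st->count == 1;
	}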

It is working for me. Can you please try it out? You can just apply this
patch on top of V3. (You don't have to apply the hw_tag patch from
Corrado.)
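
If you want to toggle this at run time, group_idle should be tunable
alongside the other CFQ knobs; assuming this series exposes it the same
way the later versions do, that would be
/sys/block/<dev>/queue/iosched/group_idle (write 0 or 1).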

Thanks
Vivek


o Now we don't remove the queue from the service tree until we expire it,
  even if the queue is empty. So st->count == 0 is not a valid state while
  we have a cfqq at hand. Fix it in arm_slice_timer().

o wait_busy gets set only if the group is empty. If st->count > 1, then
  the group is not empty and wait_busy will not be set. Remove that extra
  check.

o There is no need to idle on the async service tree as it is backlogged
  most of the time when writes are on. Those queues don't get deleted,
  hence don't wait on the async service tree. Similarly, don't wait on the
  sync-idle service tree as we do idling on individual queues if need be.
  Fixed cfq_should_idle().

o Currently we wait on the sync-noidle service tree so that the
  sync-noidle type of workload does not get swamped by sync-idle or async
  workloads. Don't do this idling if there are no sync-idle or async
  queues in the group, there are other groups to dispatch requests from,
  and the user has decided not to wait on slow groups to achieve better
  throughput (group_idle=0).

  This will make sure that if some group is doing just random IO and does
  not have sufficient IO to keep the disk busy, we will move on to other
  groups to dispatch requests from and utilize the storage better.

Signed-off-by: Vivek Goyal <vgoyal@...hat.com>
---
 block/cfq-iosched.c |   35 +++++++++++++++++++----------------
 1 file changed, 19 insertions(+), 16 deletions(-)

Index: linux6/block/cfq-iosched.c
===================================================================
--- linux6.orig/block/cfq-iosched.c	2009-11-17 14:44:09.000000000 -0500
+++ linux6/block/cfq-iosched.c	2009-11-18 10:09:33.000000000 -0500
@@ -899,6 +899,10 @@ static bool cfq_should_idle(struct cfq_d
 {
 	enum wl_prio_t prio = cfqq_prio(cfqq);
 	struct cfq_rb_root *service_tree = cfqq->service_tree;
+	struct cfq_group *cfqg = cfqq->cfqg;
+
+	BUG_ON(!service_tree);
+	BUG_ON(!service_tree->count);
 
 	/* We never do for idle class queues. */
 	if (prio == IDLE_WORKLOAD)
@@ -908,18 +912,18 @@ static bool cfq_should_idle(struct cfq_d
 	if (cfq_cfqq_idle_window(cfqq))
 		return true;
 
+	/* Don't idle on async and sync-idle service trees */
+	if (cfqd->serving_type != SYNC_NOIDLE_WORKLOAD)
+		return false;
 	/*
-	 * Otherwise, we do only if they are the last ones
-	 * in their service tree.
+	 * If other competing groups are present, don't wait on the service
+	 * tree if this is the last queue in the group and there are no
+	 * other competing (sync-idle or async) queues present.
 	 */
-	if (!service_tree)
-		service_tree = service_tree_for(cfqq->cfqg, prio,
-						cfqq_type(cfqq), cfqd);
-
-	if (service_tree->count == 0)
-		return true;
-
-	return (service_tree->count == 1 && cfq_rb_first(service_tree) == cfqq);
+	if (cfqd->nr_groups > 1 && !cfqd->cfq_group_idle)
+		return (service_tree->count == 1 && cfqg->nr_cfqq > 1);
+	else
+		return service_tree->count == 1;
 }
 
 #ifdef CONFIG_CFQ_GROUP_IOSCHED
@@ -1102,9 +1106,6 @@ static inline bool cfqq_should_wait_busy
 	if (!RB_EMPTY_ROOT(&cfqq->sort_list) || cfqq->cfqg->nr_cfqq > 1)
 		return false;
 
-	if (!cfq_should_idle(cfqq->cfqd, cfqq))
-		return false;
-
 	return true;
 }
 
@@ -1801,7 +1802,10 @@ static bool cfq_arm_slice_timer(struct c
 	/*
 	 * idle is disabled, either manually or by past process history
 	 */
-	if (!cfqd->cfq_slice_idle || !cfq_should_idle(cfqd, cfqq))
+	if (!cfqd->cfq_slice_idle)
+		return false;
+
+	if (!cfq_should_idle(cfqd, cfqq) && !wait_busy)
 		return false;
 
 	/*
@@ -1837,8 +1841,7 @@ static bool cfq_arm_slice_timer(struct c
 	 */
 	st = service_tree_for(cfqq->cfqg, cfqd->serving_prio,
 				SYNC_NOIDLE_WORKLOAD, cfqd);
-	if (!wait_busy && cfqd->serving_type == SYNC_NOIDLE_WORKLOAD
-	    && st->count > 0) {
+	if (cfqd->serving_type == SYNC_NOIDLE_WORKLOAD && st->count > 1) {
 		if (blk_queue_nonrot(cfqd->queue) || cfqd->hw_tag)
 			return false;
 		sl = min(sl, msecs_to_jiffies(CFQ_MIN_TT));
--
