linux-kernel - Re: [PATCH 15/28] io-controller: Allow CFQ specific extra preemptions

Message-ID: <4ABC6222.9090103@cn.fujitsu.com>
Date:	Fri, 25 Sep 2009 14:24:34 +0800
From:	Gui Jianfeng <guijianfeng@...fujitsu.com>
To:	Vivek Goyal <vgoyal@...hat.com>
CC:	linux-kernel@...r.kernel.org, jens.axboe@...cle.com,
	containers@...ts.linux-foundation.org, dm-devel@...hat.com,
	nauman@...gle.com, dpshah@...gle.com, lizf@...fujitsu.com,
	mikew@...gle.com, fchecconi@...il.com, paolo.valente@...more.it,
	ryov@...inux.co.jp, fernando@....ntt.co.jp, s-uchida@...jp.nec.com,
	taka@...inux.co.jp, jmoyer@...hat.com, dhaval@...ux.vnet.ibm.com,
	balbir@...ux.vnet.ibm.com, righi.andrea@...il.com,
	m-ikeda@...jp.nec.com, agk@...hat.com, akpm@...ux-foundation.org,
	peterz@...radead.org, jmarchan@...hat.com,
	torvalds@...ux-foundation.org, mingo@...e.hu, riel@...hat.com,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Subject: Re: [PATCH 15/28] io-controller: Allow CFQ specific extra preemptions

Vivek Goyal wrote:
> o CFQ allows a reader to preempt a writer. So far we allow this within a group
>   but not across groups. But there seems to be the following special case where
>   this preemption might make sense.
> 
> 			root
> 			/  \
> 		       R   Group
> 			     |
> 			     W
> 
>   Here the reader should be able to preempt the writer. Think of 10 groups,
>   each running a writer, and an admin who runs "ls" and suddenly
>   experiences high latencies.

  Hi Vivek, 

  This preemption might be unfair to readers that share a group with the
  writer. Consider the following:

                     root
                     /  \
                    R1  Group
                        /  \
                       R2   W

  Say W is running and late preemption is enabled. When a request arrives for
  R1, R1 preempts W immediately, regardless of R2. R2 then doesn't get a chance
  to be scheduled even if R1 has a very high vdisktime, which does not seem fair
  to R2. So I suggest taking the number of readers in the group into account
  when making this preemption decision: R1 should preempt W only when there
  are no readers in that group.
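
  A minimal userspace sketch of the check suggested above, only to make the
  rule concrete. All names here (sketch_group, sketch_queue,
  sketch_should_preempt, nr_readers) are hypothetical and are not the
  elevator-fq.c structures. The quoted patch already keeps sd->nr_sync for
  backlogged sync queues; whether that is the right counter to test is an
  assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the elevator-fq.c structures. */
struct sketch_group {
	int nr_readers;			/* backlogged sync (reader) queues in this group */
};

struct sketch_queue {
	bool sync;			/* true for a reader, false for a writer */
	struct sketch_group *group;	/* NULL if the queue sits directly under root */
};

/*
 * Models only the cross-group case from the diagram: a newly backlogged
 * reader under root versus a writer running inside a sibling group.
 * Returns true only if the writer's group has no backlogged readers.
 */
static bool sketch_should_preempt(const struct sketch_queue *new_q,
				  const struct sketch_queue *active_q)
{
	/* Only a reader preempting a writer is considered here. */
	if (!new_q->sync || active_q->sync)
		return false;

	/* The active writer must live inside a group. */
	if (active_q->group == NULL)
		return false;

	/* Readers in the writer's group (R2) should get their turn first. */
	return active_q->group->nr_readers == 0;
}
```

  With R2 backlogged in the group (nr_readers == 1), R1 would not preempt W;
  with no readers in the group, the preemption from the patch still happens.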

  Thanks,
  Gui Jianfeng

> 
>   The same is true for meta data requests. If there is a meta data request and
>   a reader is running inside a sibling group, preemption will be allowed.
>   Note that the following is not allowed.
> 			root
> 			/  \
> 	            group1 group2
> 		      |      |
> 	              R	     W
> 
>   Here reader can't preempt writer.
> 
> o Put queues with pending meta data requests at the front of the service tree.
>   Generally such queues will preempt the currently running queue, but not in
>   the following case.
> 			root
> 			/  \
> 	            group1 group2
> 		      |     / \
> 	              R1   R3  R2 (meta data)
> 
>  Here R2 has a meta data request but it will not preempt R1. We need
>  to make sure that R2 gets queued ahead of R3 so that once group2 gets
>  going, we first service R2 and then R3, and not vice versa.
> 
> Signed-off-by: Vivek Goyal <vgoyal@...hat.com>
> ---
>  block/elevator-fq.c |   47 +++++++++++++++++++++++++++++++++++++++++++++--
>  block/elevator-fq.h |    3 +++
>  2 files changed, 48 insertions(+), 2 deletions(-)
> 
> diff --git a/block/elevator-fq.c b/block/elevator-fq.c
> index 25beaf7..8ff8a19 100644
> --- a/block/elevator-fq.c
> +++ b/block/elevator-fq.c
> @@ -701,6 +701,7 @@ static void enqueue_io_entity(struct io_entity *entity)
>  	struct io_service_tree *st;
>  	struct io_sched_data *sd = io_entity_sched_data(entity);
>  	struct io_queue *ioq = ioq_of(entity);
> +	int add_front = 0;
>  
>  	if (entity->on_idle_st)
>  		dequeue_io_entity_idle(entity);
> @@ -716,12 +717,22 @@ static void enqueue_io_entity(struct io_entity *entity)
>  	st = entity->st;
>  	st->nr_active++;
>  	sd->nr_active++;
> +
>  	/* Keep a track of how many sync queues are backlogged on this group */
>  	if (ioq && elv_ioq_sync(ioq) && !elv_ioq_class_idle(ioq))
>  		sd->nr_sync++;
>  	entity->on_st = 1;
> -	place_entity(st, entity, 0);
> -	__enqueue_io_entity(st, entity, 0);
> +
> +	/*
> +	 * If a meta data request is pending in this queue, put this
> +	 * queue at the front so that it gets a chance to run first
> +	 * as soon as the associated group becomes eligible to run.
> +	 */
> +	if (ioq && ioq->meta_pending)
> +		add_front = 1;
> +
> +	place_entity(st, entity, add_front);
> +	__enqueue_io_entity(st, entity, add_front);
>  	debug_update_stats_enqueue(entity);
>  }
>  
> @@ -2280,6 +2291,31 @@ static int elv_should_preempt(struct request_queue *q, struct io_queue *new_ioq,
>  		return 1;
>  
>  	/*
> +	 * Allow some additional preemptions where a reader queue gets
> +	 * backlogged and some writer queue is running under any of the
> +	 * sibling groups.
> +	 *
> +	 * 		     root
> +	 * 		     /  \
> +	 * 		    R  group
> +	 * 			 |
> +	 * 			 W
> +	 */
> +
> +	if (ioq_of(new_entity) == new_ioq  && iog_of(entity)) {
> +		/* Let reader queue preempt writer in sibling group */
> +		if (elv_ioq_sync(new_ioq) && !elv_ioq_sync(active_ioq))
> +			return 1;
> +		/*
> +		 * So both queues are sync. Let the new request get disk time if
> +		 * it's a metadata request and the current queue is doing
> +		 * regular IO.
> +		 */
> +		if (new_ioq->meta_pending && !active_ioq->meta_pending)
> +			return 1;
> +	}
> +
> +	/*
>  	 * If both the queues belong to same group, check with io scheduler
>  	 * if it has additional criterion based on which it wants to
>  	 * preempt existing queue.
> @@ -2335,6 +2371,8 @@ void elv_ioq_request_add(struct request_queue *q, struct request *rq)
>  	BUG_ON(!efqd);
>  	BUG_ON(!ioq);
>  	ioq->nr_queued++;
> +	if (rq_is_meta(rq))
> +		ioq->meta_pending++;
>  	elv_log_ioq(efqd, ioq, "add rq: rq_queued=%d", ioq->nr_queued);
>  
>  	if (!elv_ioq_busy(ioq))
> @@ -2669,6 +2707,11 @@ void elv_ioq_request_removed(struct elevator_queue *e, struct request *rq)
>  	ioq = rq->ioq;
>  	BUG_ON(!ioq);
>  	ioq->nr_queued--;
> +
> +	if (rq_is_meta(rq)) {
> +		WARN_ON(!ioq->meta_pending);
> +		ioq->meta_pending--;
> +	}
>  }
>  
>  /* A request got dispatched. Do the accounting. */
> diff --git a/block/elevator-fq.h b/block/elevator-fq.h
> index 2992d93..27ff5c4 100644
> --- a/block/elevator-fq.h
> +++ b/block/elevator-fq.h
> @@ -100,6 +100,9 @@ struct io_queue {
>  
>  	/* Pointer to io scheduler's queue */
>  	void *sched_queue;
> +
> +	/* pending metadata requests */
> +	int meta_pending;
>  };
>  
>  #ifdef CONFIG_GROUP_IOSCHED /* CONFIG_GROUP_IOSCHED */
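
For the second point in the quoted changelog (queues with pending meta data
go to the front of the service tree), here is a small userspace model of the
accounting and placement. It is a sketch under assumptions: the real service
tree is ordered by vdisktime, while this model uses a plain array, and every
name prefixed with sketch_ is hypothetical.

```c
#include <stddef.h>

#define SKETCH_MAX 8

/* Hypothetical stand-in for struct io_queue, with only the two counters. */
struct sketch_ioq {
	const char *name;
	int nr_queued;		/* requests queued on this queue */
	int meta_pending;	/* how many of them are meta data requests */
};

/* Hypothetical stand-in for a group's service tree; slot[0] runs first. */
struct sketch_tree {
	struct sketch_ioq *slot[SKETCH_MAX];
	size_t n;
};

/* Mirrors the accounting added to elv_ioq_request_add() in the patch. */
static void sketch_request_add(struct sketch_ioq *q, int is_meta)
{
	q->nr_queued++;
	if (is_meta)
		q->meta_pending++;
}

/* Mirrors elv_ioq_request_removed(); guards against underflow. */
static void sketch_request_removed(struct sketch_ioq *q, int is_meta)
{
	q->nr_queued--;
	if (is_meta && q->meta_pending > 0)
		q->meta_pending--;
}

/*
 * Enqueue at the front if the queue has meta data pending, else at the
 * back. Returns 0 on success, -1 if the tree is full.
 */
static int sketch_enqueue(struct sketch_tree *t, struct sketch_ioq *q)
{
	size_t i;

	if (t->n == SKETCH_MAX)
		return -1;

	if (q->meta_pending) {
		/* Shift existing entries right to free slot[0]. */
		for (i = t->n; i > 0; i--)
			t->slot[i] = t->slot[i - 1];
		t->slot[0] = q;
	} else {
		t->slot[t->n] = q;
	}
	t->n++;
	return 0;
}

/* The queue that would be served first, or NULL if the tree is empty. */
static struct sketch_ioq *sketch_first(const struct sketch_tree *t)
{
	return t->n ? t->slot[0] : NULL;
}
```

In the group2 example from the changelog, enqueueing R3 and then R2 (with one
meta data request pending) leaves R2 in slot[0], so R2 is served before R3
once group2 gets to run.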
