linux-kernel - Re: cfq-iosched preempt issues

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110302202118.GA2547@redhat.com>
Date:	Wed, 2 Mar 2011 15:21:18 -0500
From:	Vivek Goyal <vgoyal@...hat.com>
To:	Shaohua Li <shaohua.li@...el.com>
Cc:	jaxboe@...ionio.com, jmoyer@...hat.com, czoccolo@...il.com,
	guijianfeng@...fujitsu.com, linux-kernel@...r.kernel.org
Subject: Re: cfq-iosched preempt issues

On Wed, Mar 02, 2011 at 08:43:41PM +0800, Shaohua Li wrote:
> queue preemption is good for some workloads and not for others. With commit
> f8ae6e3eb825, the impact is amplified. I currently have two issues with it:
> 1. In a multi-threaded workload, each thread runs a random read/write (for
> example, mmap write) with iodepth 1. I found the queue depth gets smaller
> with commit f8ae6e3eb825. The reason is write gets preempted, so more threads
> are waitting for write, and on the other hand, there are less threads doing
> read. This will make the queue depth small, so performance drops a little.
> So in this case, speed up write can speed up read too, but we can't detect
> it.
> 2. cfq_may_dispatch doesn't limit queue depth if the queue is the sole queue.
> What about if there are two queues, one sync and one async? If the sync queue's
> think time is small, we can treat it as the sole queue, because the sync queue
> will preempt async queue, so we don't need care about the async queue's latency.
> The issue exists before, but f8ae6e3eb825 amplifies it. Below is a patch for it.
> 
> Any idea?

CFQ is already very complicated, lets try to keep it simple. Because it
is complicated, making it hierarchical for cgroup becomes even harder.

IIUC, you are saying that cfqd->busy_queues check is not sufficient as
it takes async queues also in account.

So we can keep another count say, cfqd->busy_sync_queues and if there
are no busy_sync_queues, allow unlimited depth and that should be
a really simple few lines change.

But lets do it only if you have a real life workload. Similiarly we can
worry about RT case when there is a real workload behind it.

Thanks
Vivek
 

 


> 
> Thanks,
> Shaohua
> -----------------------------------------------
> Subject: cfq-iosched: don't limlit sync queue depth with only one such sync queue
> 
> If there are a sync and an async queue and the sync queue's think time is small,
> we can ignore the sync queue's dispatch quantum. Because the sync queue will
> always preempt the async queue, we don't need to care about async's latency.
> In the same way, we can optimize a RT queue too to improve performance.
> This can fix a performance regression of aiostress test, which is introduced by
> commit f8ae6e3eb825. The issue should exist even without the commit, but the
> commit amplifies the impact.
> 
> Signed-off-by: Shaohua Li <shaohua.li@...el.com>
> ---
>  block/cfq-iosched.c |   91 +++++++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 73 insertions(+), 18 deletions(-)
> 
> Index: linux/block/cfq-iosched.c
> ===================================================================
> --- linux.orig/block/cfq-iosched.c	2011-03-02 14:58:19.000000000 +0800
> +++ linux/block/cfq-iosched.c	2011-03-02 15:48:38.000000000 +0800
> @@ -1150,6 +1150,20 @@ void cfq_unlink_blkio_group(void *key, s
>  	spin_unlock_irqrestore(cfqd->queue->queue_lock, flags);
>  }
>  
> +static bool cfq_have_cfqgs(struct cfq_data *cfqd)
> +{
> +	struct hlist_node *pos;
> +	struct cfq_group *cfqg;
> +	int cnt = 0;
> +
> +	hlist_for_each_entry(cfqg, pos, &cfqd->cfqg_list, cfqd_node) {
> +		cnt++;
> +		if (cnt > 1)
> +			break;
> +	}
> +	return cnt > 1;
> +}
> +
>  #else /* GROUP_IOSCHED */
>  static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd, int create)
>  {
> @@ -1169,6 +1183,12 @@ cfq_link_cfqq_cfqg(struct cfq_queue *cfq
>  static void cfq_release_cfq_groups(struct cfq_data *cfqd) {}
>  static inline void cfq_put_cfqg(struct cfq_group *cfqg) {}
>  
> +static inline bool cfq_have_cfqgs(struct cfq_data *cfqd)
> +{
> +	return false;
> +}
> +
> +
>  #endif /* GROUP_IOSCHED */
>  
>  /*
> @@ -2381,6 +2401,57 @@ static inline bool cfq_slice_used_soon(s
>  	return false;
>  }
>  
> +static unsigned int cfq_queue_max_quantum(struct cfq_data *cfqd,
> +	struct cfq_queue *cfqq)
> +{
> +	int sync = cfq_cfqq_sync(cfqq);
> +	enum wl_prio_t prio = cfqq_prio(cfqq);
> +	struct cfq_group *cfqg = cfqq->cfqg;
> +	int sync_queues_cnt, async_queues_cnt;
> +	struct cfq_io_context *cic = RQ_CIC(cfqq->next_rq);
> +
> +	/* Sole queue user, no limit */
> +	if (cfqd->busy_queues == 1)
> +		return -1;
> +
> +	if (cfq_have_cfqgs(cfqd) || (!sync && prio != RT_WORKLOAD))
> +		goto normal;
> +
> +	sync_queues_cnt = cfqg->service_trees[prio][SYNC_NOIDLE_WORKLOAD].count
> +		+ cfqg->service_trees[prio][SYNC_WORKLOAD].count;
> +	async_queues_cnt = cfqg->service_trees[prio][ASYNC_WORKLOAD].count;
> +	/*
> +	 * If a queue is a sole sync queue and think time is small, we can ignore
> +	 * async queue here and give the sync queue no dispatch limit, because a
> +	 * sync queue can preempt async queue.
> +	 *
> +	 * If the queue is RT, we don't need check BE, because even the
> +	 * queue is expired, the dispatcher will select RT queue again next time.
> +	 *
> +	 * If the queue is BE, we don't check RT here, because dispatcher will
> +	 * switch to RT next time, so we at most dispatch one extra request.
> +	 */
> +	if (((!sync && prio == RT_WORKLOAD && sync_queues_cnt == 0 &&
> +		async_queues_cnt == 1) || sync_queues_cnt == 1) &&
> +		sample_valid(cic->ttime_samples) &&
> +		cic->ttime_mean < cfqd->cfq_slice_idle)
> +		return -1;
> +normal:
> +	/*
> +	 * We have other queues, don't allow more IO from this one
> +	 */
> +	if (cfq_slice_used_soon(cfqd, cfqq))
> +		return 0;
> +	else
> +		/*
> +		 * Normally we start throttling cfqq when cfq_quantum/2
> +		 * requests have been dispatched. But we can drive
> +		 * deeper queue depths at the beginning of slice
> +		 * subjected to upper limit of cfq_quantum.
> +		 * */
> +		return cfqd->cfq_quantum;
> +}
> +
>  static bool cfq_may_dispatch(struct cfq_data *cfqd, struct cfq_queue *cfqq)
>  {
>  	unsigned int max_dispatch;
> @@ -2411,25 +2482,9 @@ static bool cfq_may_dispatch(struct cfq_
>  		if (cfq_class_idle(cfqq))
>  			return false;
>  
> -		/*
> -		 * We have other queues, don't allow more IO from this one
> -		 */
> -		if (cfqd->busy_queues > 1 && cfq_slice_used_soon(cfqd, cfqq))
> +		max_dispatch = cfq_queue_max_quantum(cfqd, cfqq);
> +		if (max_dispatch == 0)
>  			return false;
> -
> -		/*
> -		 * Sole queue user, no limit
> -		 */
> -		if (cfqd->busy_queues == 1)
> -			max_dispatch = -1;
> -		else
> -			/*
> -			 * Normally we start throttling cfqq when cfq_quantum/2
> -			 * requests have been dispatched. But we can drive
> -			 * deeper queue depths at the beginning of slice
> -			 * subjected to upper limit of cfq_quantum.
> -			 * */
> -			max_dispatch = cfqd->cfq_quantum;
>  	}
>  
>  	/*
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/