linux-kernel - cfq-iosched preempt issues

lists.openwall.net: Open Source and information security mailing list archives
Message-ID: <20110302124341.GA23940@sli10-conroe.sh.intel.com>
Date:	Wed, 2 Mar 2011 20:43:41 +0800
From:	Shaohua Li <shaohua.li@...el.com>
To:	jaxboe@...ionio.com, vgoyal@...hat.com, jmoyer@...hat.com,
	czoccolo@...il.com, guijianfeng@...fujitsu.com
Cc:	linux-kernel@...r.kernel.org
Subject: cfq-iosched preempt issues

Queue preemption helps some workloads and hurts others, and commit
f8ae6e3eb825 amplifies its impact. I currently have two issues with it:
1. In a multi-threaded workload, each thread does random reads/writes (for
example, mmap writes) with iodepth 1. I found that the queue depth gets
smaller with commit f8ae6e3eb825. The reason is that writes get preempted,
so more threads are waiting for writes and fewer threads are doing reads.
This keeps the queue depth small, so performance drops a little. In this
case, speeding up writes would speed up reads too, but we can't detect it.
2. cfq_may_dispatch doesn't limit the queue depth if the queue is the sole
queue. What if there are two queues, one sync and one async? If the sync
queue's think time is small, we can treat it as the sole queue: the sync
queue will preempt the async queue, so we don't need to care about the
async queue's latency. The issue existed before, but f8ae6e3eb825
amplifies it. Below is a patch for it.
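Both this proposal and the patch depend on a sync queue's think time being "small". Below is a minimal sketch of one way to track that, assuming a per-process decayed average of the gap between a request completing and the process issuing its next request, trusted only once enough samples have accumulated. The struct, helper names, and constants (7/8 decay, 256 fixed-point scale, validity threshold of 80, clamping gaps to 2 * slice_idle) are assumptions for illustration, modeled loosely on the ttime_samples and ttime_mean fields the patch reads; it is not the code in cfq-iosched.c.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical per-process think-time state. samples and total are
 * fixed point, scaled by 256, and decay by 7/8 on every update.
 */
struct ttime_state {
	unsigned long samples;	/* decayed sample weight */
	unsigned long total;	/* decayed sum of think times */
	unsigned long mean;	/* derived mean think time */
};

/* Assumed threshold: trust the mean only after enough weight. */
#define SAMPLE_VALID_THRESHOLD 80

static bool sample_valid(unsigned long samples)
{
	return samples > SAMPLE_VALID_THRESHOLD;
}

/*
 * Fold one observed gap into the decayed average. Gaps are clamped to
 * 2 * slice_idle so a single long pause cannot dominate the mean.
 */
static void ttime_update(struct ttime_state *t, unsigned long gap,
			 unsigned long slice_idle)
{
	unsigned long ttime = gap < 2 * slice_idle ? gap : 2 * slice_idle;

	t->samples = (7 * t->samples + 256) / 8;
	t->total = (7 * t->total + 256 * ttime) / 8;
	assert(t->samples > 0);	/* always true after the first update */
	t->mean = (t->total + 128) / t->samples;
}

/* "Think time is small": a trusted mean below the idle window. */
static bool ttime_is_small(const struct ttime_state *t,
			   unsigned long slice_idle)
{
	return sample_valid(t->samples) && t->mean < slice_idle;
}
```

With slice_idle = 8 (an assumed value, in jiffies), a process that reissues I/O about one tick after each completion passes the test after three updates, while one that pauses 100 ticks never does: its gaps are clamped to 16, so its mean stays above 8.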

Any ideas?

Thanks,
Shaohua
-----------------------------------------------
Subject: cfq-iosched: don't limit sync queue depth with only one such sync queue

If there is one sync queue and one async queue, and the sync queue's think
time is small, we can ignore the sync queue's dispatch quantum. Since the
sync queue will always preempt the async queue, we don't need to care about
the async queue's latency. The same optimization can be applied to an RT
queue to improve performance.
This can fix a performance regression in the aiostress test introduced by
commit f8ae6e3eb825. The issue should exist even without that commit, but
the commit amplifies its impact.
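The dispatch rule described above, together with the busy-queue and group checks in the patch's cfq_queue_max_quantum(), can be restated as a standalone predicate. This is a sketch under assumptions: struct class_counts and may_ignore_quantum() are hypothetical names, and the caller is assumed to have already computed the per-class queue counts (the patch reads them from cfqg->service_trees[prio][...].count) and the think-time verdict. The boolean structure follows the patch's condition as posted; "false" here means the normal cfq_quantum path applies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-class queue counts within one cfq group. */
struct class_counts {
	int sync_queues;	/* queues on the SYNC and SYNC_NOIDLE trees */
	int async_queues;	/* queues on the ASYNC tree */
};

/*
 * Returns true when the queue may dispatch without the cfq_quantum limit.
 * sync and rt describe the queue being dispatched; small_think_time
 * stands in for sample_valid(ttime_samples) && ttime_mean < slice_idle.
 */
static bool may_ignore_quantum(bool sync, bool rt, int busy_queues,
			       bool multiple_groups, struct class_counts c,
			       bool small_think_time)
{
	/* The queue being dispatched is itself busy. */
	assert(busy_queues >= 1);

	/* Sole busy queue: no limit, as before the patch. */
	if (busy_queues == 1)
		return true;

	/* Multiple groups, or an async best-effort queue: normal limit. */
	if (multiple_groups || (!sync && !rt))
		return false;

	/*
	 * Same shape as the patch: an async RT queue that is alone in its
	 * class, or any remaining queue whose class has exactly one sync
	 * queue, and only when the think time is small.
	 */
	if ((!sync && rt && c.sync_queues == 0 && c.async_queues == 1) ||
	    c.sync_queues == 1)
		return small_think_time;

	return false;
}
```

Keeping the think-time verdict as a plain bool keeps the predicate testable in isolation; in the patch itself it is evaluated inline from the request's cfq_io_context.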

Signed-off-by: Shaohua Li <shaohua.li@...el.com>
---
 block/cfq-iosched.c |   91 +++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 73 insertions(+), 18 deletions(-)

Index: linux/block/cfq-iosched.c
===================================================================
--- linux.orig/block/cfq-iosched.c	2011-03-02 14:58:19.000000000 +0800
+++ linux/block/cfq-iosched.c	2011-03-02 15:48:38.000000000 +0800
@@ -1150,6 +1150,20 @@ void cfq_unlink_blkio_group(void *key, s
 	spin_unlock_irqrestore(cfqd->queue->queue_lock, flags);
 }
 
+static bool cfq_have_cfqgs(struct cfq_data *cfqd)
+{
+	struct hlist_node *pos;
+	struct cfq_group *cfqg;
+	int cnt = 0;
+
+	hlist_for_each_entry(cfqg, pos, &cfqd->cfqg_list, cfqd_node) {
+		cnt++;
+		if (cnt > 1)
+			break;
+	}
+	return cnt > 1;
+}
+
 #else /* GROUP_IOSCHED */
 static struct cfq_group *cfq_get_cfqg(struct cfq_data *cfqd, int create)
 {
@@ -1169,6 +1183,12 @@ cfq_link_cfqq_cfqg(struct cfq_queue *cfq
 static void cfq_release_cfq_groups(struct cfq_data *cfqd) {}
 static inline void cfq_put_cfqg(struct cfq_group *cfqg) {}
 
+static inline bool cfq_have_cfqgs(struct cfq_data *cfqd)
+{
+	return false;
+}
+
+
 #endif /* GROUP_IOSCHED */
 
 /*
@@ -2381,6 +2401,57 @@ static inline bool cfq_slice_used_soon(s
 	return false;
 }
 
+static unsigned int cfq_queue_max_quantum(struct cfq_data *cfqd,
+	struct cfq_queue *cfqq)
+{
+	int sync = cfq_cfqq_sync(cfqq);
+	enum wl_prio_t prio = cfqq_prio(cfqq);
+	struct cfq_group *cfqg = cfqq->cfqg;
+	int sync_queues_cnt, async_queues_cnt;
+	struct cfq_io_context *cic = RQ_CIC(cfqq->next_rq);
+
+	/* Sole queue user, no limit */
+	if (cfqd->busy_queues == 1)
+		return -1;
+
+	if (cfq_have_cfqgs(cfqd) || (!sync && prio != RT_WORKLOAD))
+		goto normal;
+
+	sync_queues_cnt = cfqg->service_trees[prio][SYNC_NOIDLE_WORKLOAD].count
+		+ cfqg->service_trees[prio][SYNC_WORKLOAD].count;
+	async_queues_cnt = cfqg->service_trees[prio][ASYNC_WORKLOAD].count;
+	/*
+	 * If the queue is the sole sync queue and its think time is small, we
+	 * can ignore the async queue here and give the sync queue no dispatch
+	 * limit, because a sync queue can preempt an async queue.
+	 *
+	 * If the queue is RT, we don't need to check BE, because even if the
+	 * queue expires, the dispatcher will pick an RT queue again next time.
+	 *
+	 * If the queue is BE, we don't check RT here: the dispatcher will switch
+	 * to RT next time, so at most one extra request is dispatched.
+	 */
+	if (((!sync && prio == RT_WORKLOAD && sync_queues_cnt == 0 &&
+		async_queues_cnt == 1) || sync_queues_cnt == 1) &&
+		sample_valid(cic->ttime_samples) &&
+		cic->ttime_mean < cfqd->cfq_slice_idle)
+		return -1;
+normal:
+	/*
+	 * We have other queues, don't allow more IO from this one
+	 */
+	if (cfq_slice_used_soon(cfqd, cfqq))
+		return 0;
+	else
+		/*
+		 * Normally we start throttling cfqq when cfq_quantum/2
+		 * requests have been dispatched. But we can drive
+		 * deeper queue depths at the beginning of slice
+		 * subjected to upper limit of cfq_quantum.
+		 * */
+		return cfqd->cfq_quantum;
+}
+
 static bool cfq_may_dispatch(struct cfq_data *cfqd, struct cfq_queue *cfqq)
 {
 	unsigned int max_dispatch;
@@ -2411,25 +2482,9 @@ static bool cfq_may_dispatch(struct cfq_
 		if (cfq_class_idle(cfqq))
 			return false;
 
-		/*
-		 * We have other queues, don't allow more IO from this one
-		 */
-		if (cfqd->busy_queues > 1 && cfq_slice_used_soon(cfqd, cfqq))
+		max_dispatch = cfq_queue_max_quantum(cfqd, cfqq);
+		if (max_dispatch == 0)
 			return false;
-
-		/*
-		 * Sole queue user, no limit
-		 */
-		if (cfqd->busy_queues == 1)
-			max_dispatch = -1;
-		else
-			/*
-			 * Normally we start throttling cfqq when cfq_quantum/2
-			 * requests have been dispatched. But we can drive
-			 * deeper queue depths at the beginning of slice
-			 * subjected to upper limit of cfq_quantum.
-			 * */
-			max_dispatch = cfqd->cfq_quantum;
 	}
 
 	/*
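The refactoring keeps cfq_may_dispatch()'s existing convention: max_dispatch is an unsigned int, so the -1 returned for "no limit" converts to UINT_MAX, while 0 now means "stop dispatching from this queue". The sketch below isolates that contract. The enum and both function names are hypothetical, and the final `dispatched < max_dispatch` comparison is an assumption about the part of cfq_may_dispatch() that the diff does not show.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical labels for the three outcomes of cfq_queue_max_quantum(). */
enum quantum_case { QUANTUM_UNLIMITED, QUANTUM_STOP, QUANTUM_NORMAL };

static unsigned int max_quantum(enum quantum_case c, unsigned int cfq_quantum)
{
	/* A zero quantum would be indistinguishable from "stop". */
	assert(cfq_quantum > 0);

	switch (c) {
	case QUANTUM_UNLIMITED:
		return (unsigned int)-1;	/* == UINT_MAX: no limit */
	case QUANTUM_STOP:
		return 0;			/* slice nearly used up */
	default:
		return cfq_quantum;		/* normal per-queue limit */
	}
}

/*
 * Caller side: 0 refuses the dispatch outright; any other value is an
 * upper bound on requests already dispatched from this queue.
 */
static bool may_dispatch(unsigned int dispatched, enum quantum_case c,
			 unsigned int cfq_quantum)
{
	unsigned int max_dispatch = max_quantum(c, cfq_quantum);

	if (max_dispatch == 0)
		return false;
	return dispatched < max_dispatch;
}
```

Because UINT_MAX is larger than any realistic dispatched count, the "no limit" case needs no special branch in the caller; only the 0 sentinel does.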