Message-ID: <20100428200222.GL16033@redhat.com>
Date:	Wed, 28 Apr 2010 16:02:22 -0400
From:	Vivek Goyal <vgoyal@...hat.com>
To:	Corrado Zoccolo <czoccolo@...il.com>
Cc:	Miklos Szeredi <mszeredi@...e.cz>,
	Jens Axboe <jens.axboe@...cle.com>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Jan Kara <jack@...e.cz>, Suresh Jayaraman <sjayaraman@...e.de>
Subject: Re: CFQ read performance regression

On Tue, Apr 27, 2010 at 07:25:14PM +0200, Corrado Zoccolo wrote:
> On Mon, Apr 26, 2010 at 9:14 PM, Vivek Goyal <vgoyal@...hat.com> wrote:
> > On Sat, Apr 24, 2010 at 10:36:48PM +0200, Corrado Zoccolo wrote:
> >
> > [..]
> >> >> Anyway, if that's the case, then we probably need to allow IO from
> >> >> multiple sequential readers and keep a watch on throughput. If throughput
> >> >> drops then reduce the number of parallel sequential readers. Not sure how
> >> >> much of code that is but with multiple cfqq going in parallel, ioprio
> >> >> logic will more or less stop working in CFQ (on multi-spindle hardware).
> >> Hi Vivek,
> >> I tried to implement exactly what you are proposing, see the attached patches.
> >> I leverage the queue merging features to let multiple cfqqs share the
> >> disk in the same timeslice.
> >> I changed the queue split code to trigger on throughput drop instead
> >> of on seeky pattern, so diverging queues can remain merged if they
> >> have good throughput. Moreover, I measure the max bandwidth reached by
> >> single queues and merged queues (you can see the values in the
> >> bandwidth sysfs file).
> >> If merged queues can outperform non-merged ones, the queue merging
> >> code will try to opportunistically merge together queues that cannot
> >> submit enough requests to fill half of the NCQ slots. I'd like to know
> >> if you can see any improvements out of this on your hardware. There
> >> are some magic numbers in the code, you may want to try tuning them.
> >> Note that, since the opportunistic queue merging will start happening
> >> only after merged queues have shown to reach higher bandwidth than
> >> non-merged queues, you should use the disk for a while before trying
> >> the test (and you can check sysfs), or the merging will not happen.
> >
> > Hi Corrado,
> >
> > I ran these patches and I did not see any improvement. I think the reason
> > being that no cooperative queue merging took place and we did not have
> > any data for throughput with coop flag on.
> >
> > #cat /sys/block/dm-3/queue/iosched/bandwidth
> > 230     753     0
> >
> > I think we need to implement something similar to the hw_tag detection
> > logic, where we allow dispatches from multiple sync-idle queues at a time
> > and try to observe the BW. After a certain observation window, once we
> > have measured the BW, set the system behavior accordingly.
> Hi Vivek,
> thanks for testing. Can you try changing the condition that enables
> queue merging so that it also covers the case in which max_bw[1] == 0 &&
> max_bw[0] > 100MB/s (note that max_bw is measured in
> sectors/jiffy)?
> This should rule out low-end disks and enable merging where it can be
> beneficial.
> If the results are good, but we find this enabling condition
> unreliable, then we can think of a better way; I'm curious to see
> whether the results are promising before thinking about the details.

Ok, I made some changes and now some queue merging seems to be happening,
and I am getting slightly better BW. This will require more debugging; I
will try to spare some time for it later.

Kernel=2.6.34-rc5-corrado-multicfq
DIR= /mnt/iostmnt/fio          DEV= /dev/mapper/mpathe       
Workload=bsr       iosched=cfq      Filesz=1G    bs=16K  
==========================================================================
job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)    
---       --- --  ------------   -----------    -------------  -----------    
bsr       1   1   136457         67353          0              0              
bsr       1   2   148008         192415         0              0              
bsr       1   4   180223         535205         0              0              
bsr       1   8   166983         542326         0              0              
bsr       1   16  176617         832188         0              0              

Output of iosched/bandwidth

0	546	740

I made the following changes on top of your patch.

Vivek

---
 block/cfq-iosched.c |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 4e9e015..7589c44 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -243,6 +243,7 @@ struct cfq_data {
 	 */
 	int hw_tag_est_depth;
 	unsigned int hw_tag_samples;
+	unsigned int cfqq_merged_samples;
 	/*
 	 * performance measurements
 	 * max_bw is indexed by coop flag.
@@ -1736,10 +1737,14 @@ static struct cfq_queue *cfqq_close(struct cfq_data *cfqd,
 	// Opportunistic queue merging could be beneficial even on far queues
 	// We enable it only on NCQ disks, if we observed that merged queues
 	// can reach higher bandwidth than single queues.
+	// 204 sectors per jiffy is equivalent to 100MB/s on 1000 HZ conf.
+	// Allow merge if we don't have sufficient merged cfqq samples.
 	rs = cur_cfqq->allocated[READ] + cur_cfqq->allocated[WRITE];
-	if (cfqd->hw_tag && cfqd->max_bw[1] > cfqd->max_bw[0] &&
+	if (cfqd->hw_tag
+	   && (cfqd->max_bw[1] > cfqd->max_bw[0]
+	       || (cfqd->max_bw[0] > 204 && !sample_valid(cfqd->cfqq_merged_samples)))
 	    // Do not overload the device queue
-	    rs < cfqd->hw_tag_est_depth / 2) {
+	    && rs < cfqd->hw_tag_est_depth / 2) {
 		unsigned r1 = __cfqq->allocated[READ] + __cfqq->allocated[WRITE];
 		unsigned r2 = __cfqq1->allocated[READ] + __cfqq1->allocated[WRITE];
 		// Prefer merging with a queue that has fewer pending requests
@@ -1750,6 +1755,8 @@ static struct cfq_queue *cfqq_close(struct cfq_data *cfqd,
 		// Do not merge if the merged queue would have too many requests
 		if (r1 + rs > cfqd->hw_tag_est_depth / 2)
 			return NULL;
+		cfqd->cfqq_merged_samples++;
+
 		// Merge only if the BW of the two queues is comparable
 		if (abs(__cfqq->last_bw - cur_cfqq->last_bw) * 4 < cfqd->max_bw[0])
 			return __cfqq;
-- 
1.6.2.5

> 
> Thanks,
> Corrado
> 
> >
> > Kernel=2.6.34-rc5-corrado-multicfq
> > DIR= /mnt/iostmnt/fio          DEV= /dev/mapper/mpathe
> > Workload=bsr       iosched=cfq      Filesz=2G    bs=4K
> > ==========================================================================
> > job       Set NR  ReadBW(KB/s)   MaxClat(us)    WriteBW(KB/s)  MaxClat(us)
> > ---       --- --  ------------   -----------    -------------  -----------
> > bsr       1   1   126590         61448          0              0
> > bsr       1   2   127849         242843         0              0
> > bsr       1   4   131886         508021         0              0
> > bsr       1   8   131890         398241         0              0
> > bsr       1   16  129167         454244         0              0
> >
> > Thanks
> > Vivek
> >
> 
> 
> 
> -- 
> __________________________________________________________________________
> 
> dott. Corrado Zoccolo                          mailto:czoccolo@...il.com
> PhD - Department of Computer Science - University of Pisa, Italy
> --------------------------------------------------------------------------
> The self-confidence of a warrior is not the self-confidence of the average
> man. The average man seeks certainty in the eyes of the onlooker and calls
> that self-confidence. The warrior seeks impeccability in his own eyes and
> calls that humbleness.
>                                Tales of Power - C. Castaneda
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
