Date:	Wed, 15 Apr 2009 00:07:11 -0400
From:	Jeff Moyer <jmoyer@...hat.com>
To:	Jens Axboe <jens.axboe@...cle.com>
Cc:	"Zhang\, Yanmin" <yanmin_zhang@...ux.intel.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: tiobench read 50% regression with 2.6.30-rc1

Jens Axboe <jens.axboe@...cle.com> writes:

> On Fri, Apr 10 2009, Zhang, Yanmin wrote:
>> On Thu, 2009-04-09 at 11:57 +0200, Jens Axboe wrote:
>> > On Thu, Apr 09 2009, Zhang, Yanmin wrote:
>> > > Compared with 2.6.29, tiobench (read) shows about a 50% regression
>> > > with 2.6.30-rc1 on all my machines. I bisected it down to the patch below.
>> > > 
>> > > b029195dda0129b427c6e579a3bb3ae752da3a93 is first bad commit
>> > > commit b029195dda0129b427c6e579a3bb3ae752da3a93
>> > > Author: Jens Axboe <jens.axboe@...cle.com>
>> > > Date:   Tue Apr 7 11:38:31 2009 +0200
>> > > 
>> > >     cfq-iosched: don't let idling interfere with plugging
>> > >     
>> > >     When CFQ is waiting for a new request from a process, currently it'll
>> > >     immediately restart queuing when it sees such a request. This doesn't
>> > >     work very well with streamed IO, since we then end up splitting IO
>> > >     that would otherwise have been merged nicely. For a simple dd test,
>> > >     this causes 10x as many requests to be issued as we should have.
>> > >     Normally this goes unnoticed due to the low overhead of requests
>> > >     at the device side, but some hardware is very sensitive to request
>> > >     sizes and there it can cause big slow downs.
>> > > 
>> > > 
>> > > 
>> > > Command to start the testing:
>> > > #tiotest -k0 -k1 -k3 -f 80 -t 32
>> > > 
>> > > It's a multi-threaded program that starts 32 threads. Every thread does I/O
>> > > on its own 80MB file.
>> The files should be created before the test, and please drop the page cache
>> with "echo 3 >/proc/sys/vm/drop_caches" before testing.
>> 
>> > 
>> > It's not a huge surprise that we regressed there. I'll get this fixed up
>> > next week. Can I talk you into trying to change the 'quantum' sysfs
>> > variable for the drive? It's in /sys/block/xxx/queue/iosched, where xxx
>> > is your drive(s). It's set to 4; if you could try progressively larger
>> > settings and retest, that would help get things started.
>> I tried 4, 8, 16, 64, and 128 and didn't see any difference in the results.
>
> Can you try with this patch?
>
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index a4809de..66f00e5 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -1905,10 +1905,17 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
>  		 * Remember that we saw a request from this process, but
>  		 * don't start queuing just yet. Otherwise we risk seeing lots
>  		 * of tiny requests, because we disrupt the normal plugging
> -		 * and merging.
> +		 * and merging. If the request is already larger than a single
> +		 * page, let it rip immediately. For that case we assume that
> +		 * merging is already done.
>  		 */
> -		if (cfq_cfqq_wait_request(cfqq))
> +		if (cfq_cfqq_wait_request(cfqq)) {
> +			if (blk_rq_bytes(rq) > PAGE_CACHE_SIZE) {
> +				del_timer(&cfqd->idle_slice_timer);
> +				blk_start_queueing(cfqd->queue);
> +			}
>  			cfq_mark_cfqq_must_dispatch(cfqq);
> +		}
>  	} else if (cfq_should_preempt(cfqd, cfqq, rq)) {
>  		/*
>  		 * not the active queue - expire current slice if it is
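
As an aside, the reproduction steps and the quantum sweep quoted above
boil down to roughly the following; the drive name and the loop are
assumptions on my part, not the exact script used:

  # Sketch (run as root), assuming the 32 pre-created 80MB files live in
  # the current directory and the drive under test is sda.
  for q in 4 8 16 64 128; do
          echo $q > /sys/block/sda/queue/iosched/quantum   # cfq's quantum knob
          sync
          echo 3 > /proc/sys/vm/drop_caches                # cold-cache read
          tiotest -k0 -k1 -k3 -f 80 -t 32                  # 32 threads, 80MB per thread
  done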

I tested this using iozone to read a file from an NFS client.  The
iozone command line was:
  iozone -s 2000000 -r 64 -f /mnt/test/testfile -i 1 -w
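
The server-side setup behind these runs looks roughly like the sketch
below; the device name, the export path and the use of rpc.nfsd are
assumptions, not the exact commands:

  # On the NFS server (assumed data disk sdb, exported and mounted on the
  # client at /mnt/test): pick the I/O scheduler and the nfsd thread count.
  echo cfq > /sys/block/sdb/queue/scheduler    # or: echo deadline > ...
  rpc.nfsd 4                                   # 1, 2, 4 or 8, per the columns below
  echo 3 > /proc/sys/vm/drop_caches            # start each run from a cold cache
  # On the client: sequential read/re-read of the ~2GB file over NFS.
  iozone -s 2000000 -r 64 -f /mnt/test/testfile -i 1 -w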

The numbers in the nfsd's row are the number of nfsd server threads, and the
table entries are iozone read throughput figures (KB/s, iozone's default).  I
included numbers for the deadline scheduler as well for comparison.

               v2.6.29

nfsd's  |   1   |   2   |   4   |    8
--------+-------+-------+-------+-------
cfq     | 91356 | 66391 | 61942 |  51674
deadline| 43207 | 67436 | 96289 | 107784

              2.6.30-rc1

nfsd's  |   1   |   2   |   4   |   8
--------+-------+-------+-------+-------
cfq     | 43127 | 22354 | 20858 | 21179
deadline| 43732 | 68059 | 76659 | 83231

          2.6.30-rc1 + cfq fix

nfsd's  |   1    |   2    |   4   |   8
--------+--------+--------+-------+-------
cfq     | 114602 | 102280 | 43479 | 43160

As you can see, for 1 and 2 threads, the patch *really* helps out.  We
still don't get back the performance for 4 and 8 nfsd threads, though.
It's interesting to note that the deadline scheduler regresses for 4 and
8 threads, as well.  I think we've still got some digging to do.

I'll try the cfq close cooperator patches next.

Cheers,
Jeff