linux-kernel - Re: performance "regression" in cfq compared to anticipatory, deadline and noop

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 14 May 2008 23:37:19 +0200
From:	Matthew <jackdachef@...il.com>
To:	"Daniel J Blueman" <daniel.blueman@...il.com>
Cc:	"Jens Axboe" <jens.axboe@...cle.com>,
	"Kasper Sandberg" <lkml@...anurb.dk>,
	"Linux Kernel" <linux-kernel@...r.kernel.org>
Subject: Re: performance "regression" in cfq compared to anticipatory, deadline and noop

On Wed, May 14, 2008 at 10:52 PM, Daniel J Blueman
<daniel.blueman@...il.com> wrote:
> On Wed, May 14, 2008 at 9:26 AM, Jens Axboe <jens.axboe@...cle.com> wrote:
>> On Wed, May 14 2008, Daniel J Blueman wrote:
>>> >  > > >  > >  > On Sun, 2008-05-11 at 14:14 +0100, Daniel J Blueman wrote:
>>> >  > > >  > >  > > I've been experiencing this for a while also; an almost 50% regression
>>> >  > > >  > >  > > is seen for single-process reads (ie sync) if slice_idle is 1ms or
>>> >  > > >  > >  > > more (eg default of 8) [1], which seems phenomenal.
>>> >  > > >  > >  > >
>>> >  > > >  > >  > > Jens, is this the expected price to pay for optimal busy-spindle
>>> >  > > >  > >  > > scheduling, a design issue, bug or am I missing something totally?
>>> [snip]
>>> >  > They seem to start out the same, but then CFQ gets interrupted by a
>>> >  > timer unplug (which is also odd) and after that the request size drops.
>>> >  > On most devices you don't notice, but some are fairly picky about
>>> >  > request sizes. The end result is that CFQ has an average dispatch
>>> >  > request size of 142kb, where AS is more than double that at 306kb. I'll
>>> >  > need to analyze the data and look at the code a bit more to see WHY this
>>> >  > happens.
>>> >
>>> >  Here's a test patch, I think we get into this situation due to CFQ being
>>> >  a bit too eager to start queuing again. Not tested, I'll need to spend
>>> >  some testing time on this. But I'd appreciate some feedback on whether
>>> >  this changes the situation! The final patch will be a little more
>>> >  involved.
>>> >
>>> >  diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
>>> >  index b399c62..ebd8ce2 100644
>>> >  --- a/block/cfq-iosched.c
>>> >  +++ b/block/cfq-iosched.c
>>> >  @@ -1775,18 +1775,8 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
>>> >
>>> >         cic->last_request_pos = rq->sector + rq->nr_sectors;
>>> >
>>> >  -       if (cfqq == cfqd->active_queue) {
>>> >  -               /*
>>> >  -                * if we are waiting for a request for this queue, let it rip
>>> >  -                * immediately and flag that we must not expire this queue
>>> >  -                * just now
>>> >  -                */
>>> >  -               if (cfq_cfqq_wait_request(cfqq)) {
>>> >  -                       cfq_mark_cfqq_must_dispatch(cfqq);
>>> >  -                       del_timer(&cfqd->idle_slice_timer);
>>> >  -                       blk_start_queueing(cfqd->queue);
>>> >  -               }
>>> >  -       } else if (cfq_should_preempt(cfqd, cfqq, rq)) {
>>> >  +       if ((cfqq != cfqd->active_queue) &&
>>> >  +                  cfq_should_preempt(cfqd, cfqq, rq)) {
>>> >                 /*
>>> >                  * not the active queue - expire current slice if it is
>>> >                  * idle and has expired it's mean thinktime or this new queue
>>>
>>> I find this does address the issue (both with 64KB stride dd and
>>> hdparm -t; presumably the requests getting merged). Tested on
>>> 2.6.26-rc2 on Ubuntu HH 804 x86-64, with slice_idle defaulting to 8
>>> and AHCI on ICH9; disk is ST3320613AS.
>>>
>>> Blktrace profiles from 'dd if=/dev/sda of=/dev/null bs=64k count=1000' are at:
>>>
>>> http://quora.org/blktrace-profiles.tar.bz2
>>
>> Goodie! I think the below patch is better - we do want to schedule the
>> queue immediately, but we do not want to interrupt the queuer. So just
>> kick the workqueue handling of the queue instead of entering the
>> dispatcher directly. Can you test this one as well? Thanks!
>>
>> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
>> index f4e1006..e8c1941 100644
>> --- a/block/cfq-iosched.c
>> +++ b/block/cfq-iosched.c
>> @@ -1107,7 +1107,6 @@ static int cfq_dispatch_requests(struct request_queue *q, int force)
>>
>>                cfq_clear_cfqq_must_dispatch(cfqq);
>>                cfq_clear_cfqq_wait_request(cfqq);
>> -               del_timer(&cfqd->idle_slice_timer);
>>
>>                dispatched += __cfq_dispatch_requests(cfqd, cfqq, max_dispatch);
>>        }
>> @@ -1769,15 +1768,9 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
>>        cic->last_request_pos = rq->sector + rq->nr_sectors;
>>
>>        if (cfqq == cfqd->active_queue) {
>> -               /*
>> -                * if we are waiting for a request for this queue, let it rip
>> -                * immediately and flag that we must not expire this queue
>> -                * just now
>> -                */
>>                if (cfq_cfqq_wait_request(cfqq)) {
>> -                       cfq_mark_cfqq_must_dispatch(cfqq);
>>                        del_timer(&cfqd->idle_slice_timer);
>> -                       blk_start_queueing(cfqd->queue);
>> +                       kblockd_schedule_work(&cfqd->unplug_work);
>>                }
>>        } else if (cfq_should_preempt(cfqd, cfqq, rq)) {
>>                /*
>> @@ -1787,7 +1780,7 @@ cfq_rq_enqueued(struct cfq_data *cfqd, struct cfq_queue *cfqq,
>>                 */
>>                cfq_preempt_queue(cfqd, cfqq);
>>                cfq_mark_cfqq_must_dispatch(cfqq);
>> -               blk_start_queueing(cfqd->queue);
>> +               kblockd_schedule_work(&cfqd->unplug_work);
>>        }
>>  }
>
> Applied on top of 2.6.26-rc2, I get platter-speed (118MB/s) with 'dd
> if=/dev/sda of=/dev/null bs=64k' and 'hdparm -t', so looks good.
> Identical testing without the patch (ie pure mainline) consistently
> yields 65MB/s.
>
> Blktrace profile at:
>
> http://quora.org/blktrace-profiles-2.tar.bz2
>
> I'll check for performance regressions with postmark on XFS; anything
> else worth running while I've got this in hand?
>
> Daniel
> --
> Daniel J Blueman
>

so it seems something specific to >=2.6.26-rc2 + the patch fixed it for you ?

were there any notable changes from 2.6.25 -> 2.6.26 in cfq or the VFS
in general ?

I'm curious if this also works with 2.6.25, could you please test that too ?

I'll give .26-rc2 a test-ride later

thanks

Mat
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/