linux-kernel - Re: [PATCH] cfq-iosched: non-rot devices do not need read queue merging

Message-ID: <4e5e476b1001040836p2c8d7486x807a1a89b61c2458@mail.gmail.com>
Date:	Mon, 4 Jan 2010 17:36:21 +0100
From:	Corrado Zoccolo <czoccolo@...il.com>
To:	Vivek Goyal <vgoyal@...hat.com>
Cc:	Jens Axboe <jens.axboe@...cle.com>,
	Linux-Kernel <linux-kernel@...r.kernel.org>,
	Jeff Moyer <jmoyer@...hat.com>,
	Shaohua Li <shaohua.li@...el.com>,
	Gui Jianfeng <guijianfeng@...fujitsu.com>
Subject: Re: [PATCH] cfq-iosched: non-rot devices do not need read queue 
	merging

Hi Vivek,

On Mon, Jan 4, 2010 at 3:47 PM, Vivek Goyal <vgoyal@...hat.com> wrote:
> On Wed, Dec 30, 2009 at 11:22:47PM +0100, Corrado Zoccolo wrote:
>> Non rotational devices' performances are not affected by
>> distance of read requests, so there is no point in having
>> overhead to merge such queues.
>> This doesn't apply to writes, so this patch changes the
>> queued[] field, to be indexed by READ/WRITE instead of
>> SYNC/ASYNC, and only compute proximity for queues with
>> WRITE requests.
>>
>
> Hi Corrado,
>
> What's the reason that reads don't benefit from merging queues and hence
> merging requests and only writes do on SSD?

On SSDs, reads are just limited by the maximum transfer rate, and
larger (i.e. merged) reads will just take proportionally longer.
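
A toy cost model may make this concrete. It is only a sketch: it assumes
each request pays a fixed positioning cost plus a transfer time
proportional to its size, and the struct, the function names and any
numbers fed to it are hypothetical, not measurements.

```c
#include <stddef.h>

/* Hypothetical device description for the toy model. */
struct toy_device {
	double position_us;   /* fixed per-request positioning cost (seek etc.) */
	double bytes_per_us;  /* sustained transfer rate */
};

/* Modelled service time of one request of the given size. */
static double toy_request_us(const struct toy_device *dev, size_t bytes)
{
	return dev->position_us + (double)bytes / dev->bytes_per_us;
}

/* Time saved by issuing one merged request instead of two separate ones. */
static double toy_merge_gain_us(const struct toy_device *dev,
				size_t a, size_t b)
{
	return toy_request_us(dev, a) + toy_request_us(dev, b)
	       - toy_request_us(dev, a + b);
}
```

In this model the gain from merging equals the positioning cost, so it
vanishes when position_us is 0 (the SSD-like case) and the merged read
simply takes the sum of the two transfer times.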

>> Signed-off-by: Corrado Zoccolo <czoccolo@...il.com>
>> ---
>>  block/cfq-iosched.c |   20 +++++++++++---------
>>  1 files changed, 11 insertions(+), 9 deletions(-)
>>
>> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
>> index 918c7fd..7da9391 100644
>> --- a/block/cfq-iosched.c
>> +++ b/block/cfq-iosched.c
>> @@ -108,9 +108,9 @@ struct cfq_queue {
>>       struct rb_root sort_list;
>>       /* if fifo isn't expired, next request to serve */
>>       struct request *next_rq;
>> -     /* requests queued in sort_list */
>> +     /* requests queued in sort_list, indexed by READ/WRITE */
>>       int queued[2];
>> -     /* currently allocated requests */
>> +     /* currently allocated requests, indexed by READ/WRITE */
>>       int allocated[2];
>
> Sometime back Jens had changed all READ/WRITE indexing to SYNC/ASYNC
> indexing throughout IO schedulers and block layer.
Not completely. The allocated field (for which I fixed only the
comment) is still addressed as READ/WRITE.
> Personally I would
> prefer to keep it that way and not have a mix of SYNC/ASYNC and READ/WRITE
> indexing in code.
I think that, as long as it is documented, it should be fine.
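
To illustrate the difference between the two indexings, here is a
user-space sketch (not kernel code; every toy_* name is hypothetical,
and toy_is_sync() only approximates rq_is_sync() by treating reads as
always sync):

```c
#include <assert.h>
#include <stdbool.h>

enum { TOY_READ = 0, TOY_WRITE = 1 };	/* data-direction index */

/* Hypothetical, simplified stand-in for struct request. */
struct toy_request {
	int dir;	/* TOY_READ or TOY_WRITE */
	bool sync;	/* only meaningful for writes */
};

/* Hypothetical stand-in for struct cfq_queue: just the counter. */
struct toy_queue {
	int queued[2];	/* indexed by TOY_READ/TOY_WRITE */
};

/* Direction index: what the patch uses for queued[]. */
static int toy_data_dir(const struct toy_request *rq)
{
	return rq->dir;
}

/* Sync index: what queued[] used before the patch. */
static int toy_is_sync(const struct toy_request *rq)
{
	return rq->dir == TOY_READ || rq->sync;
}

static void toy_add(struct toy_queue *q, const struct toy_request *rq)
{
	q->queued[toy_data_dir(rq)]++;
}

static void toy_del(struct toy_queue *q, const struct toy_request *rq)
{
	int rw = toy_data_dir(rq);

	assert(q->queued[rw] > 0);	/* mirrors the BUG_ON() in cfq_del_rq_rb() */
	q->queued[rw]--;
}
```

With the sync index, a read and a sync write land in the same slot, so
the counter cannot tell whether a queue holds any writes. With the
direction index, queued[TOY_WRITE] answers that question directly.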

> What are we gaining by this patch? Save some cpu cycles by not merging
> and splitting the read cfqq on ssd?
Yes. We should save a lot of cycles by avoiding the rb tree management
needed for those operations.
Jens' position is that for fast SSDs, we need to save CPU cycles if we
want to perform well.

> Do you have any numbers how much is
> the saving. My knee jerk reaction is that if gains are not significant,
> lets not do this optimization and let the code be simple.
I think we are actually simplifying the code, removing an optimization
(queue merging) when it is not needed.
When you want to reason about how the code performs on SSD, removing
the unknown of queue merging makes the problem easier.
>
>
>>       /* fifo list of requests in sort_list */
>>       struct list_head fifo;
>> @@ -1268,7 +1268,8 @@ static void cfq_prio_tree_add(struct cfq_data *cfqd, struct cfq_queue *cfqq)
>>               return;
>>       if (!cfqq->next_rq)
>>               return;
>> -
>> +     if (blk_queue_nonrot(cfqd->queue) && !cfqq->queued[WRITE])
>> +             return;
>
> A 1-2 line comment here will help about why writes still benefit and not
> reads.
>
It's because low-end SSDs are penalized by small writes. I don't have
a high-end SSD to test with, but Jens is going to do more testing,
and he can disable merging for writes as well if he sees an
improvement. Note that this is not the usual async write, but a sync
write with aio, which I think is quite a niche case.
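
The check added at both call sites can be read as the following
predicate. This is a user-space sketch; the helper name is hypothetical,
since the patch open-codes the test rather than adding a function:

```c
#include <stdbool.h>

enum { TOY_READ = 0, TOY_WRITE = 1 };

/*
 * Hypothetical helper: true when proximity tracking can be skipped,
 * i.e. the device is non-rotational and the queue holds no writes.
 */
static bool toy_skip_close_lookup(bool nonrot, const int queued[2])
{
	return nonrot && queued[TOY_WRITE] == 0;
}
```

Rotational devices never take the shortcut, and on a non-rotational
device a single queued write is enough to keep the queue in the prio
tree.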

>>       cfqq->p_root = &cfqd->prio_trees[cfqq->org_ioprio];
>>       __cfqq = cfq_prio_tree_lookup(cfqd, cfqq->p_root,
>>                                     blk_rq_pos(cfqq->next_rq), &parent, &p);
>> @@ -1337,10 +1338,10 @@ static void cfq_del_cfqq_rr(struct cfq_data *cfqd, struct cfq_queue *cfqq)
>>  static void cfq_del_rq_rb(struct request *rq)
>>  {
>>       struct cfq_queue *cfqq = RQ_CFQQ(rq);
>> -     const int sync = rq_is_sync(rq);
>> +     const int rw = rq_data_dir(rq);
>>
>> -     BUG_ON(!cfqq->queued[sync]);
>> -     cfqq->queued[sync]--;
>> +     BUG_ON(!cfqq->queued[rw]);
>> +     cfqq->queued[rw]--;
>>
>>       elv_rb_del(&cfqq->sort_list, rq);
>>
>> @@ -1363,7 +1364,7 @@ static void cfq_add_rq_rb(struct request *rq)
>>       struct cfq_data *cfqd = cfqq->cfqd;
>>       struct request *__alias, *prev;
>>
>> -     cfqq->queued[rq_is_sync(rq)]++;
>> +     cfqq->queued[rq_data_dir(rq)]++;
>>
>>       /*
>>        * looks a little odd, but the first insert might return an alias.
>> @@ -1393,7 +1394,7 @@ static void cfq_add_rq_rb(struct request *rq)
>>  static void cfq_reposition_rq_rb(struct cfq_queue *cfqq, struct request *rq)
>>  {
>>       elv_rb_del(&cfqq->sort_list, rq);
>> -     cfqq->queued[rq_is_sync(rq)]--;
>> +     cfqq->queued[rq_data_dir(rq)]--;
>>       cfq_add_rq_rb(rq);
>>  }
>>
>> @@ -1689,7 +1690,8 @@ static struct cfq_queue *cfqq_close(struct cfq_data *cfqd,
>>       struct cfq_queue *__cfqq;
>>       sector_t sector = cfqd->last_position;
>>
>> -     if (RB_EMPTY_ROOT(root))
>> +     if (RB_EMPTY_ROOT(root) ||
>> +         (blk_queue_nonrot(cfqd->queue) && !cur_cfqq->queued[WRITE]))
>>               return NULL;
>>
>>       /*
>> --
>> 1.6.4.4
>
Thanks
Corrado