linux-kernel - Re: [PATCH V7 4/6] blk-mq: introduce .get_budget and .put_budget in blk_mq

Message-ID: <7e318bdf-a4dd-16dc-3963-a3f1a6fc426a@kernel.dk>
Date:   Fri, 13 Oct 2017 10:20:01 -0600
From:   Jens Axboe <axboe@...nel.dk>
To:     Ming Lei <ming.lei@...hat.com>
Cc:     linux-block@...r.kernel.org, Christoph Hellwig <hch@...radead.org>,
        Bart Van Assche <bart.vanassche@...disk.com>,
        Laurence Oberman <loberman@...hat.com>,
        Paolo Valente <paolo.valente@...aro.org>,
        Oleksandr Natalenko <oleksandr@...alenko.name>,
        Tom Nguyen <tom81094@...il.com>, linux-kernel@...r.kernel.org,
        linux-scsi@...r.kernel.org, Omar Sandoval <osandov@...com>,
        John Garry <john.garry@...wei.com>
Subject: Re: [PATCH V7 4/6] blk-mq: introduce .get_budget and .put_budget in
 blk_mq_ops

On 10/13/2017 10:17 AM, Ming Lei wrote:
> On Fri, Oct 13, 2017 at 08:44:23AM -0600, Jens Axboe wrote:
>> On 10/12/2017 06:19 PM, Ming Lei wrote:
>>> On Thu, Oct 12, 2017 at 12:46:24PM -0600, Jens Axboe wrote:
>>>> On 10/12/2017 12:37 PM, Ming Lei wrote:
>>>>> For SCSI devices, there is often a per-request-queue depth, which needs
>>>>> to be respected before queuing a request.
>>>>>
>>>>> Currently blk-mq always dequeues a request first and then calls .queue_rq()
>>>>> to dispatch it to the LLD. One obvious issue with this approach is that I/O
>>>>> merging may suffer: when the per-request-queue depth can't be respected,
>>>>> .queue_rq() has to return BLK_STS_RESOURCE, so the request has to stay in
>>>>> the hctx->dispatch list and never gets a chance to participate in I/O
>>>>> merging.
>>>>>
>>>>> This patch introduces .get_budget and .put_budget callbacks in blk_mq_ops,
>>>>> so we can try to reserve budget before dequeuing a request.
>>>>> If we can't get budget for queueing I/O, we don't dequeue the request
>>>>> at all, which can improve I/O merging a lot.
>>>>
>>>> I can't help but think that it would be cleaner to just be able to
>>>> reinsert the request into the scheduler properly, if we fail to
>>>> dispatch it. Bart hinted at that earlier as well.
>>>
>>> Actually, when I started investigating the issue, the first thing I tried
>>> was reinsertion, but that was even worse on qla2xxx.
>>>
>>> Once a request is dequeued, the chance of I/O merging drops a lot.
>>> With the none scheduler, merging becomes impossible because
>>> we only try to merge against the last 8 requests. With mq-deadline,
>>> when one request is reinserted, another request may be dequeued
>>> at the same time.
>>
>> I don't care too much about 'none'. If perfect merging is crucial for
>> getting to the performance level you want on the hardware you are using,
>> you should not be using 'none'. 'none' will work perfectly fine for NVMe-style
>> devices, where we are not dependent on merging to the same
>> extent that we are on other devices.
> 
> We still have some SCSI devices, such as qla2xxx, which are 1:1 multi-queue
> devices like NVMe. In my tests, the big lock in mq-deadline has been
> an issue for this kind of device, and none actually performs better than
> mq-deadline, even though its merging isn't good.

Kyber should be able to fill that hole, hopefully.

-- 
Jens Axboe