Message-ID: <adbe6097-9ece-af0a-a5c6-a4299c9bb72a@kernel.dk>
Date:   Fri, 28 Oct 2016 08:10:06 -0600
From:   Jens Axboe <axboe@...nel.dk>
To:     Jan Kara <jack@...e.cz>
Cc:     Paolo Valente <paolo.valente@...aro.org>,
        Christoph Hellwig <hch@...radead.org>,
        Arnd Bergmann <arnd@...db.de>,
        Bart Van Assche <bart.vanassche@...disk.com>,
        Tejun Heo <tj@...nel.org>, linux-block@...r.kernel.org,
        Linux-Kernel <linux-kernel@...r.kernel.org>,
        Ulf Hansson <ulf.hansson@...aro.org>,
        Linus Walleij <linus.walleij@...aro.org>,
        Mark Brown <broonie@...nel.org>,
        Hannes Reinecke <hare@...e.de>, grant.likely@...retlab.ca,
        James.Bottomley@...senpartnership.com
Subject: Re: [PATCH 00/14] introduce the BFQ-v0 I/O scheduler as an extra
 scheduler

On 10/28/2016 01:59 AM, Jan Kara wrote:
> On Thu 27-10-16 10:26:18, Jens Axboe wrote:
>> On 10/27/2016 03:26 AM, Jan Kara wrote:
>>> On Wed 26-10-16 10:12:38, Jens Axboe wrote:
>>>> On 10/26/2016 10:04 AM, Paolo Valente wrote:
>>>>>
>>>>>> On 26 Oct 2016, at 17:32, Jens Axboe <axboe@...nel.dk> wrote:
>>>>>>
>>>>>> On 10/26/2016 09:29 AM, Christoph Hellwig wrote:
>>>>>>> On Wed, Oct 26, 2016 at 05:13:07PM +0200, Arnd Bergmann wrote:
>>>>>>>> The question to ask first is whether to actually have pluggable
>>>>>>>> schedulers on blk-mq at all, or just have one that is meant to
>>>>>>>> do the right thing in every case (and possibly can be bypassed
>>>>>>>> completely).
>>>>>>>
>>>>>>> That would be my preference.  Have a BFQ-variant for blk-mq as an
>>>>>>> option (default to off unless opted in by the driver or user), and
>>>>>>> no other scheduler for blk-mq.  Don't bother with bfq for
>>>>>>> non-blk-mq.  It's not like there is any advantage in the legacy-request
>>>>>>> device even for slow devices, except for the option of having I/O
>>>>>>> scheduling.
>>>>>>
>>>>>> It's the only right way forward. blk-mq might not offer any substantial
>>>>>> advantages to rotating storage, but with scheduling, it won't offer a
>>>>>> downside either. And it'll take us towards the real goal, which is to
>>>>>> have just one IO path.
>>>>>
>>>>> ok
>>>>>
>>>>>> Adding a new scheduler for the legacy IO path
>>>>>> makes no sense.
>>>>>
>>>>> I would fully agree if effective and stable I/O scheduling were going
>>>>> to be available in blk-mq within one or two months.  But I guess it
>>>>> will optimistically take at least a year, given the current state of
>>>>> the needed infrastructure, and the great difficulty of doing effective
>>>>> scheduling at the high parallelism and extreme target speeds of
>>>>> blk-mq.  Of course, this holds true unless only minimal scheduling is
>>>>> attempted.
>>>>>
>>>>> So, what's the point in forcing a lot of users to wait another year
>>>>> or more for a solution that has yet to even be defined, when they
>>>>> could enjoy a much better system now, and then switch to an even
>>>>> better one once scheduling is ready in blk-mq too?
>>>>
>>>> That same argument could have been made 2 years ago. Saying no to a new
>>>> scheduler for the legacy framework goes back roughly that long. We could
>>>> have had BFQ for mq NOW, if we hadn't kept coming back to this very
>>>> point.
>>>>
>>>> I'm hesitant to add a new scheduler because it's very easy to add, very
>>>> difficult to get rid of. If we do add BFQ as a legacy scheduler now,
>>>> it'll take us years and years to get rid of it again. We should be
>>>> moving towards LESS moving parts in the legacy path, not more.
>>>>
>>>> We can keep having this discussion every few years, but I think we'd
>>>> both prefer to make some actual progress here. It's perfectly fine to
>>>> add a single queue interface for an IO scheduler for blk-mq, since we
>>>> don't care too much about scalability there. And that won't take years;
>>>> it should be a few weeks. Retrofitting BFQ on top of
>>>> that should not be hard either. That can co-exist with a real multiqueue
>>>> scheduler as well, something that's geared towards some fairness for
>>>> faster devices.
>>>
>>> OK, so would a solution like having a variant of blk_sq_make_request()
>>> that consumes requests, makes IO scheduling decisions on them, and feeds
>>> them into the HW queue as it sees fit be acceptable? That would provide
>>> the IO scheduler with the global view it needs for complex scheduling
>>> decisions, so it should indeed be relatively easy to port BFQ to work
>>> like that.
>>
>> I'd probably start off from Omar's base [1], which switches the software
>> queues to store bios instead of requests, since that lifts the 1:1
>> mapping between what we can queue up and what we can dispatch. Without
>> that, the IO scheduler won't have much to work with. And with that in
>> place, it'll be a "bio in, request out" type of setup, which is similar
>> to what we have in the legacy path.
>>
>> I'd keep the software queues, but as a starting point, mandate 1
>> hardware queue to keep that as the per-device view of the state. The IO
>> scheduler would be responsible for moving one or more bios from the
>> software queues to the hardware queue, when they are ready to dispatch.
>>
>> [1] https://github.com/osandov/linux/commit/8ef3508628b6cf7c4712cd3d8084ee11ef5d2530
>
> Yeah, but what would software queues actually be good for on a single
> queue device with device-global IO scheduling? An IO scheduler making
> complex decisions will keep all the bios / requests in a single structure
> anyway, so there's no scalability to gain from per-cpu software queues...
> So you can directly consume bios in your ->make_request handler, place
> them in IO scheduler structures, and then push requests out to the HW
> queue in response to HW tags getting freed (i.e. on IO completion). No
> need for intermediate software queues. But maybe I'm missing something.
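
For concreteness, the direct approach described above would look roughly
like the sketch below. sched_queue_bio(), sched_dispatch(), struct
io_sched and the q->sched_data field are hypothetical, not existing
kernel API:

/*
 * Rough sketch of "consume bios directly in ->make_request"; all
 * sched_*() names and q->sched_data are hypothetical.
 */
static blk_qc_t sched_make_request(struct request_queue *q,
                                   struct bio *bio)
{
        struct io_sched *sched = q->sched_data;

        /* Hand the bio straight to the scheduler's own structures. */
        sched_queue_bio(sched, bio);

        /*
         * Push requests to the single HW queue while driver tags are
         * available; further dispatch happens from the completion
         * path as tags get freed.
         */
        sched_dispatch(sched, q);

        return BLK_QC_T_NONE;
}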

The software queues take the pressure off the submission side by keeping
lock contention per-cpu. It's one of the reasons why single queue blk-mq
still scales a lot better than the old request_fn model.
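
Roughly where that contention difference comes from, simplified from
the actual insert paths (illustrative, not verbatim kernel code):

/* Old request_fn model: every submitter serializes on one queue_lock. */
spin_lock_irq(q->queue_lock);
__elv_add_request(q, rq, ELEVATOR_INSERT_SORT);
spin_unlock_irq(q->queue_lock);

/*
 * blk-mq: submitters take a per-cpu software queue lock instead, so
 * they rarely contend; the hardware queue pulls from the software
 * queues in batches at dispatch time.
 */
struct blk_mq_ctx *ctx = blk_mq_get_ctx(q);

spin_lock(&ctx->lock);
list_add_tail(&rq->queuelist, &ctx->rq_list);
spin_unlock(&ctx->lock);
blk_mq_put_ctx(ctx);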

If you bypass the software queues and grab bios at make_request time, I'd
be worried that we lose the various support functionality we have for
block devices, or end up having to implement it differently.
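
And for reference, a rough sketch of the "bio in, request out" dispatch
described above, moving bios from the software queues to the one
hardware queue; all sched_*() helpers here are hypothetical:

static void sched_dispatch_ready(struct blk_mq_hw_ctx *hctx)
{
        struct bio *bio;
        struct request *rq;

        /*
         * Move bios the scheduler considers ready from the software
         * queues to the hardware queue, turning each one into a
         * request only at dispatch time.
         */
        while ((bio = sched_next_bio(hctx)) != NULL) {
                rq = sched_bio_to_request(hctx, bio); /* grabs a tag */
                if (!rq) {
                        /* Out of driver tags; resume on completion. */
                        sched_requeue_bio(hctx, bio);
                        break;
                }
                /* Hand the request to the driver's ->queue_rq(). */
                sched_issue_request(hctx, rq);
        }
}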

-- 
Jens Axboe
