Message-ID: <4F847EC4.7040604@kernel.dk>
Date:	Tue, 10 Apr 2012 20:41:08 +0200
From:	Jens Axboe <axboe@...nel.dk>
To:	Vivek Goyal <vgoyal@...hat.com>
CC:	linux kernel mailing list <linux-kernel@...r.kernel.org>,
	Jeff Moyer <jmoyer@...hat.com>
Subject: Re: [RFC PATCH] block: Change default IO scheduler to deadline except
 SATA

On 2012-04-10 17:10, Vivek Goyal wrote:
> On Tue, Apr 10, 2012 at 10:21:48AM -0400, Vivek Goyal wrote:
>> On Tue, Apr 10, 2012 at 03:56:39PM +0200, Jens Axboe wrote:
>>> On 2012-04-10 15:37, Vivek Goyal wrote:
>>>> Hi,
>>>>
>>>> I am wondering if CFQ as the default scheduler is still the right choice.
>>>> CFQ generally works well on slow rotational media (SATA?), but it often
>>>> underperforms on faster storage (storage arrays, PCIe SSDs, virtualized
>>>> disks in Linux guests, etc.). People often put logic in user space to tune
>>>> their systems and change the IO scheduler to deadline to get better
>>>> performance on faster storage.
>>>>
>>>> Though there is no one good answer for all kinds of storage and all kinds
>>>> of workloads, I am wondering if we can provide a better default, and that
>>>> is to change the default IO scheduler to "deadline" for everything except
>>>> SATA.
>>>>
>>>> One can argue that some SAS disks can be slow too and benefit from CFQ.
>>>> Yes, but the default IO scheduler choice is not perfect anyway. It just
>>>> tries to cater to a wide variety of use cases out of the box.
>>>>
>>>> So I am throwing this patch out to see if it flies. Personally, I think it
>>>> might turn out to be a more reasonable default.
>>>
>>> I think it'd be a lot more sane to just use CFQ on rotational single
>>> devices, and default to deadline on RAID or non-rotational devices. This
>>> still isn't perfect, since less capable SSDs still benefit from the
>>> read/write separation, and some multi-device configs will be faster as
>>> well. But it's better.
>>
>> Hi Jens,
>>
>> Thanks. Making the decision based on the rotational flag makes sense. I am
>> not sure, though, how one finds out whether a block device is a single
>> device or not, especially with HBAs, SCSI LUNs over Fibre Channel, iSCSI
>> LUNs, etc. I have a few SCSI LUNs exported to me, backed by a storage
>> array. Everything runs CFQ by default, and though the disks in the array
>> are rotational, they are RAIDed, and AFAIK this information is not
>> available to the driver.
>>
>> I am not sure if there is an easy way to get similar info for dm/md devices.
> 
> Thinking more about it, even if we have a way to define a request queue
> flag for multi-device setups (QUEUE_FLAG_MULTI_DEVICE), when can the block
> layer decide to change the IO scheduler? At queue alloc and init time the
> driver might not yet have called add_disk() or set all the flags/properties
> of the queue, so doing it at queue alloc/init time might not be best.
> 
> And later we get control only when actual IO happens on the queue, and
> doing one more check, or trying to change the elevator in the IO path, is
> not a good idea.
> 
> Maybe when the driver tries to set the ROTATIONAL or MULTI_DEVICE flag, we
> can check and change the elevator then.
> 
> So we are back to the question of whether SCSI devices can find out if a
> LUN is backed by a single disk or by multiple disks.
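
Just to make the quoted idea concrete, a rough, untested sketch of picking a
default elevator off the queue flags could look like the below. Both
blk_queue_pick_default_elevator and the multi_device argument are made up
here; the latter stands in for the proposed QUEUE_FLAG_MULTI_DEVICE, which
does not exist:

#include <linux/blkdev.h>
#include <linux/elevator.h>

/*
 * Rough sketch only: pick a default elevator once the driver has told
 * the block layer what kind of device this is.  "multi_device" stands
 * in for the proposed QUEUE_FLAG_MULTI_DEVICE; where this would get
 * called from is the open question in this thread.
 */
static void blk_queue_pick_default_elevator(struct request_queue *q,
					    bool multi_device)
{
	if (blk_queue_nonrot(q) || multi_device)
		/* fast or striped device: idling hurts, use deadline */
		elevator_change(q, "deadline");
	else
		/* slow rotational single device: keep CFQ */
		elevator_change(q, "cfq");
}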

The cleanest would be to have the driver signal these attributes at
probe time. You could even adjust CFQ properties based on this, driving
the queue depth harder, etc. Realistically, going forward, most fast
flash devices will be driven by a noop-like scheduler on multiqueue, so
the CPU cost of the IO scheduler can mostly be ignored, since CFQ's cost
on even big RAIDs isn't an issue due to the low IOPS rates.
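
Roughly, that probe-time signalling could look like the sketch below --
untested, and mydrv_probe_queue is a made-up hook rather than an existing
driver entry point:

#include <linux/blkdev.h>
#include <linux/elevator.h>

/*
 * Sketch only.  At probe time the driver marks the queue
 * non-rotational (drivers already do this part today), and the default
 * elevator is switched based on that -- done directly here just for
 * illustration.
 */
static void mydrv_probe_queue(struct request_queue *q)
{
	/* signal "this is flash/array-backed, don't idle on it" */
	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);

	/* pick a default that suits it; error handling skipped here */
	elevator_change(q, "deadline");
}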

-- 
Jens Axboe
