Date:   Fri, 13 Nov 2020 14:26:50 -0700
From:   Jens Axboe <axboe@...nel.dk>
To:     Sagi Grimberg <sagi@...mberg.me>,
        Rachit Agarwal <rach4x0r@...il.com>,
        Christoph Hellwig <hch@....de>
Cc:     linux-block@...r.kernel.org, linux-nvme@...ts.infradead.org,
        linux-kernel@...r.kernel.org, Keith Busch <kbusch@...nel.org>,
        Ming Lei <ming.lei@...hat.com>,
        Jaehyun Hwang <jaehyun.hwang@...nell.edu>,
        Qizhe Cai <qc228@...nell.edu>,
        Midhul Vuppalapati <mvv25@...nell.edu>,
        Rachit Agarwal <ragarwal@...cornell.edu>,
        Sagi Grimberg <sagi@...htbitslabs.com>,
        Rachit Agarwal <ragarwal@...nell.edu>
Subject: Re: [PATCH] iosched: Add i10 I/O Scheduler

On 11/13/20 2:23 PM, Sagi Grimberg wrote:
> 
>>>> I haven't taken a close look at the code yet, but one quick note
>>>> that patches like this should be against the branches for 5.11. In fact,
>>>> this one doesn't even compile against current -git, as
>>>> blk_mq_bio_list_merge is now called blk_bio_list_merge.
>>>
>>> Ugh, I guess that Jaehyun had this patch bottled up and didn't rebase
>>> before submitting.. Sorry about that.
>>>
>>>> In any case, I did run this through some quick peak testing as I was
>>>> curious, and I'm seeing about 20% drop in peak IOPS over none running
>>>> this. Perf diff:
>>>>
>>>>       10.71%     -2.44%  [kernel.vmlinux]  [k] read_tsc
>>>>        2.33%     -1.99%  [kernel.vmlinux]  [k] _raw_spin_lock
>>>
>>> You ran this with nvme? or null_blk? I guess neither would benefit
>>> from this because if the underlying device will not benefit from
>>> batching (at least enough for the extra cost of accounting for it) it
>>> will be counterproductive to use this scheduler.
>>
>> This is nvme, actual device. The initial posting could be a bit more
>> explicit on the use case, it says:
>>
>> "For NVMe SSDs, the i10 I/O scheduler achieves ~60% improvements in
>> terms of IOPS per core over "noop" I/O scheduler."
>>
>> which made me very skeptical, as it sounds like it's raw device claims.
> 
> You are absolutely right, that needs to be fixed.
> 
>> Does beg the question of why this is a new scheduler then. It's pretty
>> basic stuff, something that could trivially just be added as a side effect
>> of the core (and in fact we have much of it already). Doesn't really seem
>> to warrant a new scheduler at all. There isn't really much in there.
> 
> Not saying it absolutely warrants a new one, and it could I guess sit in
> the core, but this attempts to optimize for a specific metric while
> trading-off others, which is exactly what I/O schedulers are for,
> optimizing for a specific metric.
> 
> Not sure we want to build something biased towards throughput at the
> expense of latency into the block core. And, as mentioned this is not
> well suited to all device types...
> 
> But if you think this has a better home, I'm assuming that the guys
> will be open to that.

Also see the reply from Ming. It's a balancing act - don't want to add
extra overhead to the core, but also don't want to carry an extra
scheduler if the main change is really just variable dispatch batching.
And since we already have a notion of that, seems worthwhile to explore
that avenue.

-- 
Jens Axboe
