[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <44d5bcb0-689e-50c8-fa8e-a7d2b569f75c@grimberg.me>
Date: Fri, 13 Nov 2020 12:58:10 -0800
From: Sagi Grimberg <sagi@...mberg.me>
To: Ming Lei <ming.lei@...hat.com>, Rachit Agarwal <rach4x0r@...il.com>
Cc: Jens Axboe <axboe@...nel.dk>, Qizhe Cai <qc228@...nell.edu>,
Rachit Agarwal <ragarwal@...nell.edu>,
linux-kernel@...r.kernel.org, linux-nvme@...ts.infradead.org,
linux-block@...r.kernel.org,
Midhul Vuppalapati <mvv25@...nell.edu>,
Jaehyun Hwang <jaehyun.hwang@...nell.edu>,
Rachit Agarwal <ragarwal@...cornell.edu>,
Keith Busch <kbusch@...nel.org>,
Sagi Grimberg <sagi@...htbitslabs.com>,
Christoph Hellwig <hch@....de>
Subject: Re: [PATCH] iosched: Add i10 I/O Scheduler
> blk-mq actually has built-in batching(or sort of) mechanism, which is enabled
> if the hw queue is busy(hctx->dispatch_busy is > 0). We use EWMA to compute
> hctx->dispatch_busy, and it is adaptive, even though the implementation is quite
> coarse. But there should be much space to improve, IMO.
You are correct, however nvme-tcp should be getting to dispatch_busy > 0
IIUC.
> It is reported that this way improves SQ high-end SCSI SSD very much[1],
> and MMC performance gets improved too[2].
>
> [1] https://lore.kernel.org/linux-block/3cc3e03901dc1a63ef32e036182521af@mail.gmail.com/
> [2] https://lore.kernel.org/linux-block/CADBw62o9eTQDJ9RvNgEqSpXmg6Xcq=2TxH0Hfxhp29uF2W=TXA@mail.gmail.com/
Yes, the guys paid attention to the MMC related improvements that you
made.
>> The i10 I/O scheduler builds upon recent work on [6]. We have tested the i10 I/O
>> scheduler with nvme-tcp optimizaitons [2,3] and batching dispatch [4], varying number
>> of cores, varying read/write ratios, and varying request sizes, and with NVMe SSD and
>> RAM block device. For NVMe SSDs, the i10 I/O scheduler achieves ~60% improvements in
>> terms of IOPS per core over "noop" I/O scheduler. These results are available at [5],
>> and many additional results are presented in [6].
>
> In case of none scheduler, basically nvme driver won't provide any queue busy
> feedback, so the built-in batching dispatch doesn't work simply.
Exactly.
> kyber scheduler uses io latency feedback to throttle and build io batch,
> can you compare i10 with kyber on nvme/nvme-tcp?
I assume it should be simple to get, I'll let Rachit/Jaehyun comment.
>> While other schedulers may also batch I/O (e.g., mq-deadline), the optimization target
>> in the i10 I/O scheduler is throughput maximization. Hence there is no latency target
>> nor a need for a global tracking context, so a new scheduler is needed rather than
>> to build this functionality to an existing scheduler.
>>
>> We currently use fixed default values as batching thresholds (e.g., 16 for #requests,
>> 64KB for #bytes, and 50us for timeout). These default values are based on sensitivity
>> tests in [6]. For our future work, we plan to support adaptive batching according to
>
> Frankly speaking, hardcode 16 #rquests or 64KB may not work everywhere,
> and product environment could be much complicated than your sensitivity
> tests. If possible, please start with adaptive batching.
That was my feedback as well for sure. But given that this is a
scheduler one would opt-in to anyway, that won't be a must-have
initially. I'm not sure if the guys made progress with this yet, I'll
let them comment.
Powered by blists - more mailing lists