Message-ID: <81cdcb58-9a23-8192-6213-7f2408a3b8ee@grimberg.me>
Date: Fri, 13 Nov 2020 13:56:08 -0800
From: Sagi Grimberg <sagi@...mberg.me>
To: Jens Axboe <axboe@...nel.dk>, Rachit Agarwal <rach4x0r@...il.com>,
Christoph Hellwig <hch@....de>
Cc: linux-block@...r.kernel.org, linux-nvme@...ts.infradead.org,
linux-kernel@...r.kernel.org, Keith Busch <kbusch@...nel.org>,
Ming Lei <ming.lei@...hat.com>,
Jaehyun Hwang <jaehyun.hwang@...nell.edu>,
Qizhe Cai <qc228@...nell.edu>,
Midhul Vuppalapati <mvv25@...nell.edu>,
Rachit Agarwal <ragarwal@...cornell.edu>,
Sagi Grimberg <sagi@...htbitslabs.com>,
Rachit Agarwal <ragarwal@...nell.edu>
Subject: Re: [PATCH] iosched: Add i10 I/O Scheduler
>>>> But if you think this has a better home, I'm assuming that the guys
>>>> will be open to that.
>>>
>>> Also see the reply from Ming. It's a balancing act - don't want to add
>>> extra overhead to the core, but also don't want to carry an extra
>>> scheduler if the main change is really just variable dispatch batching.
>>> And since we already have a notion of that, seems worthwhile to explore
>>> that venue.
>>
>> I agree,
>>
>> The main difference is that this balancing is not driven from device
>> resource pressure, but rather from an assumption of device specific
>> optimization (and also with a specific optimization target), hence a
>> scheduler a user would need to opt-in seemed like a good compromise.
>>
>> But maybe Ming has some good ideas on a different way to add it..
>
> So here's another case - virtualized nvme. The commit overhead is
> suitably large there that performance suffers quite a bit, similarly to
> your remote storage case. If we had suitable logic in the core, then we
> could easily propagate this knowledge when setting up the queue. Then it
> could happen automatically, without needing a configuration to switch to
> a specific scheduler.

Yes, these use cases share characteristics. I'm not at all opposed to
placing this in the core. I do think that in order to put something like
this in the core, the bar needs to be higher, so that the optimization
target cannot be biased towards a particular workload (i.e. it needs to
be adaptive).

I'm still not sure how we would build this on top of what we already
have, as it is really centered around the device being busy (which is
not the case for nvme), but I haven't put enough thought into it yet.