Message-ID: <0C3E7CCF-DD56-4129-A6F6-4A181AA2D102@nutanix.com>
Date: Mon, 25 Mar 2019 15:44:56 +0000
From: Felipe Franciosi <felipe@...anix.com>
To: Keith Busch <kbusch@...nel.org>
CC: Maxim Levitsky <mlevitsk@...hat.com>,
Stefan Hajnoczi <stefanha@...hat.com>,
Fam Zheng <fam@...hon.net>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
Wolfram Sang <wsa@...-dreams.de>,
"linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Keith Busch <keith.busch@...el.com>,
Kirti Wankhede <kwankhede@...dia.com>,
Mauro Carvalho Chehab <mchehab+samsung@...nel.org>,
"Paul E . McKenney" <paulmck@...ux.ibm.com>,
Christoph Hellwig <hch@....de>,
Sagi Grimberg <sagi@...mberg.me>,
"Harris, James R" <james.r.harris@...el.com>,
Liang Cunming <cunming.liang@...el.com>,
Jens Axboe <axboe@...com>,
Alex Williamson <alex.williamson@...hat.com>,
Thanos Makatos <thanos.makatos@...anix.com>,
John Ferlan <jferlan@...hat.com>,
Liu Changpeng <changpeng.liu@...el.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Nicolas Ferre <nicolas.ferre@...rochip.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Amnon Ilan <ailan@...hat.com>,
"David S . Miller" <davem@...emloft.net>
Subject: Re:
Hi Keith,
> On Mar 22, 2019, at 3:30 PM, Keith Busch <kbusch@...nel.org> wrote:
>
> On Fri, Mar 22, 2019 at 07:54:50AM +0000, Felipe Franciosi wrote:
>>>
>>> Note, though, that SPDK doesn't support sharing the device between the host and
>>> the guests; it takes over the NVMe device and thus makes the kernel nvme driver
>>> unbind from it.
>>
>> That is absolutely true. However, I find it not to be a problem in practice.
>>
>> Hypervisor products, especially those caring about performance, efficiency and fairness, will dedicate NVMe devices to a particular purpose (e.g. vDisk storage, cache, metadata) and will not share these devices with other use cases. That's because these products want to deterministically control the performance aspects of the device, which you simply cannot do if you are sharing the device with a subsystem you do not control.
>
> I don't know, it sounds like you've traded kernel syscalls for IPC,
> and I don't think one performs better than the other.
Sorry, I'm not sure I understand. My point is that if you are packaging a distro to be a hypervisor and you want to use a storage device for VM data, you _most likely_ won't be using that device for anything else. Given that, driving the device directly from your application definitely gives you more deterministic control.
>
>> For scenarios where the device must be shared and such fine-grained control is not required, using the kernel driver with io_uring looks like it offers very good performance along with flexibility.
>
> NVMe's IO Determinism features provide fine-grained control for shared
> devices. It's still uncommon to find hardware supporting that, though.
Sure, but then your hypervisor needs to certify devices that support that feature, which will limit your HCL (hardware compatibility list). Moreover, unless the feature is solid, well-established and works reliably on all the devices you support, it's arguably preferable to have an architecture that gives you that control in software.
Cheers,
Felipe