[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240711135030.GD1482543@nvidia.com>
Date: Thu, 11 Jul 2024 10:50:30 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: James Bottomley <James.Bottomley@...senpartnership.com>
Cc: Dan Williams <dan.j.williams@...el.com>, ksummit@...ts.linux.dev,
linux-cxl@...r.kernel.org, linux-rdma@...r.kernel.org,
netdev@...r.kernel.org
Subject: Re: [MAINTAINERS SUMMIT] Device Passthrough Considered Harmful?
On Tue, Jul 09, 2024 at 09:02:25AM -0700, James Bottomley wrote:
> For NVMe and net we do have SPDK and DPDK. What I find is that people
> tend to use them for niche use cases (like the NVMe KV command set) or
> obscure network routers. Even though the claim they both make is to
> get the kernel out of the way and do stuff "way faster" the difficulty
> they create by bypassing everything is quite a high burden.
[..]
> What all of the prior pass through's taught us is that if the use case
> is big enough it will get pulled into the kernel and the kernel will
> usually manage it better (DB users). If it remains a niche use case it
> will likely remain out of the kernel, but we won't be hurt by it (NVME
> KV protocol) and sometimes it doesn't really matter and the device
> manufacturers will sort it out on their own (USB tokens).
I don't see it as being linked to big enough use case at all.
The kernel gets involved if there are good technical reasons to do
so. Databases running over real filesystems with O_DIRECT is really
technically better than raw block devices.
While DPDK shows the opposite, userspace is the technically better
option. This is now shown at scale. DPDK is not some niche. A big
chunk of internet traffic is going through DPDKs, especially for
mobile. Many ORAN solutions include DPDK on Linux.
What has been improved kernel-side is the intergation. DPDK
deployments now often use RDMA raw queue pairs instead of VFIO, which
laregly eliminates the "high burden".
There are many other cases, like DPDK, where the right answer is to
reduce the kernel involvement. It is not so simple that things always
get pulled into the kernel.
Jason
Powered by blists - more mailing lists