[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHS8izM8iLC9J1xSHScMrMkVyoY5HZ_nFMRO4V7HYarHhZhk6Q@mail.gmail.com>
Date: Mon, 1 Apr 2024 12:22:24 -0700
From: Mina Almasry <almasrymina@...gle.com>
To: Christoph Hellwig <hch@...radead.org>, Marc Harvey <marcharvey@...gle.com>,
"Cong Wang ." <cong.wang@...edance.com>
Cc: shakeel.butt@...ux.dev, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
linux-alpha@...r.kernel.org, linux-mips@...r.kernel.org,
linux-parisc@...r.kernel.org, sparclinux@...r.kernel.org,
linux-trace-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
bpf@...r.kernel.org, linux-kselftest@...r.kernel.org,
linux-media@...r.kernel.org, dri-devel@...ts.freedesktop.org,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Jonathan Corbet <corbet@....net>,
Richard Henderson <richard.henderson@...aro.org>, Ivan Kokshaysky <ink@...assic.park.msu.ru>,
Matt Turner <mattst88@...il.com>, Thomas Bogendoerfer <tsbogend@...ha.franken.de>,
"James E.J. Bottomley" <James.Bottomley@...senpartnership.com>, Helge Deller <deller@....de>,
Andreas Larsson <andreas@...sler.com>, Jesper Dangaard Brouer <hawk@...nel.org>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>, Steven Rostedt <rostedt@...dmis.org>,
Masami Hiramatsu <mhiramat@...nel.org>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Arnd Bergmann <arnd@...db.de>, Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>,
Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>, John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...gle.com>, Hao Luo <haoluo@...gle.com>,
Jiri Olsa <jolsa@...nel.org>, David Ahern <dsahern@...nel.org>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>, Shuah Khan <shuah@...nel.org>,
Sumit Semwal <sumit.semwal@...aro.org>, Christian König <christian.koenig@....com>,
Pavel Begunkov <asml.silence@...il.com>, David Wei <dw@...idwei.uk>, Jason Gunthorpe <jgg@...pe.ca>,
Yunsheng Lin <linyunsheng@...wei.com>, Shailend Chand <shailend@...gle.com>,
Harshitha Ramamurthy <hramamurthy@...gle.com>, Jeroen de Borst <jeroendb@...gle.com>,
Praveen Kaligineedi <pkaligineedi@...gle.com>
Subject: Re: [RFC PATCH net-next v6 02/15] net: page_pool: create hooks for
custom page providers
On Thu, Mar 28, 2024 at 12:31 AM Christoph Hellwig <hch@...radead.org> wrote:
>
> On Tue, Mar 26, 2024 at 01:19:20PM -0700, Mina Almasry wrote:
> >
> > Are you envisioning that dmabuf support would be added to the block
> > layer
>
> Yes.
>
> > (which I understand is part of the VFS and not driver specific),
>
> The block layer isn't really the VFS, it's just another core stack
> like the network stack.
>
> > or as part of the specific storage driver (like nvme for example)? If
> > we can add dmabuf support to the block layer itself that sounds
> > awesome. We may then be able to do devmem TCP on all/most storage
> > devices without having to modify each individual driver.
>
> I suspect we'll still need to touch the drivers to understand it,
> but hopefully all the main infrastructure can live in the block layer.
>
> > In your estimation, is adding dmabuf support to the block layer
> > something technically feasible & acceptable upstream? I notice you
> > suggested it so I'm guessing yes to both, but I thought I'd confirm.
>
> I think so, and I know there has been quite some interest to at least
> pre-register userspace memory so that the iommu overhead can be
> pre-loaded. It also is a much better interface for Peer to Peer
> transfers than what we currently have.
>
I think this is positively thrilling news for me. I was worried that
adding devmemTCP support to storage devices would involve using a
non-dmabuf standard of buffer sharing like pci_p2pdma_
(drivers/pci/p2pdma.c) and that would require messy changes to
pci_p2pdma_ that would get nacked. Also it would require adding
pci_p2pdma_ support to devmem TCP, which is a can of worms. If adding
dma-buf support to storage devices is feasible and desirable, that's a
much better approach IMO. (a) it will maybe work with devmem TCP
without any changes needed on the netdev side of things and (b)
dma-buf support may be generically useful and a good contribution even
outside of devmem TCP.
I don't have a concrete user for devmem TCP for storage devices but
the use case is very similar to GPU and I imagine the benefits in perf
can be significant in some setups.
Christoph, if you have any hints or rough specific design in mind for
how dma-buf support can be added to the block layer, please do let us
know and we'll follow your hints to investigate. But I don't want to
use up too much of your time. Marc and I can definitely read enough
code to figure out how to do it ourselves :-)
Marc, please review and consider this thread and work, this could be a
good project for you and I. I imagine the work would be:
1. Investigate how to add dma-buf support to the block layer (maybe
write a prototype code, and maybe even test it with devmem TCP).
2. Share a code or no-code proposal with netdev/fs/block layer mailing
list and try to work through concerns/nacks.
3. Finally share RFC through merging etc.
--
Thanks,
Mina
Powered by blists - more mailing lists