Message-ID: <20230419094506.2658b73f@kernel.org>
Date: Wed, 19 Apr 2023 09:45:06 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Christoph Hellwig <hch@...radead.org>
Cc: Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, netdev@...r.kernel.org,
Björn Töpel <bjorn@...nel.org>,
Magnus Karlsson <magnus.karlsson@...el.com>,
Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
Jonathan Lemon <jonathan.lemon@...il.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>, bpf@...r.kernel.org,
virtualization@...ts.linux-foundation.org,
"Michael S. Tsirkin" <mst@...hat.com>,
Guenter Roeck <linux@...ck-us.net>,
Gerd Hoffmann <kraxel@...hat.com>,
Jason Wang <jasowang@...hat.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Jens Axboe <axboe@...nel.dk>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH net-next] xsk: introduce xsk_dma_ops
On Tue, 18 Apr 2023 22:16:53 -0700 Christoph Hellwig wrote:
> On Mon, Apr 17, 2023 at 11:19:47PM -0700, Jakub Kicinski wrote:
> > Damn, that's unfortunate. Thinking aloud -- that means that if we want
> > to continue to pull memory management out of networking drivers to
> > improve it for all, cross-optimize with the rest of the stack and
> > allow various upcoming forms of zero copy -- then we need to add an
> > equivalent of dma_ops and DMA API locally in networking?
>
> Can you explain what the actual use case is?
>
> From the original patchset I suspect it is dma mapping something very
> long term and then maybe doing syncs on it as needed?
In this case yes: pinned user memory gets sliced up into MTU-sized
chunks, fed into an Rx queue of a device, and the user can see packets
without any copies.
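For concreteness, this is roughly the shape of the Rx refill path in a
zero-copy driver today; xsk_buff_alloc() and xsk_buff_xdp_get_dma() are
the real helpers from include/net/xdp_sock_drv.h, while my_rx_ring /
my_post_rx_desc() are made-up stand-ins for the driver's descriptor
ring code:

static int my_refill_rx_ring(struct my_rx_ring *ring,
                             struct xsk_buff_pool *pool, int budget)
{
        int i;

        for (i = 0; i < budget; i++) {
                struct xdp_buff *xdp = xsk_buff_alloc(pool);

                if (!xdp)
                        break;
                /* DMA address of an MTU-sized chunk of the pinned user memory */
                my_post_rx_desc(ring, xsk_buff_xdp_get_dma(xdp), xdp);
        }
        return i;
}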
Quite similar use case #2 is the upcoming io_uring / "direct placement"
patches (the former from Meta, the latter from Google), which will try
to receive just the TCP data into pinned user memory.

And, as I think Olek mentioned, #3 is page_pool - which allocates 4k
pages, manages the DMA mappings, gives them to the device, and tries
to recycle them back to the device once TCP is done with them (avoiding
the unmapping and even the atomic ops on the refcount, since in the good
case the page refcount is always 1). See page_pool_return_skb_page()
for the recycling flow.
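Roughly what the driver side looks like when the pool owns the DMA
mappings (the parameter values below are purely illustrative):

        struct page_pool_params pp_params = {
                .flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
                .order     = 0,
                .pool_size = 256,
                .nid       = NUMA_NO_NODE,
                .dev       = dev,            /* the NIC's struct device */
                .dma_dir   = DMA_FROM_DEVICE,
                .max_len   = PAGE_SIZE,
        };
        struct page_pool *pool = page_pool_create(&pp_params);

        struct page *page = page_pool_alloc_pages(pool, GFP_ATOMIC);
        dma_addr_t dma = page_pool_get_dma_addr(page); /* pool did the mapping */

        /* when building the skb, let the stack hand pages back to the pool */
        skb_mark_for_recycle(skb);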
In all those cases it's more flexible (and faster) to hide the DMA
mapping from the driver. All the cases are also opt-in, so we don't need
to worry about complete oddball devices. And to answer your question:
in all cases we hope mapping/unmapping will be relatively rare, while
syncing will be frequent.
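I.e. the pattern we want to optimize for is roughly (standard DMA API
calls, the surrounding driver code elided):

        /* once, when the buffer enters the pool (rare) */
        dma = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_FROM_DEVICE);
        if (dma_mapping_error(dev, dma))
                return -ENOMEM;

        /* per packet (frequent), before the stack reads the data */
        dma_sync_single_for_cpu(dev, dma, len, DMA_FROM_DEVICE);

        /* per packet (frequent), when the buffer goes back on the Rx ring */
        dma_sync_single_for_device(dev, dma, len, DMA_FROM_DEVICE);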
AFAIU the patch we're discussing implements custom dma_ops for case #1,
but the same thing will be needed for #2 and #3. The question to me is
whether we need netdev-wide net_dma_ops or whether the device model can
provide us with a DMA API that'd work for SoC/PCIe/virt devices.
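Purely as a strawman, and not what the xsk_dma_ops patch actually
defines, a netdev-level ops table could look something like:

        struct net_dma_ops {
                dma_addr_t (*map_page)(struct device *dev, struct page *page,
                                       unsigned long offset, size_t size,
                                       enum dma_data_direction dir);
                void (*unmap_page)(struct device *dev, dma_addr_t dma,
                                   size_t size, enum dma_data_direction dir);
                void (*sync_for_cpu)(struct device *dev, dma_addr_t dma,
                                     size_t size, enum dma_data_direction dir);
                void (*sync_for_device)(struct device *dev, dma_addr_t dma,
                                        size_t size, enum dma_data_direction dir);
        };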