Message-ID: <253zfcs2qaw.fsf@nvidia.com>
Date: Fri, 25 Jul 2025 19:22:47 +0300
From: Aurelien Aptel <aaptel@...dia.com>
To: Jens Axboe <axboe@...nel.dk>, linux-nvme@...ts.infradead.org,
netdev@...r.kernel.org, sagi@...mberg.me, hch@....de, kbusch@...nel.org,
axboe@...com, chaitanyak@...dia.com, davem@...emloft.net, kuba@...nel.org
Cc: aurelien.aptel@...il.com, smalin@...dia.com, malin1024@...il.com,
ogerlitz@...dia.com, yorayz@...dia.com, borisp@...dia.com,
galshalom@...dia.com, mgurtovoy@...dia.com, tariqt@...dia.com,
gus@...labora.com, viro@...iv.linux.org.uk, akpm@...ux-foundation.org
Subject: Re: [PATCH v30 03/20] iov_iter: skip copy if src == dst for direct
data placement
Jens Axboe <axboe@...nel.dk> writes:
> This seems like entirely the wrong place to apply this logic...
I understand it might look strange at first sight to have the check
implemented in this low-level iterator copy function, but it is actually
where it makes the most sense.
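For context, the idea in this patch reduces to something like the sketch
below (simplified, not the actual patch; ddp_copy_or_skip() is an
illustrative name): the low-level per-segment copy becomes a no-op
whenever the NIC has already placed the payload at the destination,
i.e. src == dst.

#include <linux/string.h>

/* Minimal sketch of the idea, not the actual patch: the per-segment copy
 * helper becomes a no-op when the NIC has already placed the payload at
 * the destination, i.e. source and destination are the same memory. */
static inline size_t ddp_copy_or_skip(void *to, const void *from, size_t n)
{
	if (to != from)		/* payload not already placed by DDP */
		memcpy(to, from, n);
	return n;
}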
Let's assume we move it to the caller, nvme-tcp. We now need to read our
packets from the socket but somehow skip ahead when we reach the
offloaded payload. There is no function for that in the socket API. We
can walk the SKB fragments, but a single SKB can contain a mix of
offloaded and non-offloaded chunks. We would then need to re-implement
skb_copy_datagram() at the socket layer to know what to copy and where.
Even if we reuse the callback API, we have to take care of the
destination bvec iterator, and we end up duplicating the iterator
map/step/advance logic.
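To make the duplication concrete, here is a rough, hypothetical sketch of
what such a caller-side implementation in nvme-tcp could look like
(frag_is_offloaded() is an illustrative helper, and linear data and
frag lists are ignored for brevity):

#include <linux/errno.h>
#include <linux/skbuff.h>
#include <linux/uio.h>

/* Hypothetical caller-side version: walk every fragment of the skb,
 * decide per fragment whether it is offloaded, and step the destination
 * iterator by hand -- re-implementing both skb_copy_datagram() and the
 * iov_iter advance logic. */
static int nvme_tcp_copy_skb_frags(struct sk_buff *skb, struct iov_iter *to)
{
	int i;

	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
		const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
		size_t len = skb_frag_size(frag);

		if (frag_is_offloaded(frag))	/* hypothetical predicate */
			iov_iter_advance(to, len); /* data already placed */
		else if (copy_page_to_iter(skb_frag_page(frag),
					   skb_frag_off(frag), len, to) != len)
			return -EFAULT;
	}
	return 0;
}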
Our design is transparent to applications. The application does not need
to handle the skipping logic; the skipping is done in a generic way in
the underlying copy function. It will also work for other protocols with
no extra code. All the work happens in the driver, which needs to
construct the SKBs using the application buffers. Making drivers
communicate information to the upper layers is already common practice:
SKBs are already assembled by drivers today according to how the data is
received from the network.
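As a rough illustration of the driver side (struct rx_chunk and
ddp_lookup_dest() are hypothetical names, and page refcounting is
omitted), the driver would attach the application buffer page itself to
the skb whenever the NIC reports that a chunk was placed directly:

#include <linux/skbuff.h>

/* Hypothetical driver-side sketch: when the NIC reports that a received
 * chunk was placed directly into the application buffer, attach that
 * buffer's page to the skb instead of the NIC receive page, so the
 * later iov_iter copy sees src == dst and is skipped. */
static void drv_attach_rx_chunk(struct sk_buff *skb, struct rx_chunk *chunk)
{
	struct page *page;
	unsigned int off;

	if (ddp_lookup_dest(chunk, &page, &off))
		/* payload already sits in the application buffer */
		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
				page, off, chunk->len, chunk->len);
	else
		/* regular path: point at the NIC receive page */
		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
				chunk->rx_page, chunk->rx_off,
				chunk->len, chunk->len);
}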
We have covered some of the design decisions in previous discussions,
most recently on v25; I suggest you take a look [1].
Regarding performance, we were not able to measure any difference
between fio runs with the config option disabled and enabled.
Thanks
1: https://lore.kernel.org/netdev/20240529160053.111531-1-aaptel@nvidia.com/