Message-ID: <6f48fa5d-465c-5c38-ea45-704e86ba808b@gmail.com>
Date: Mon, 7 Dec 2020 17:42:14 -0700
From: David Ahern <dsahern@...il.com>
To: Boris Pismenny <borisp@...lanox.com>, kuba@...nel.org,
davem@...emloft.net, saeedm@...dia.com, hch@....de,
sagi@...mberg.me, axboe@...com, kbusch@...nel.org,
viro@...iv.linux.org.uk, edumazet@...gle.com
Cc: boris.pismenny@...il.com, linux-nvme@...ts.infradead.org,
netdev@...r.kernel.org, benishay@...dia.com, ogerlitz@...dia.com,
yorayz@...dia.com, Ben Ben-Ishay <benishay@...lanox.com>,
Or Gerlitz <ogerlitz@...lanox.com>,
Yoray Zack <yorayz@...lanox.com>
Subject: Re: [PATCH v1 net-next 02/15] net: Introduce direct data placement
tcp offload
On 12/7/20 2:06 PM, Boris Pismenny wrote:
> This commit introduces direct data placement (DDP) offload for TCP.
> This capability is accompanied by new net_device operations that
> configure hardware contexts: there is a context per socket, and a
> context per DDP operation. Additionally, a resynchronization routine
> is used to help the hardware handle TCP out-of-order delivery and
> continue the offload. Furthermore, we let the offloading driver
> advertise its maximum number of hw sectors/segments.
>
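[For context, a minimal C sketch of the kind of driver-facing hooks this
paragraph describes. All names below (ddp_limits, ddp_dev_ops, the
sk_add/sk_del, setup/teardown, resync and limits callbacks) are illustrative
placeholders, not necessarily the API this patch adds:]

#include <linux/types.h>
#include <linux/netdevice.h>
#include <net/sock.h>

/* Limits a driver could advertise (max hw sectors/segments); illustrative. */
struct ddp_limits {
	int max_ddp_sgl_len;	/* max scatter-gather segments per operation */
	int max_hw_sectors;	/* max sectors per operation */
};

/*
 * Illustrative per-netdev DDP hooks: one context per socket (sk_add/sk_del),
 * one context per DDP operation (setup/teardown), a resync callback that
 * helps the hardware re-acquire the stream after out-of-order delivery,
 * and a query for the limits above.
 */
struct ddp_dev_ops {
	int	(*sk_add)(struct net_device *dev, struct sock *sk);
	void	(*sk_del)(struct net_device *dev, struct sock *sk);
	int	(*setup)(struct net_device *dev, struct sock *sk,
			 void *ddp_ctx);
	void	(*teardown)(struct net_device *dev, struct sock *sk,
			    void *ddp_ctx);
	void	(*resync)(struct net_device *dev, struct sock *sk, u32 seq);
	int	(*limits)(struct net_device *dev, struct ddp_limits *limits);
};
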
> Using this interface, the NIC hardware will scatter TCP payload directly
> to the BIO pages according to the command_id.
> To maintain the correctness of the network stack, the driver is expected
> to construct SKBs that point to the BIO pages.
>
> Thus, the SKB represents the data on the wire, while it points
> to data that is already placed in the destination buffer.
> As a result, data from page frags should not be copied out to
> the linear part.
>
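[As an illustration of the SKB construction described above, a hedged sketch
follows. It assumes the driver already knows the destination page, offset and
length selected by the command_id; the helper name ddp_attach_placed_page is
hypothetical, while skb_add_rx_frag() is an existing kernel helper:]

#include <linux/skbuff.h>
#include <linux/mm.h>

/*
 * Sketch: attach a page into which the NIC already placed the TCP payload
 * as an skb fragment, so the skb still describes the bytes on the wire
 * while pointing at the destination buffer (no copy into the linear part).
 */
static void ddp_attach_placed_page(struct sk_buff *skb, int frag_idx,
				   struct page *dst_page, int offset, int len)
{
	get_page(dst_page);	/* the skb now holds a reference to the page */
	skb_add_rx_frag(skb, frag_idx, dst_page, offset, len, len);
}
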
> As SKBs that use DDP are already very memory efficient, we modify
> skb_condense to avoid copying data from fragments to the linear
> part of SKBs that belong to a socket that uses DDP offload.
>
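[For illustration, the skb_condense() change described here amounts to one
more early-return condition. In this sketch the predicate skb_is_ddp() is a
placeholder for however the patch actually marks DDP skbs/sockets; the rest
mirrors the existing function:]

#include <linux/skbuff.h>

/* Placeholder: however the patch marks skbs whose frags were DDP-placed. */
static inline bool skb_is_ddp(const struct sk_buff *skb)
{
	return false;	/* illustrative stub */
}

void skb_condense(struct sk_buff *skb)
{
	if (skb->data_len) {
		/* Bail out for cloned skbs, for frags that do not fit in the
		 * tail room, and (new) for DDP skbs whose frags already sit
		 * in the destination buffer and must not be copied out.
		 */
		if (skb->data_len > skb->end - skb->tail ||
		    skb_cloned(skb) || skb_is_ddp(skb))
			return;

		/* Otherwise pull frag data into the linear part and free
		 * the page frag(s) right now.
		 */
		__pskb_pull_tail(skb, skb->data_len);
	}
	/* truesize may have been over-estimated; fix it up. */
	skb->truesize = SKB_TRUESIZE(skb_end_offset(skb));
}
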
> A follow-up patch will use this interface for DDP in NVMe-TCP.
>
You call this Direct Data Placement, which sounds like a marketing name.
Fundamentally, this starts with offloading TCP socket buffers for a
specific flow, so generically it is TCP Rx zerocopy for kernel-stack-managed
sockets (as opposed to AF_XDP's zerocopy). Why not build that level of
infrastructure first and add ULPs like NVMe on top?