Message-ID: <d439964b-1544-f37a-6ab4-ec1fb4cd0d9d@nvidia.com>
Date: Fri, 11 Aug 2023 05:28:24 +0000
From: Chaitanya Kulkarni <chaitanyak@...dia.com>
To: Sagi Grimberg <sagi@...mberg.me>, Aurelien Aptel <aaptel@...dia.com>, Shai
 Malin <smalin@...dia.com>
CC: "davem@...emloft.net" <davem@...emloft.net>, Boris Pismenny
	<borisp@...dia.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"kuba@...nel.org" <kuba@...nel.org>, "aurelien.aptel@...il.com"
	<aurelien.aptel@...il.com>, "hch@....de" <hch@....de>, "axboe@...com"
	<axboe@...com>, "malin1024@...il.com" <malin1024@...il.com>, Or Gerlitz
	<ogerlitz@...dia.com>, Yoray Zack <yorayz@...dia.com>,
	"linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>, Gal Shalom
	<galshalom@...dia.com>, Max Gurtovoy <mgurtovoy@...dia.com>,
	"kbusch@...nel.org" <kbusch@...nel.org>
Subject: Re: [PATCH v12 07/26] nvme-tcp: Add DDP offload control path

On 8/9/2023 12:39 AM, Sagi Grimberg wrote:
> 
> 
> On 8/1/23 05:25, Chaitanya Kulkarni wrote:
>> On 7/12/23 09:14, Aurelien Aptel wrote:
>>> From: Boris Pismenny <borisp@...dia.com>
>>>
>>> This commit introduces direct data placement offload for NVMe-TCP.
>>> There is a context per queue, which is established after the
>>> handshake using the sk_add/del NDOs.
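>>>
>>> For illustration, the per-queue hook-up looks roughly like the
>>> sketch below (function and field names here are illustrative, not
>>> necessarily the exact ones in this series):
>>>
>>>     /* Sketch: after the NVMe/TCP icresp handshake succeeds, hand
>>>      * the socket to the NIC driver so it can place data directly.
>>>      * The matching sk_del NDO tears the context down when the
>>>      * queue is stopped. */
>>>     static int nvme_tcp_offload_socket(struct nvme_tcp_queue *queue)
>>>     {
>>>             struct net_device *netdev = queue->ctrl->offloading_netdev;
>>>
>>>             if (!netdev)
>>>                     return -EOPNOTSUPP;
>>>
>>>             return netdev->netdev_ops->ulp_ddp_sk_add(netdev,
>>>                                                       queue->sock->sk,
>>>                                                       &queue->ddp_config);
>>>     }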
>>>
>>> Additionally, a resynchronization routine is used to assist
>>> hardware recovery from TCP OOO and to continue the offload.
>>> Resynchronization operates as follows:
>>>
>>> 1. TCP OOO causes the NIC HW to stop the offload
>>>
>>> 2. NIC HW identifies a PDU header at some TCP sequence number,
>>> and asks NVMe-TCP to confirm it.
>>> This request is delivered from the NIC driver to NVMe-TCP by first
>>> finding the socket for the packet that triggered the request, and
>>> then finding the nvme_tcp_queue associated with that socket.
>>> Finally, the request is recorded in the nvme_tcp_queue.
>>>
>>> 3. When NVMe-TCP observes the requested TCP sequence, it compares
>>> it with the PDU header TCP sequence and reports the result to the
>>> NIC driver (resync), which updates the HW and resumes the offload
>>> once everything matches.
>>>
>>> Some HW implementations, such as ConnectX-7, assume linear CCIDs
>>> (0...N-1 for a queue of size N), whereas the Linux NVMe driver uses
>>> part of the 16-bit CCID as a generation counter. To address that, we
>>> use the existing quirk in the nvme layer, applied when the HW driver
>>> advertises that the device does not support the full 16-bit CCID
>>> range.
>>>
>>> Furthermore, we let the offloading driver advertise the max HW
>>> sectors/segments via ulp_ddp_limits.
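>>>
>>> Taken together, consuming the advertised limits looks roughly like
>>> this (the ulp_ddp_limits field names are illustrative;
>>> NVME_QUIRK_SKIP_CID_GEN is the existing quirk referred to above):
>>>
>>>     /* Sketch: apply what the offloading driver advertised */
>>>     static void nvme_tcp_apply_ddp_limits(struct nvme_ctrl *ctrl,
>>>                                           struct ulp_ddp_limits *limits)
>>>     {
>>>             /* HW that assumes linear CCIDs cannot cope with the
>>>              * generation counter embedded in the command id */
>>>             if (!limits->full_ccid_range)
>>>                     ctrl->quirks |= NVME_QUIRK_SKIP_CID_GEN;
>>>
>>>             ctrl->max_segments = limits->max_ddp_nr_segs;
>>>             ctrl->max_hw_sectors = limits->max_ddp_sectors;
>>>     }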
>>>
>>> A follow-up patch introduces the data-path changes required for this
>>> offload.
>>>
>>> Socket operations need a netdev reference. This reference is
>>> dropped on NETDEV_GOING_DOWN events to allow the device to go down in
>>> a follow-up patch.
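>>>
>>> The reference handling follows the standard netdevice notifier
>>> pattern, roughly as below (nvme_tcp_stop_offload is a hypothetical
>>> helper name):
>>>
>>>     /* Sketch: give up the offload before the device goes down */
>>>     static int nvme_tcp_netdev_event(struct notifier_block *nb,
>>>                                      unsigned long event, void *ptr)
>>>     {
>>>             struct net_device *ndev = netdev_notifier_info_to_dev(ptr);
>>>
>>>             if (event == NETDEV_GOING_DOWN)
>>>                     nvme_tcp_stop_offload(ndev); /* drops the reference */
>>>
>>>             return NOTIFY_DONE;
>>>     }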
>>>
>>> Signed-off-by: Boris Pismenny <borisp@...dia.com>
>>> Signed-off-by: Ben Ben-Ishay <benishay@...dia.com>
>>> Signed-off-by: Or Gerlitz <ogerlitz@...dia.com>
>>> Signed-off-by: Yoray Zack <yorayz@...dia.com>
>>> Signed-off-by: Shai Malin <smalin@...dia.com>
>>> Signed-off-by: Aurelien Aptel <aaptel@...dia.com>
>>> Reviewed-by: Chaitanya Kulkarni <kch@...dia.com>
>>> ---
>>
>> For NVMe related code :-
>>
>> The offload feature is configurable and may not be turned on in the
>> absence of the H/W. In order to keep the nvme/host/tcp.c file small,
>> handling only core functionality, I wonder if we should move the
>> tcp-offload code into its own file, say nvme/host/tcp-offload.c ?
> 
> Maybe. It wouldn't be tcp_offload.c but rather tcp_ddp.c, because it's
> not offloading the TCP stack but rather doing direct data placement.
>

fine ...

> If we are going to do that, it will either pollute nvme.h or require a
> common header file, which is something I'd like to avoid if possible.

My comment was mainly based on how we have separated the core code from
configurable features, and on wondering whether any decision we make
for the host will also apply to the target ddp code in the future. But
if you prefer to keep it as it is, let's not bloat the header ...

-ck


