Message-ID: <06d9c3c9-8d27-46bf-a0cf-0c3ea1a0d3ec@grimberg.me>
Date: Mon, 3 Jun 2024 10:09:26 +0300
From: Sagi Grimberg <sagi@...mberg.me>
To: Christoph Hellwig <hch@....de>, Jakub Kicinski <kuba@...nel.org>
Cc: Aurelien Aptel <aaptel@...dia.com>, linux-nvme@...ts.infradead.org,
netdev@...r.kernel.org, kbusch@...nel.org, axboe@...com,
chaitanyak@...dia.com, davem@...emloft.net
Subject: Re: [PATCH v25 00/20] nvme-tcp receive offloads
On 31/05/2024 9:11, Christoph Hellwig wrote:
> FYI, I still absolutely detest this code. I know people want to
> avoid the page copy for NVMe over TCP (or any TCP based storage
> protocols for that matter), but having these weird vendor-specific
> hooks all the way up into the application protocol is just horrible.
I hoped for a transparent DDP offload as well, but I don't see how this
is possible.
>
> IETF has standardized a generic data placement protocol, which is
> part of iWarp. Even if folks don't like RDMA it exists to solve
> exactly these kinds of problems of data placement.
iWARP changes the wire protocol. Are you suggesting that people just use
iWARP instead of TCP, or that NVMe/TCP be extended to natively support DDP?
I think that the former is limiting, and the latter is unclear.
From what I understand, the offload engine uses the NVMe command-id as
the rkey (or STag) for DDP purposes.
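Conceptually, something like the below on the setup path (a hand-wavy
sketch only, all the names here are made up, this is not the actual
interface in the patches):

/* hand-wavy sketch -- ddp_dev_ops/setup_ddp are made-up names */
#include <linux/netdevice.h>
#include <linux/scatterlist.h>
#include <linux/types.h>

struct ddp_dev_ops {
        int (*setup_ddp)(struct net_device *dev, u16 tag,
                         struct scatterlist *sgl, unsigned int nents);
};

/* nvme-tcp hands the request's command-id to the nic as the placement
 * tag, so it plays the role an rkey/STag plays in iWARP: the hw matches
 * incoming C2HData PDUs on the command-id and scatters the payload
 * directly into sgl's pages instead of into skbs.
 */
static int sketch_setup_ddp(const struct ddp_dev_ops *ops,
                            struct net_device *dev, u16 command_id,
                            struct scatterlist *sgl, unsigned int nents)
{
        return ops->setup_ddp(dev, command_id, sgl, nents);
}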
> And if we can't
> arse folks into standard data placement methods we at least need it
> vendor independent and without hooks into the actual protocol
> driver.
>
That would be great, but what does "vendor independent without hooks"
look like from your perspective? I'd love for this to translate to
standard (and some new) socket operations, but I could not find a way
to do that given the current architecture.
Early on, I thought that enabling the queue offload could be modeled as
a setsockopt(), and that nvme_tcp_setup_ddp() could be modeled as a new
recvmsg(MSG_DDP_BUFFER, iovec, tag), but where I got stuck was the whole
async teardown mechanism that the NIC has. If that is solvable, I think
such an interface would be much better.
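Something along these lines (SO_NVME_DDP, SCM_DDP_TAG and MSG_DDP_BUFFER
are all made-up names, nothing like this exists today):

#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define SO_NVME_DDP     99          /* hypothetical: enable ddp offload */
#define SCM_DDP_TAG     100         /* hypothetical: cmsg carrying the tag */
#define MSG_DDP_BUFFER  0x10000000  /* hypothetical recvmsg() flag */

/* would replace the per-queue offload setup */
static int ddp_enable(int sock)
{
        int one = 1;

        return setsockopt(sock, SOL_SOCKET, SO_NVME_DDP, &one, sizeof(one));
}

/* would replace nvme_tcp_setup_ddp(): post a buffer for direct
 * placement of data tagged with the given command-id */
static int ddp_post_buffer(int sock, void *buf, size_t len, uint16_t tag)
{
        struct iovec iov = { .iov_base = buf, .iov_len = len };
        char cbuf[CMSG_SPACE(sizeof(tag))];
        struct msghdr msg = {
                .msg_iov        = &iov,
                .msg_iovlen     = 1,
                .msg_control    = cbuf,
                .msg_controllen = sizeof(cbuf),
        };
        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type  = SCM_DDP_TAG;
        cmsg->cmsg_len   = CMSG_LEN(sizeof(tag));
        memcpy(CMSG_DATA(cmsg), &tag, sizeof(tag));

        /* posting is the easy part; the open question is how the nic's
         * async buffer teardown/completion would surface to the caller */
        return recvmsg(sock, &msg, MSG_DDP_BUFFER);
}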
FWIW, I think that the benefit of this is worth having, and the folks
from NVIDIA are committed to supporting and evolving it.