Message-ID: <06d9c3c9-8d27-46bf-a0cf-0c3ea1a0d3ec@grimberg.me>
Date: Mon, 3 Jun 2024 10:09:26 +0300
From: Sagi Grimberg <sagi@...mberg.me>
To: Christoph Hellwig <hch@....de>, Jakub Kicinski <kuba@...nel.org>
Cc: Aurelien Aptel <aaptel@...dia.com>, linux-nvme@...ts.infradead.org,
netdev@...r.kernel.org, kbusch@...nel.org, axboe@...com,
chaitanyak@...dia.com, davem@...emloft.net
Subject: Re: [PATCH v25 00/20] nvme-tcp receive offloads
On 31/05/2024 9:11, Christoph Hellwig wrote:
> FYI, I still absolutely detest this code. I know people want to
> avoid the page copy for NVMe over TCP (or any TCP based storage
> protocols for that matter), but having these weird vendor-specific
> hooks all the way up into the application protocol is just horrible.
I hoped for a transparent DDP offload as well, but I don't see how this
is possible.
>
> IETF has standardized a generic data placement protocol, which is
> part of iWarp. Even if folks don't like RDMA it exists to solve
> exactly these kinds of problems of data placement.
iWARP changes the wire protocol. Are you suggesting that people just use
iWARP instead of TCP, or that NVMe/TCP be extended to natively support DDP?
I think that the former is limiting, and the latter is unclear.
From what I understand, the offload engine uses the NVMe command-id as
the rkey (or STag) for DDP purposes.
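Conceptually, something like the below on the setup path (a hand-wavy
sketch only, all the names here are made up, this is not the actual
interface in the patches):

/* hand-wavy sketch -- ddp_dev_ops/setup_ddp are made-up names */
#include <linux/netdevice.h>
#include <linux/scatterlist.h>
#include <linux/types.h>

struct ddp_dev_ops {
        int (*setup_ddp)(struct net_device *dev, u16 tag,
                         struct scatterlist *sgl, unsigned int nents);
};

/* nvme-tcp hands the request's command-id to the nic as the placement
 * tag, so it plays the role an rkey/STag plays in iWARP: the hw matches
 * incoming C2HData PDUs on the command-id and scatters the payload
 * directly into sgl's pages instead of into skbs.
 */
static int sketch_setup_ddp(const struct ddp_dev_ops *ops,
                            struct net_device *dev, u16 command_id,
                            struct scatterlist *sgl, unsigned int nents)
{
        return ops->setup_ddp(dev, command_id, sgl, nents);
}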
> And if we can't
> arse folks into standard data placement methods we at least need it
> vendor independent and without hooks into the actual protocol
> driver.
>
That would be great, but what does "vendor independent without hooks"
look like from your perspective? I'd love for this to translate to
standard (and some new) socket operations, but I could not find a way
to do that given the current architecture.
Early on, I thought that enabling the queue offload could be modeled as
a setsockopt(), and that nvme_tcp_setup_ddp() could be modeled as a new
recvmsg(MSG_DDP_BUFFER, iovec, tag), but where I got stuck was the whole
async teardown mechanism that the NIC has. If that is solvable, I think
such an interface would be much better.
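Something along these lines (SO_NVME_DDP, SCM_DDP_TAG and MSG_DDP_BUFFER
are all made-up names, nothing like this exists today):

#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define SO_NVME_DDP     99          /* hypothetical: enable ddp offload */
#define SCM_DDP_TAG     100         /* hypothetical: cmsg carrying the tag */
#define MSG_DDP_BUFFER  0x10000000  /* hypothetical recvmsg() flag */

/* would replace the per-queue offload setup */
static int ddp_enable(int sock)
{
        int one = 1;

        return setsockopt(sock, SOL_SOCKET, SO_NVME_DDP, &one, sizeof(one));
}

/* would replace nvme_tcp_setup_ddp(): post a buffer for direct
 * placement of data tagged with the given command-id */
static int ddp_post_buffer(int sock, void *buf, size_t len, uint16_t tag)
{
        struct iovec iov = { .iov_base = buf, .iov_len = len };
        char cbuf[CMSG_SPACE(sizeof(tag))];
        struct msghdr msg = {
                .msg_iov        = &iov,
                .msg_iovlen     = 1,
                .msg_control    = cbuf,
                .msg_controllen = sizeof(cbuf),
        };
        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type  = SCM_DDP_TAG;
        cmsg->cmsg_len   = CMSG_LEN(sizeof(tag));
        memcpy(CMSG_DATA(cmsg), &tag, sizeof(tag));

        /* posting is the easy part; the open question is how the nic's
         * async buffer teardown/completion would surface to the caller */
        return recvmsg(sock, &msg, MSG_DDP_BUFFER);
}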
FWIW, I think that the benefit of this is worth having, and the folks
from NVIDIA are committed to supporting and evolving it.