Message-ID: <6d53cf9e-a731-402c-8fc1-6dfe476bc35c@grimberg.me>
Date: Tue, 11 Jun 2024 14:01:32 +0300
From: Sagi Grimberg <sagi@...mberg.me>
To: Christoph Hellwig <hch@....de>
Cc: Jakub Kicinski <kuba@...nel.org>, Aurelien Aptel <aaptel@...dia.com>,
linux-nvme@...ts.infradead.org, netdev@...r.kernel.org, kbusch@...nel.org,
axboe@...com, chaitanyak@...dia.com, davem@...emloft.net
Subject: Re: [PATCH v25 00/20] nvme-tcp receive offloads
On 11/06/2024 9:41, Christoph Hellwig wrote:
> On Mon, Jun 10, 2024 at 05:30:34PM +0300, Sagi Grimberg wrote:
>>> efficient header splitting in the NIC, either hard coded or even
>>> better downloadable using something like eBPF.
>> From what I understand, this is what this offload is trying to do. It uses
>> the nvme command_id similar to how the read_stag is used in iwarp,
>> it tracks the NVMe/TCP pdus to split pdus from data transfers, and maps
>> the command_id to an internal MR for dma purposes.
>>
>> What I think you don't like about this is the interface that the offload
>> exposes to the TCP ulp driver (nvme-tcp in our case)?
> I don't see why a memory registration is needed at all.
I don't see how you can do it without memory registration.
>
> The by far biggest painpoint when doing storage protocols (including
> file systems) over IP based storage is the data copy on the receive
> path because the payload is not aligned to a page boundary.
>
> So we need to figure out a way that is as stateless as possible that
> allows aligning the actual data payload on a page boundary in an
> otherwise normal IP receive path.
But the device gets payload from the network, and needs a buffer
to dma to. In order to dma to the "correct" buffer it needs some
sort of pre-registration expressed with a tag, that the device can
infer by some sort of stream inspection. The socket recv call from
the ulp happens at a later stage.
I am not sure I understand how the alignment assurance helps the NIC
to dma payload from the network to the "correct" buffer
(i.e. userspace doing an O_DIRECT read).
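
To make the pre-registration point concrete, here is a rough toy model of
tag-based placement. It is only a sketch, not the offload interface from the
patch set: TagTable, PduHeader, register() and place() are hypothetical
names, and a bytearray stands in for a dma-mapped buffer. The ulp registers
a buffer under the command_id before the response arrives, and the "device"
looks the tag up while inspecting the stream.

```python
# Toy model only: TagTable and PduHeader are hypothetical names and do not
# correspond to the nvme-tcp offload interface discussed in this thread.
from dataclasses import dataclass


@dataclass
class PduHeader:
    command_id: int   # tag the device extracts by inspecting the stream
    data_offset: int  # byte offset of this payload within the command's buffer
    data_len: int     # number of payload bytes that follow the header


class TagTable:
    """Maps a command_id to a buffer registered before any data arrives."""

    def __init__(self):
        self._buffers = {}

    def register(self, command_id, buf):
        # Done by the ulp when it issues the read, ahead of the response.
        self._buffers[command_id] = buf

    def unregister(self, command_id):
        # Done on command completion; returns the buffer, or None if unknown.
        return self._buffers.pop(command_id, None)

    def place(self, hdr, payload):
        """Model the device placing payload; return False to signal fallback."""
        buf = self._buffers.get(hdr.command_id)
        if buf is None or hdr.data_len != len(payload):
            return False  # unknown tag or inconsistent length
        if hdr.data_offset < 0 or hdr.data_offset + hdr.data_len > len(buf):
            return False  # payload would land outside the registered buffer
        buf[hdr.data_offset:hdr.data_offset + hdr.data_len] = payload
        return True
```

In this model, a payload whose tag was never registered is simply not
placed, which stands in for falling back to the normal receive path and
copying later at socket recv time.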