lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iLW86_BsB97jZw0joPwne_K79iiqwogCYFA7U9dZa3jhQ@mail.gmail.com>
Date: Wed, 14 May 2025 00:12:28 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Aurelien Aptel <aaptel@...dia.com>
Cc: linux-nvme@...ts.infradead.org, netdev@...r.kernel.org, sagi@...mberg.me, 
	hch@....de, kbusch@...nel.org, axboe@...com, chaitanyak@...dia.com, 
	davem@...emloft.net, kuba@...nel.org, Boris Pismenny <borisp@...dia.com>, 
	aurelien.aptel@...il.com, smalin@...dia.com, malin1024@...il.com, 
	ogerlitz@...dia.com, yorayz@...dia.com, galshalom@...dia.com, 
	mgurtovoy@...dia.com, tariqt@...dia.com, gus@...labora.com, pabeni@...hat.com, 
	dsahern@...nel.org, ast@...nel.org, jacob.e.keller@...el.com
Subject: Re: [PATCH v28 01/20] net: Introduce direct data placement tcp offload

On Wed, Apr 30, 2025 at 1:58 AM Aurelien Aptel <aaptel@...dia.com> wrote:
>
> From: Boris Pismenny <borisp@...dia.com>
>
> This commit introduces direct data placement (DDP) offload for TCP.
>
> The motivation is saving compute resources/cycles that are spent
> to copy data from SKBs to the block layer buffers and CRC
> calculation/verification for received PDUs (Protocol Data Units).
>
> The DDP capability is accompanied by new net_device operations that
> configure hardware contexts.
>
> There is a context per socket, and a context per DDP operation.
> Additionally, a resynchronization routine is used to assist
> hardware handle TCP OOO, and continue the offload. Furthermore,
> we let the offloading driver advertise what is the max hw
> sectors/segments.
>
> The interface includes the following net-device ddp operations:
>
>  1. sk_add - add offload for the queue represented by socket+config pair
>  2. sk_del - remove the offload for the socket/queue
>  3. ddp_setup - request copy offload for buffers associated with an IO
>  4. ddp_teardown - release offload resources for that IO
>  5. limits - query NIC driver for quirks and limitations (e.g.
>              max number of scatter gather entries per IO)
>  6. set_caps - request ULP DDP capabilities enablement
>  7. get_caps - request current ULP DDP capabilities
>  8. get_stats - query NIC driver for ULP DDP stats
>
> Using this interface, the NIC hardware will scatter TCP payload
> directly to the BIO pages according to the command_id.
>
> To maintain the correctness of the network stack, the driver is
> expected to construct SKBs that point to the BIO pages.
>
> The SKB passed to the network stack from the driver represents
> data as it is on the wire, while it is pointing directly to data
> in destination buffers.
>
> As a result, data from page frags should not be copied out to
> the linear part. To avoid needless copies, such as when using
> skb_condense, we mark the skb->no_condense bit.
> In addition, the skb->ulp_crc will be used by the upper layers to
> determine if CRC re-calculation is required. The two separated skb
> indications are needed to avoid false positives GRO flushing events.
>
> Follow-up patches will use this interface for DDP in NVMe-TCP.
>
> Capability bits stored in net_device allow drivers to report which
> ULP DDP capabilities a device supports. Control over these
> capabilities will be exposed to userspace in later patches.
>
>
>

>  /**
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index beb084ee4f4d..38b800f2593d 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -851,6 +851,8 @@ enum skb_tstamp_type {
>   *     @slow_gro: state present at GRO time, slower prepare step required
>   *     @tstamp_type: When set, skb->tstamp has the
>   *             delivery_time clock base of skb->tstamp.
> + *     @no_condense: When set, don't condense fragments (DDP offloaded)
> + *     @ulp_crc: CRC offloaded
>   *     @napi_id: id of the NAPI struct this skb came from
>   *     @sender_cpu: (aka @napi_id) source CPU in XPS
>   *     @alloc_cpu: CPU which did the skb allocation.
> @@ -1028,6 +1030,10 @@ struct sk_buff {
>         __u8                    csum_not_inet:1;
>  #endif
>         __u8                    unreadable:1;
> +#ifdef CONFIG_ULP_DDP
> +       __u8                    no_condense:1;
> +       __u8                    ulp_crc:1;
> +#endif

Sorry for the late review.

I do not think you need a precious bit for no_condense feature.

After all, the driver knows it deals with DDP, and can make sure to
place the headers right before skb_shinfo() so that skb_tailroom() is
zero.


If you really must prevent any pull from a page frag to skb->head
(preventing skb->head realloc), skb->unreadable should work just fine
?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ