lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e5b994ce-247c-4bfa-96a5-eb842ef99e20@suse.de>
Date: Wed, 2 Jul 2025 13:53:21 +0200
From: Hannes Reinecke <hare@...e.de>
To: Aurelien Aptel <aaptel@...dia.com>, linux-nvme@...ts.infradead.org,
 netdev@...r.kernel.org, sagi@...mberg.me, hch@....de, kbusch@...nel.org,
 axboe@...com, chaitanyak@...dia.com, davem@...emloft.net, kuba@...nel.org
Cc: Boris Pismenny <borisp@...dia.com>, aurelien.aptel@...il.com,
 smalin@...dia.com, malin1024@...il.com, ogerlitz@...dia.com,
 yorayz@...dia.com, galshalom@...dia.com, mgurtovoy@...dia.com,
 tariqt@...dia.com, gus@...labora.com, edumazet@...gle.com,
 pabeni@...hat.com, dsahern@...nel.org, ast@...nel.org,
 jacob.e.keller@...el.com
Subject: Re: [PATCH v29 01/20] net: Introduce direct data placement tcp
 offload

On 6/30/25 16:07, Aurelien Aptel wrote:
> From: Boris Pismenny <borisp@...dia.com>
> 
> This commit introduces direct data placement (DDP) offload for TCP.
> 
> The motivation is saving compute resources/cycles that are spent
> to copy data from SKBs to the block layer buffers and CRC
> calculation/verification for received PDUs (Protocol Data Units).
> 
> The DDP capability is accompanied by new net_device operations that
> configure hardware contexts.
> 
> There is a context per socket, and a context per DDP operation.
> Additionally, a resynchronization routine is used to assist
> hardware handle TCP OOO, and continue the offload. Furthermore,
> we let the offloading driver advertise what is the max hw
> sectors/segments.
> 
> The interface includes the following net-device ddp operations:
> 
>   1. sk_add - add offload for the queue represented by socket+config pair
>   2. sk_del - remove the offload for the socket/queue
>   3. ddp_setup - request copy offload for buffers associated with an IO
>   4. ddp_teardown - release offload resources for that IO
>   5. limits - query NIC driver for quirks and limitations (e.g.
>               max number of scatter gather entries per IO)
>   6. set_caps - request ULP DDP capabilities enablement
>   7. get_caps - request current ULP DDP capabilities
>   8. get_stats - query NIC driver for ULP DDP stats
> 
> Using this interface, the NIC hardware will scatter TCP payload
> directly to the BIO pages according to the command_id.
> 
> To maintain the correctness of the network stack, the driver is
> expected to construct SKBs that point to the BIO pages.
> 
> The SKB passed to the network stack from the driver represents
> data as it is on the wire, while it is pointing directly to data
> in destination buffers.
> 
> As a result, data from page frags should not be copied out to
> the linear part. To avoid needless copies, such as when using
> skb_condense, we mark the sk->sk_no_condense bit.
> In addition, the skb->ulp_crc will be used by the upper layers to
> determine if CRC re-calculation is required. The two separated skb
> indications are needed to avoid false positives GRO flushing events.
> 
> Follow-up patches will use this interface for DDP in NVMe-TCP.
> 
> Capability bits stored in net_device allow drivers to report which
> ULP DDP capabilities a device supports. Control over these
> capabilities will be exposed to userspace in later patches.
> 
> Signed-off-by: Boris Pismenny <borisp@...dia.com>
> Signed-off-by: Ben Ben-Ishay <benishay@...dia.com>
> Signed-off-by: Or Gerlitz <ogerlitz@...dia.com>
> Signed-off-by: Yoray Zack <yorayz@...dia.com>
> Signed-off-by: Shai Malin <smalin@...dia.com>
> Signed-off-by: Aurelien Aptel <aaptel@...dia.com>
> ---
>   include/linux/netdevice.h          |   5 +
>   include/linux/skbuff.h             |  31 +++
>   include/net/inet_connection_sock.h |   6 +
>   include/net/sock.h                 |   3 +-
>   include/net/tcp.h                  |   3 +-
>   include/net/ulp_ddp.h              | 326 +++++++++++++++++++++++++++++
>   net/Kconfig                        |  20 ++
>   net/core/Makefile                  |   1 +
>   net/core/skbuff.c                  |   4 +-
>   net/core/ulp_ddp.c                 |  56 +++++
>   net/ipv4/tcp_input.c               |   1 +
>   net/ipv4/tcp_offload.c             |   1 +
>   12 files changed, 454 insertions(+), 3 deletions(-)
>   create mode 100644 include/net/ulp_ddp.h
>   create mode 100644 net/core/ulp_ddp.c
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index db5bfd4e7ec8..fe510ba65c7b 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1401,6 +1401,8 @@ struct netdev_net_notifier {
>    *			   struct kernel_hwtstamp_config *kernel_config,
>    *			   struct netlink_ext_ack *extack);
>    *	Change the hardware timestamping parameters for NIC device.
> + * struct ulp_ddp_dev_ops *ulp_ddp_ops;
> + *	ULP DDP operations (see include/net/ulp_ddp.h)
>    */
>   struct net_device_ops {
>   	int			(*ndo_init)(struct net_device *dev);
> @@ -1656,6 +1658,9 @@ struct net_device_ops {
>   	 */
>   	const struct net_shaper_ops *net_shaper_ops;
>   #endif
> +#if IS_ENABLED(CONFIG_ULP_DDP)
> +	const struct ulp_ddp_dev_ops *ulp_ddp_ops;
> +#endif
>   };
>   
>   /**
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 4f6dcb37bae8..a9e8fc6582e2 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -847,6 +847,7 @@ enum skb_tstamp_type {
>    *	@slow_gro: state present at GRO time, slower prepare step required
>    *	@tstamp_type: When set, skb->tstamp has the
>    *		delivery_time clock base of skb->tstamp.
> + *	@ulp_crc: CRC offloaded
>    *	@napi_id: id of the NAPI struct this skb came from
>    *	@sender_cpu: (aka @napi_id) source CPU in XPS
>    *	@alloc_cpu: CPU which did the skb allocation.
> @@ -1024,6 +1025,9 @@ struct sk_buff {
>   	__u8			csum_not_inet:1;
>   #endif
>   	__u8			unreadable:1;
> +#ifdef CONFIG_ULP_DDP
> +	__u8			ulp_crc:1;
> +#endif
>   #if defined(CONFIG_NET_SCHED) || defined(CONFIG_NET_XGRESS)
>   	__u16			tc_index;	/* traffic control index */
>   #endif
> @@ -5267,5 +5271,32 @@ static inline void skb_mark_for_recycle(struct sk_buff *skb)
>   ssize_t skb_splice_from_iter(struct sk_buff *skb, struct iov_iter *iter,
>   			     ssize_t maxsize, gfp_t gfp);
>   
> +static inline bool skb_is_ulp_crc(const struct sk_buff *skb)
> +{
> +#ifdef CONFIG_ULP_DDP
> +	return skb->ulp_crc;
> +#else
> +	return 0;
> +#endif
> +}
> +
> +static inline bool skb_cmp_ulp_crc(const struct sk_buff *skb1,
> +				   const struct sk_buff *skb2)
> +{
> +#ifdef CONFIG_ULP_DDP
> +	return skb1->ulp_crc != skb2->ulp_crc;
> +#else
> +	return 0;
> +#endif
> +}
> +
> +static inline void skb_copy_ulp_crc(struct sk_buff *to,
> +				    const struct sk_buff *from)
> +{
> +#ifdef CONFIG_ULP_DDP
> +	to->ulp_crc = from->ulp_crc;
> +#endif
> +}
> +
>   #endif	/* __KERNEL__ */
>   #endif	/* _LINUX_SKBUFF_H */
> diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
> index 1735db332aab..65cca0d4d6c2 100644
> --- a/include/net/inet_connection_sock.h
> +++ b/include/net/inet_connection_sock.h
> @@ -63,6 +63,8 @@ struct inet_connection_sock_af_ops {
>    * @icsk_af_ops		   Operations which are AF_INET{4,6} specific
>    * @icsk_ulp_ops	   Pluggable ULP control hook
>    * @icsk_ulp_data	   ULP private data
> + * @icsk_ulp_ddp_ops	   Pluggable ULP direct data placement control hook
> + * @icsk_ulp_ddp_data	   ULP direct data placement private data
>    * @icsk_ca_state:	   Congestion control state
>    * @icsk_retransmits:	   Number of unrecovered [RTO] timeouts
>    * @icsk_pending:	   Scheduled timer event
> @@ -92,6 +94,10 @@ struct inet_connection_sock {
>   	const struct inet_connection_sock_af_ops *icsk_af_ops;
>   	const struct tcp_ulp_ops  *icsk_ulp_ops;
>   	void __rcu		  *icsk_ulp_data;
> +#ifdef CONFIG_ULP_DDP
> +	const struct ulp_ddp_ulp_ops  *icsk_ulp_ddp_ops;
> +	void __rcu		  *icsk_ulp_ddp_data;
> +#endif
>   	unsigned int		  (*icsk_sync_mss)(struct sock *sk, u32 pmtu);
>   	__u8			  icsk_ca_state:5,
>   				  icsk_ca_initialized:1,
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 0f2443d4ec58..c1b3d6e1e5e5 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -507,7 +507,8 @@ struct sock {
>   	u8			sk_gso_disabled : 1,
>   				sk_kern_sock : 1,
>   				sk_no_check_tx : 1,
> -				sk_no_check_rx : 1;
> +				sk_no_check_rx : 1,
> +				sk_no_condense : 1;
>   	u8			sk_shutdown;
>   	u16			sk_type;
>   	u16			sk_protocol;
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 761c4a0ad386..389906e53df4 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1148,7 +1148,8 @@ static inline bool tcp_skb_can_collapse_rx(const struct sk_buff *to,
>   					   const struct sk_buff *from)
>   {
>   	return likely(mptcp_skb_can_collapse(to, from) &&
> -		      !skb_cmp_decrypted(to, from));
> +		      !skb_cmp_decrypted(to, from) &&
> +		      !skb_cmp_ulp_crc(to, from));
>   }
>   
>   /* Events passed to congestion control interface */
> diff --git a/include/net/ulp_ddp.h b/include/net/ulp_ddp.h
> new file mode 100644
> index 000000000000..7b32bb9e2a08
> --- /dev/null
> +++ b/include/net/ulp_ddp.h
> @@ -0,0 +1,326 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * ulp_ddp.h
> + *   Author:	Boris Pismenny <borisp@...dia.com>
> + *   Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
> + */
> +#ifndef _ULP_DDP_H
> +#define _ULP_DDP_H
> +
> +#include <linux/netdevice.h>
> +#include <net/inet_connection_sock.h>
> +#include <net/sock.h>
> +
> +enum ulp_ddp_type {
> +	ULP_DDP_NVME = 1,
> +};
> +
> +/**
> + * struct nvme_tcp_ddp_limits - nvme tcp driver limitations
> + *
> + * @full_ccid_range:	true if the driver supports the full CID range
> + */
> +struct nvme_tcp_ddp_limits {
> +	bool			full_ccid_range;
> +};
> +
> +/**
> + * struct ulp_ddp_limits - Generic ulp ddp limits: tcp ddp
> + * protocol limits.
> + * Add new instances of ulp_ddp_limits in the union below (nvme-tcp, etc.).
> + *
> + * @type:		type of this limits struct
> + * @max_ddp_sgl_len:	maximum sgl size supported (zero means no limit)
> + * @io_threshold:	minimum payload size required to offload
> + * @tls:		support for ULP over TLS
> + * @nvmeotcp:		NVMe-TCP specific limits
> + */
> +struct ulp_ddp_limits {
> +	enum ulp_ddp_type	type;
> +	int			max_ddp_sgl_len;
> +	int			io_threshold;
> +	bool			tls:1;
> +	union {
> +		struct nvme_tcp_ddp_limits nvmeotcp;
> +	};
> +};
> +
> +/**
> + * struct nvme_tcp_ddp_config - nvme tcp ddp configuration for an IO queue
> + *
> + * @pfv:	pdu version (e.g., NVME_TCP_PFV_1_0)
> + * @cpda:	controller pdu data alignment (dwords, 0's based)
> + * @dgst:	digest types enabled (header or data, see
> + *		enum nvme_tcp_digest_option).
> + *		The netdev will offload crc if it is supported.
> + * @queue_size: number of nvme-tcp IO queue elements
> + */
> +struct nvme_tcp_ddp_config {
> +	u16			pfv;
> +	u8			cpda;
> +	u8			dgst;
> +	int			queue_size;
> +};
> +
> +/**
> + * struct ulp_ddp_config - Generic ulp ddp configuration
> + * Add new instances of ulp_ddp_config in the union below (nvme-tcp, etc.).
> + *
> + * @type:	type of this config struct
> + * @nvmeotcp:	NVMe-TCP specific config
> + * @affinity_hint:	cpu core running the IO thread for this socket
> + */
> +struct ulp_ddp_config {
> +	enum ulp_ddp_type    type;
> +	int		     affinity_hint;
> +	union {
> +		struct nvme_tcp_ddp_config nvmeotcp;
> +	};
> +};
> +
> +/**
> + * struct ulp_ddp_io - ulp ddp configuration for an IO request.
> + *
> + * @command_id: identifier on the wire associated with these buffers
> + * @nents:	number of entries in the sg_table
> + * @sg_table:	describing the buffers for this IO request
> + * @first_sgl:	first SGL in sg_table
> + */
> +struct ulp_ddp_io {
> +	u32			command_id;
> +	int			nents;
> +	struct sg_table		sg_table;
> +	struct scatterlist	first_sgl[SG_CHUNK_SIZE];
> +};
> +
> +/**
> + * struct ulp_ddp_stats - ULP DDP offload statistics
> + * @rx_nvmeotcp_sk_add: number of sockets successfully prepared for offloading.
> + * @rx_nvmeotcp_sk_add_fail: number of sockets that failed to be prepared
> + *                           for offloading.
> + * @rx_nvmeotcp_sk_del: number of sockets where offloading has been removed.
> + * @rx_nvmeotcp_ddp_setup: number of NVMeTCP PDU successfully prepared for
> + *                         Direct Data Placement.
> + * @rx_nvmeotcp_ddp_setup_fail: number of PDUs that failed DDP preparation.
> + * @rx_nvmeotcp_ddp_teardown: number of PDUs done with DDP.
> + * @rx_nvmeotcp_drop: number of PDUs dropped.
> + * @rx_nvmeotcp_resync: number of resync.
> + * @rx_nvmeotcp_packets: number of offloaded PDUs.
> + * @rx_nvmeotcp_bytes: number of offloaded bytes.
> + */
> +struct ulp_ddp_stats {
> +	u64 rx_nvmeotcp_sk_add;
> +	u64 rx_nvmeotcp_sk_add_fail;
> +	u64 rx_nvmeotcp_sk_del;
> +	u64 rx_nvmeotcp_ddp_setup;
> +	u64 rx_nvmeotcp_ddp_setup_fail;
> +	u64 rx_nvmeotcp_ddp_teardown;
> +	u64 rx_nvmeotcp_drop;
> +	u64 rx_nvmeotcp_resync;
> +	u64 rx_nvmeotcp_packets;
> +	u64 rx_nvmeotcp_bytes;
> +
> +	/*
> +	 * add new stats at the end and keep in sync with
> +	 * Documentation/netlink/specs/ulp_ddp.yaml
> +	 */
> +};
> +
> +#define ULP_DDP_CAP_COUNT 1
> +
> +struct ulp_ddp_dev_caps {
> +	DECLARE_BITMAP(active, ULP_DDP_CAP_COUNT);
> +	DECLARE_BITMAP(hw, ULP_DDP_CAP_COUNT);
> +};
> +
> +struct netlink_ext_ack;
> +
> +/**
> + * struct ulp_ddp_dev_ops - operations used by an upper layer protocol
> + *                          to configure ddp offload
> + *
> + * @limits:    query ulp driver limitations and quirks.
> + * @sk_add:    add offload for the queue represented by socket+config
> + *             pair. this function is used to configure either copy, crc
> + *             or both offloads.
> + * @sk_del:    remove offload from the socket, and release any device
> + *             related resources.
> + * @setup:     request copy offload for buffers associated with a
> + *             command_id in ulp_ddp_io.
> + * @teardown:  release offload resources association between buffers
> + *             and command_id in ulp_ddp_io.
> + * @resync:    respond to the driver's resync_request. Called only if
> + *             resync is successful.
> + * @set_caps:  set device ULP DDP capabilities.
> + *	       returns a negative error code or zero.
> + * @get_caps:  get device ULP DDP capabilities.
> + * @get_stats: query ULP DDP statistics.
> + */
> +struct ulp_ddp_dev_ops {
> +	int (*limits)(struct net_device *netdev,
> +		      struct ulp_ddp_limits *limits);
> +	int (*sk_add)(struct net_device *netdev,
> +		      struct sock *sk,
> +		      struct ulp_ddp_config *config);
> +	void (*sk_del)(struct net_device *netdev,
> +		       struct sock *sk);
> +	int (*setup)(struct net_device *netdev,
> +		     struct sock *sk,
> +		     struct ulp_ddp_io *io);
> +	void (*teardown)(struct net_device *netdev,
> +			 struct sock *sk,
> +			 struct ulp_ddp_io *io,
> +			 void *ddp_ctx);
> +	void (*resync)(struct net_device *netdev,
> +		       struct sock *sk, u32 seq);
> +	int (*set_caps)(struct net_device *dev, unsigned long *bits,
> +			struct netlink_ext_ack *extack);
> +	void (*get_caps)(struct net_device *dev,
> +			 struct ulp_ddp_dev_caps *caps);
> +	int (*get_stats)(struct net_device *dev,
> +			 struct ulp_ddp_stats *stats);
> +};
> +
> +#define ULP_DDP_RESYNC_PENDING BIT(0)
> +
> +/**
> + * struct ulp_ddp_ulp_ops - Interface to register upper layer
> + *                          Direct Data Placement (DDP) TCP offload.
> + * @resync_request:         NIC requests ulp to indicate if @seq is the start
> + *                          of a message.
> + * @ddp_teardown_done:      NIC driver informs the ulp that teardown is done,
> + *                          used for async completions.
> + */
> +struct ulp_ddp_ulp_ops {
> +	bool (*resync_request)(struct sock *sk, u32 seq, u32 flags);
> +	void (*ddp_teardown_done)(void *ddp_ctx);
> +};
> +
> +/**
> + * struct ulp_ddp_ctx - Generic ulp ddp context
> + *
> + * @type:	type of this context struct
> + * @buf:	protocol-specific context struct
> + */
> +struct ulp_ddp_ctx {
> +	enum ulp_ddp_type	type;
> +	unsigned char		buf[];
> +};
> +
> +static inline struct ulp_ddp_ctx *ulp_ddp_get_ctx(struct sock *sk)
> +{
> +#ifdef CONFIG_ULP_DDP
> +	struct inet_connection_sock *icsk = inet_csk(sk);
> +
> +	return (__force struct ulp_ddp_ctx *)icsk->icsk_ulp_ddp_data;
> +#else
> +	return NULL;
> +#endif
> +}
> +
> +static inline void ulp_ddp_set_ctx(struct sock *sk, void *ctx)
> +{
> +#ifdef CONFIG_ULP_DDP
> +	struct inet_connection_sock *icsk = inet_csk(sk);
> +
> +	rcu_assign_pointer(icsk->icsk_ulp_ddp_data, ctx);
> +#endif
> +}
> +
> +static inline int ulp_ddp_setup(struct net_device *netdev,
> +				struct sock *sk,
> +				struct ulp_ddp_io *io)
> +{
> +#ifdef CONFIG_ULP_DDP
> +	return netdev->netdev_ops->ulp_ddp_ops->setup(netdev, sk, io);
> +#else
> +	return -EOPNOTSUPP;
> +#endif
> +}
> +
> +static inline void ulp_ddp_teardown(struct net_device *netdev,
> +				    struct sock *sk,
> +				    struct ulp_ddp_io *io,
> +				    void *ddp_ctx)
> +{
> +#ifdef CONFIG_ULP_DDP
> +	netdev->netdev_ops->ulp_ddp_ops->teardown(netdev, sk, io, ddp_ctx);
> +#endif
> +}
> +
> +static inline void ulp_ddp_resync(struct net_device *netdev,
> +				  struct sock *sk,
> +				  u32 seq)
> +{
> +#ifdef CONFIG_ULP_DDP
> +	netdev->netdev_ops->ulp_ddp_ops->resync(netdev, sk, seq);
> +#endif
> +}
> +
> +static inline int ulp_ddp_get_limits(struct net_device *netdev,
> +				     struct ulp_ddp_limits *limits,
> +				     enum ulp_ddp_type type)
> +{
> +#ifdef CONFIG_ULP_DDP
> +	limits->type = type;
> +	return netdev->netdev_ops->ulp_ddp_ops->limits(netdev, limits);
> +#else
> +	return -EOPNOTSUPP;
> +#endif
> +}
> +
> +static inline bool ulp_ddp_cap_turned_on(unsigned long *old,
> +					 unsigned long *new,
> +					 int bit_nr)
> +{
> +	return !test_bit(bit_nr, old) && test_bit(bit_nr, new);
> +}
> +
> +static inline bool ulp_ddp_cap_turned_off(unsigned long *old,
> +					  unsigned long *new,
> +					  int bit_nr)
> +{
> +	return test_bit(bit_nr, old) && !test_bit(bit_nr, new);
> +}
> +
> +#ifdef CONFIG_ULP_DDP
> +
> +int ulp_ddp_sk_add(struct net_device *netdev,
> +		   netdevice_tracker *tracker,
> +		   gfp_t gfp,
> +		   struct sock *sk,
> +		   struct ulp_ddp_config *config,
> +		   const struct ulp_ddp_ulp_ops *ops);
> +
> +void ulp_ddp_sk_del(struct net_device *netdev,
> +		    netdevice_tracker *tracker,
> +		    struct sock *sk);
> +
> +bool ulp_ddp_is_cap_active(struct net_device *netdev, int cap_bit_nr);
> +
> +#else
> +
> +static inline int ulp_ddp_sk_add(struct net_device *netdev,
> +				 netdevice_tracker *tracker,
> +				 gfp_t gfp,
> +				 struct sock *sk,
> +				 struct ulp_ddp_config *config,
> +				 const struct ulp_ddp_ulp_ops *ops)
> +{
> +	return -EOPNOTSUPP;
> +}
> +
> +static inline void ulp_ddp_sk_del(struct net_device *netdev,
> +				  netdevice_tracker *tracker,
> +				  struct sock *sk)
> +{}
> +
> +static inline bool ulp_ddp_is_cap_active(struct net_device *netdev,
> +					 int cap_bit_nr)
> +{
> +	return false;
> +}
> +
> +#endif
> +
> +#endif	/* _ULP_DDP_H */
> diff --git a/net/Kconfig b/net/Kconfig
> index ebc80a98fc91..803c4bfda43a 100644
> --- a/net/Kconfig
> +++ b/net/Kconfig
> @@ -541,4 +541,24 @@ config NET_TEST
>   
>   	  If unsure, say N.
>   
> +config ULP_DDP
> +	bool "ULP direct data placement offload"
> +	help
> +	  This feature provides a generic infrastructure for Direct
> +	  Data Placement (DDP) offload for Upper Layer Protocols (ULP,
> +	  such as NVMe-TCP).
> +
> +	  If the ULP and NIC driver supports it, the ULP code can
> +	  request the NIC to place ULP response data directly
> +	  into application memory, avoiding a costly copy.
> +
> +	  This infrastructure also allows for offloading the ULP data
> +	  integrity checks (e.g. data digest) that would otherwise
> +	  require another costly pass on the data we managed to avoid
> +	  copying.
> +
> +	  For more information, see
> +	  <file:Documentation/networking/ulp-ddp-offload.rst>.
> +
> +
>   endif   # if NET
> diff --git a/net/core/Makefile b/net/core/Makefile
> index b2a76ce33932..6d817870d7c3 100644
> --- a/net/core/Makefile
> +++ b/net/core/Makefile
> @@ -20,6 +20,7 @@ obj-$(CONFIG_NETDEV_ADDR_LIST_TEST) += dev_addr_lists_test.o
>   obj-y += net-sysfs.o
>   obj-y += hotdata.o
>   obj-y += netdev_rx_queue.o
> +obj-$(CONFIG_ULP_DDP) += ulp_ddp.o
>   obj-$(CONFIG_PAGE_POOL) += page_pool.o page_pool_user.o
>   obj-$(CONFIG_PROC_FS) += net-procfs.o
>   obj-$(CONFIG_NET_PKTGEN) += pktgen.o
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index d6420b74ea9c..fe5a9df175cc 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -80,6 +80,7 @@
>   #include <net/mctp.h>
>   #include <net/page_pool/helpers.h>
>   #include <net/dropreason.h>
> +#include <net/ulp_ddp.h>
>   
>   #include <linux/uaccess.h>
>   #include <trace/events/skb.h>
> @@ -6940,7 +6941,8 @@ void skb_condense(struct sk_buff *skb)
>   {
>   	if (skb->data_len) {
>   		if (skb->data_len > skb->end - skb->tail ||
> -		    skb_cloned(skb) || !skb_frags_readable(skb))
> +		    skb_cloned(skb) || !skb_frags_readable(skb) ||
> +		    (skb->sk && skb->sk->sk_no_condense))
>   			return;
>   
>   		/* Nice, we can free page frag(s) right now */
> diff --git a/net/core/ulp_ddp.c b/net/core/ulp_ddp.c
> new file mode 100644
> index 000000000000..c02786ed5aeb
> --- /dev/null
> +++ b/net/core/ulp_ddp.c
> @@ -0,0 +1,56 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + *
> + * ulp_ddp.c
> + *   Author:	Aurelien Aptel <aaptel@...dia.com>
> + *   Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.
> + */
> +
> +#include <net/ulp_ddp.h>
> +
> +int ulp_ddp_sk_add(struct net_device *netdev,
> +		   netdevice_tracker *tracker,
> +		   gfp_t gfp,
> +		   struct sock *sk,
> +		   struct ulp_ddp_config *config,
> +		   const struct ulp_ddp_ulp_ops *ops)
> +{
> +	int ret;
> +
> +	/* put in ulp_ddp_sk_del() */
> +	netdev_hold(netdev, tracker, gfp);
> +
> +	ret = netdev->netdev_ops->ulp_ddp_ops->sk_add(netdev, sk, config);
> +	if (ret) {
> +		dev_put(netdev);
> +		return ret;
> +	}
> +
> +	inet_csk(sk)->icsk_ulp_ddp_ops = ops;
> +	sk->sk_no_condense = true;
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(ulp_ddp_sk_add);
> +
> +void ulp_ddp_sk_del(struct net_device *netdev,
> +		    netdevice_tracker *tracker,
> +		    struct sock *sk)
> +{
> +	netdev->netdev_ops->ulp_ddp_ops->sk_del(netdev, sk);
> +	inet_csk(sk)->icsk_ulp_ddp_ops = NULL;
> +	sk->sk_no_condense = false;
> +	netdev_put(netdev, tracker);
> +}
> +EXPORT_SYMBOL_GPL(ulp_ddp_sk_del);
> +
> +bool ulp_ddp_is_cap_active(struct net_device *netdev, int cap_bit_nr)
> +{
> +	struct ulp_ddp_dev_caps caps;
> +
> +	if (!netdev->netdev_ops->ulp_ddp_ops)
> +		return false;
> +	netdev->netdev_ops->ulp_ddp_ops->get_caps(netdev, &caps);
> +	return test_bit(cap_bit_nr, caps.active);
> +}
> +EXPORT_SYMBOL_GPL(ulp_ddp_is_cap_active);
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 19a1542883df..2351cc06d458 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -5368,6 +5368,7 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, struct rb_root *root,
>   
>   		memcpy(nskb->cb, skb->cb, sizeof(skb->cb));
>   		skb_copy_decrypted(nskb, skb);
> +		skb_copy_ulp_crc(nskb, skb);
>   		TCP_SKB_CB(nskb)->seq = TCP_SKB_CB(nskb)->end_seq = start;
>   		if (list)
>   			__skb_queue_before(list, skb, nskb);
> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> index d293087b426d..77ae71ea25f6 100644
> --- a/net/ipv4/tcp_offload.c
> +++ b/net/ipv4/tcp_offload.c
> @@ -353,6 +353,7 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb,
>   
>   	flush |= (ntohl(th2->seq) + skb_gro_len(p)) ^ ntohl(th->seq);
>   	flush |= skb_cmp_decrypted(p, skb);
> +	flush |= skb_cmp_ulp_crc(p, skb);
>   
>   	if (unlikely(NAPI_GRO_CB(p)->is_flist)) {
>   		flush |= (__force int)(flags ^ tcp_flag_word(th2));

Hmm. One wonders: where is the different between this DDP implementation
and the existing DDP implementation added with 4d288d5767f8 ("[SCSI] 
net: add FCoE offload support through net_device") ?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@...e.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ