Message-ID: <cba0216c-c87a-421d-bc4d-bc199165edbd@intel.com>
Date: Tue, 8 Apr 2025 15:22:48 +0200
From: Alexander Lobakin <aleksander.lobakin@...el.com>
To: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
CC: <intel-wired-lan@...ts.osuosl.org>, Michal Kubiak
	<michal.kubiak@...el.com>, Tony Nguyen <anthony.l.nguyen@...el.com>, "Przemek
 Kitszel" <przemyslaw.kitszel@...el.com>, Andrew Lunn <andrew+netdev@...n.ch>,
	"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, "Alexei
 Starovoitov" <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
	"Jesper Dangaard Brouer" <hawk@...nel.org>, John Fastabend
	<john.fastabend@...il.com>, Simon Horman <horms@...nel.org>,
	<bpf@...r.kernel.org>, <netdev@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net-next 03/16] libeth: add a couple of XDP helpers
 (libeth_xdp)

From: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
Date: Tue, 11 Mar 2025 15:05:38 +0100

> On Wed, Mar 05, 2025 at 05:21:19PM +0100, Alexander Lobakin wrote:
>> "Couple" is a bit humbly... Add the following functionality to libeth:

[...]

>> +struct libeth_rq_napi_stats {
>> +	union {
>> +		struct {
>> +							u32 packets;
>> +							u32 bytes;
>> +							u32 fragments;
>> +							u32 hsplit;
>> +		};
>> +		DECLARE_FLEX_ARRAY(u32, raw);
> 
> The @raw approach is never used throughout the patchset, right?

Right, but

> Could you explain the reason for introducing this and potential use case?

initially, when my tree also contained the libeth generic stats, I used
this field to update queue stats in a loop (unrolled by 4 fields)
instead of field by field.
Generic stats are still planned, and since ::raw is already present in
&libeth_sq_napi_stats, I'd like to keep it :z
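
Roughly what that looked like -- a sketch only, the generic-stats code
isn't part of this series and the names below are made up:

static void libeth_stats_add_napi(u64_stats_t *qs, const u32 *raw, u32 num)
{
	/* ::raw aliases the named u32 counters, so the whole onstack
	 * struct folds into the persistent per-queue stats in one loop
	 * (the real version was unrolled by 4) instead of field-by-field.
	 */
	for (u32 i = 0; i < num; i++)
		u64_stats_add(&qs[i], raw[i]);
}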

> 
>> +	};
>> +};
>>  
>>  /**
>>   * struct libeth_sq_napi_stats - "hot" counters to update in Tx completion loop
>> @@ -22,4 +44,84 @@ struct libeth_sq_napi_stats {
>>  	};
>>  };
>>  
>> +/**
>> + * struct libeth_xdpsq_napi_stats - "hot" counters to update in XDP Tx
>> + *				    completion loop
>> + * @packets: completed frames counter
>> + * @bytes: sum of bytes of completed frames above
>> + * @fragments: sum of fragments of completed S/G frames
>> + * @raw: alias to access all the fields as an array
>> + */
>> +struct libeth_xdpsq_napi_stats {
> 
> what's the delta between this and libeth_sq_napi_stats ? couldn't you have
> a single struct for purpose of tx napi stats?

Same reason as above: future-proofing. &libeth_sq{,_napi}_stats will contain
stuff XDP queues will never need and vice versa.

> 
>> +	union {
>> +		struct {
>> +							u32 packets;
>> +							u32 bytes;
>> +							u32 fragments;
>> +		};
>> +		DECLARE_FLEX_ARRAY(u32, raw);
>> +	};
>> +};

[...]

>> @@ -71,7 +84,10 @@ struct libeth_sqe {
>>  /**
>>   * struct libeth_cq_pp - completion queue poll params
>>   * @dev: &device to perform DMA unmapping
>> + * @bq: XDP frame bulk to combine return operations
>>   * @ss: onstack NAPI stats to fill
>> + * @xss: onstack XDPSQ NAPI stats to fill
>> + * @xdp_tx: number of XDP frames processed
>>   * @napi: whether it's called from the NAPI context
>>   *
>>   * libeth uses this structure to access objects needed for performing full
>> @@ -80,7 +96,13 @@ struct libeth_sqe {
>>   */
>>  struct libeth_cq_pp {
>>  	struct device			*dev;
>> -	struct libeth_sq_napi_stats	*ss;
>> +	struct xdp_frame_bulk		*bq;
>> +
>> +	union {
>> +		struct libeth_sq_napi_stats	*ss;
>> +		struct libeth_xdpsq_napi_stats	*xss;
>> +	};
>> +	u32				xdp_tx;
> 
> you have this counted in xss::packets?

Nope, it's the same as in ice: you have separate ::packets AND ::xdp_tx
on the ring to speed up XSk completion when there are no non-XSk XDP buffers.
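
The pattern is roughly as follows (a sketch with hypothetical names --
the XSk completion code comes later in the series):

	if (!xdpsq->xdp_tx) {
		/* only XSk frames were sent: credit the pool in one go,
		 * no need to walk the SQEs one by one
		 */
		xsk_tx_completed(xdpsq->pool, done);
	} else {
		/* some XDP_TX / .ndo_xdp_xmit() frames are present:
		 * complete each SQE to unmap / free them properly
		 */
		complete_each_sqe(xdpsq, done, &cp);
	}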

> 
>>  
>>  	bool				napi;
>>  };

[...]

>> +/* Common Tx bits */
>> +
>> +/**
>> + * enum - libeth_xdp internal Tx flags
>> + * @LIBETH_XDP_TX_BULK: one bulk size at which it will be flushed to the queue
>> + * @LIBETH_XDP_TX_BATCH: batch size for which the queue fill loop is unrolled
>> + * @LIBETH_XDP_TX_DROP: indicates the send function must drop frames not sent
>> + * @LIBETH_XDP_TX_NDO: whether the send function is called from .ndo_xdp_xmit()
>> + */
>> +enum {
>> +	LIBETH_XDP_TX_BULK		= DEV_MAP_BULK_SIZE,
>> +	LIBETH_XDP_TX_BATCH		= 8,
>> +
>> +	LIBETH_XDP_TX_DROP		= BIT(0),
>> +	LIBETH_XDP_TX_NDO		= BIT(1),
> 
> what's the reason to group these random values onto enum?

They will then be visible in BTF (not sure anyone will need them there).

> 
>> +};
>> +
>> +/**
>> + * enum - &libeth_xdp_tx_frame and &libeth_xdp_tx_desc flags
>> + * @LIBETH_XDP_TX_LEN: only for ``XDP_TX``, [15:0] of ::len_fl is actual length
>> + * @LIBETH_XDP_TX_FIRST: indicates the frag is the first one of the frame
>> + * @LIBETH_XDP_TX_LAST: whether the frag is the last one of the frame
>> + * @LIBETH_XDP_TX_MULTI: whether the frame contains several frags
> 
> would be good to have some extended description around usage of these
> flags.

They are internal to libeth functions anyway, hence no detailed description.

> 
>> + * @LIBETH_XDP_TX_FLAGS: only for ``XDP_TX``, [31:16] of ::len_fl is flags
>> + */
>> +enum {
>> +	LIBETH_XDP_TX_LEN		= GENMASK(15, 0),
>> +
>> +	LIBETH_XDP_TX_FIRST		= BIT(16),
>> +	LIBETH_XDP_TX_LAST		= BIT(17),
>> +	LIBETH_XDP_TX_MULTI		= BIT(18),
>> +
>> +	LIBETH_XDP_TX_FLAGS		= GENMASK(31, 16),
>> +};

[...]

>> +/**
>> + * libeth_xdp_tx_queue_head - internal helper for queueing one ``XDP_TX`` head
>> + * @bq: XDP Tx bulk to queue the head frag to
>> + * @xdp: XDP buffer with the head to queue
>> + *
>> + * Return: false if it's the only frag of the frame, true if it's an S/G frame.
>> + */
>> +static inline bool libeth_xdp_tx_queue_head(struct libeth_xdp_tx_bulk *bq,
>> +					    const struct libeth_xdp_buff *xdp)
>> +{
>> +	const struct xdp_buff *base = &xdp->base;
>> +
>> +	bq->bulk[bq->count++] = (typeof(*bq->bulk)){
>> +		.data	= xdp->data,
>> +		.len_fl	= (base->data_end - xdp->data) | LIBETH_XDP_TX_FIRST,
>> +		.soff	= xdp_data_hard_end(base) - xdp->data,
>> +	};
>> +
>> +	if (!xdp_buff_has_frags(base))
> 
> likely() ?

With header split enabled and getting more and more popular -- not
really. likely() actually hurts perf here.

> 
>> +		return false;
>> +
>> +	bq->bulk[bq->count - 1].len_fl |= LIBETH_XDP_TX_MULTI;
>> +
>> +	return true;
>> +}
>> +
>> +/**
>> + * libeth_xdp_tx_queue_frag - internal helper for queueing one ``XDP_TX`` frag
>> + * @bq: XDP Tx bulk to queue the frag to
>> + * @frag: frag to queue
>> + */
>> +static inline void libeth_xdp_tx_queue_frag(struct libeth_xdp_tx_bulk *bq,
>> +					    const skb_frag_t *frag)
>> +{
>> +	bq->bulk[bq->count++].frag = *frag;
> 
> IMHO this helper is not providing anything useful

That's why it's stated "internal helper" :D

> 
>> +}

[...]

>> +#define libeth_xdp_tx_fill_stats(sqe, desc, sinfo)			      \
>> +	__libeth_xdp_tx_fill_stats(sqe, desc, sinfo, __UNIQUE_ID(sqe_),	      \
>> +				   __UNIQUE_ID(desc_), __UNIQUE_ID(sinfo_))
>> +
>> +#define __libeth_xdp_tx_fill_stats(sqe, desc, sinfo, ue, ud, us) do {	      \
>> +	const struct libeth_xdp_tx_desc *ud = (desc);			      \
>> +	const struct skb_shared_info *us;				      \
>> +	struct libeth_sqe *ue = (sqe);					      \
>> +									      \
>> +	ue->nr_frags = 1;						      \
>> +	ue->bytes = ud->len;						      \
>> +									      \
>> +	if (ud->flags & LIBETH_XDP_TX_MULTI) {				      \
>> +		us = (sinfo);						      \
> 
> why? what 'u' stands for? ue us don't tell the reader much from the first
> glance. sinfo tells me everything.

ue -- "unique element"
ud -- "unique descriptor"
us -- "unique sinfo"

All of them are purely internal to pass __UNIQUE_ID() in the
non-underscored version to avoid variable shadowing.
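
A minimal illustration of the shadowing problem (not from the patch;
fill_bad() and get_desc() are made up):

#define fill_bad(desc) do {					\
	const struct libeth_xdp_tx_desc *d = (desc);		\
	/* ... */						\
} while (0)

	const struct libeth_xdp_tx_desc *d = get_desc();

	fill_bad(d);	/* expands to `... *d = (d);` -- the inner d
			 * shadows the caller's and initializes itself
			 */

__UNIQUE_ID() gives the internal locals names that can never collide
with whatever expression the caller passes in.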

> 
>> +		ue->nr_frags += us->nr_frags;				      \
>> +		ue->bytes += us->xdp_frags_size;			      \
>> +	}								      \
>> +} while (0)

[...]

>> +/**
>> + * libeth_xdp_xmit_do_bulk - implement full .ndo_xdp_xmit() in driver
>> + * @dev: target &net_device
>> + * @n: number of frames to send
>> + * @fr: XDP frames to send
>> + * @f: flags passed by the stack
>> + * @xqs: array of XDPSQs driver structs
>> + * @nqs: number of active XDPSQs, the above array length
>> + * @fl: driver callback to flush an XDP xmit bulk
>> + * @fin: driver callback to finalize the queue
>> + *
>> + * If the driver has active XDPSQs, perform common checks and send the frames.
>> + * Finalize the queue, if requested.
>> + *
>> + * Return: number of frames sent or -errno on error.
>> + */
>> +#define libeth_xdp_xmit_do_bulk(dev, n, fr, f, xqs, nqs, fl, fin)	      \
>> +	_libeth_xdp_xmit_do_bulk(dev, n, fr, f, xqs, nqs, fl, fin,	      \
>> +				 __UNIQUE_ID(bq_), __UNIQUE_ID(ret_),	      \
>> +				 __UNIQUE_ID(nqs_))
> 
> why __UNIQUE_ID() is needed?

As above, to avoid variable shadowing.

> 
>> +
>> +#define _libeth_xdp_xmit_do_bulk(d, n, fr, f, xqs, nqs, fl, fin, ub, ur, un)  \
> 
> why single underscore? usually we do __ for internal funcs as you did
> somewhere above.

The double-underscored one is already defined above :D
So it would be either like this or __ + ___

> 
> also, why define and not inlined func?

I'll double-check, but if you look at its usage in idpf/xdp.c, you'll
see that some arguments are non-trivial to obtain, IOW they cost some
cycles. The macro ensures they won't be evaluated prior to
`likely(number_of_xdpsqs)`.
I'll convert it to an inline and check whether the compiler handles this
itself. It didn't behave in {,__}libeth_xdp_tx_fill_stats() unfortunately,
hence a macro there as well =\
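
The caller side is roughly (hypothetical expressions, my_get_xdpsqs()
and friends are made up):

	nxmit = libeth_xdp_xmit_do_bulk(dev, n, frames, flags,
					my_get_xdpsqs(priv), /* costly */
					priv->num_xdpsqs,
					my_xmit_flush_bulk, my_xmit_finalize);

With the macro, the `xqs` expression is expanded only inside the
`likely(un)` branch, so my_get_xdpsqs() is never called when there are
no active XDPSQs; an inline function would have to evaluate every
argument before the call, even when -ENXIO is returned right away.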

> 
>> +({									      \
>> +	u32 un = (nqs);							      \
>> +	int ur;								      \
>> +									      \
>> +	if (likely(un)) {						      \
>> +		struct libeth_xdp_tx_bulk ub;				      \
>> +									      \
>> +		libeth_xdp_xmit_init_bulk(&ub, d, xqs, un);		      \
>> +		ur = __libeth_xdp_xmit_do_bulk(&ub, fr, n, f, fl, fin);	      \
>> +	} else {							      \
>> +		ur = -ENXIO;						      \
>> +	}								      \
>> +									      \
>> +	ur;								      \
>> +})

[...]

>> +static inline void
>> +libeth_xdp_init_buff(struct libeth_xdp_buff *dst,
>> +		     const struct libeth_xdp_buff_stash *src,
>> +		     struct xdp_rxq_info *rxq)
> 
> what is the rationale for storing/loading xdp_buff onto/from driver's Rx
> queue? could we work directly on xdp_buff from Rx queue? ice is doing so
> currently.

Stack vs heap. I was getting lower numbers working on the queue directly.
Also note that &libeth_xdp_buff_stash is 16 bytes, while
&libeth_xdp_buff is 64. I don't think it makes sense to waste +48
bytes in the structure.

Load-store of the stash is rare anyway. It can happen *only* if the HW
for some reason hasn't written the whole multi-buffer frame yet, since
NAPI budget is counted by packets, not fragments.
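
For reference, the stash keeps only what's needed to rebuild the full
&xdp_buff later, roughly this shape (see the patch for the exact
definition):

struct libeth_xdp_buff_stash {
	void				*data;
	u16				headroom;
	u16				len;

	u32				frame_sz:24;
	u32				flags:8;
} __aligned_largest;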

> 
>> +{
>> +	if (likely(!src->data))
>> +		dst->data = NULL;
>> +	else
>> +		libeth_xdp_load_stash(dst, src);
>> +
>> +	dst->base.rxq = rxq;
>> +}

[...]

>> +static inline bool libeth_xdp_process_buff(struct libeth_xdp_buff *xdp,
>> +					   const struct libeth_fqe *fqe,
>> +					   u32 len)
>> +{
>> +	if (!libeth_rx_sync_for_cpu(fqe, len))
>> +		return false;
>> +
>> +	if (xdp->data)
> 
> unlikely() ?

Same as for libeth_xdp_tx_queue_head(): with header split, you'll hit
this branch on every frame.

> 
>> +		return libeth_xdp_buff_add_frag(xdp, fqe, len);
>> +
>> +	libeth_xdp_prepare_buff(xdp, fqe, len);
>> +
>> +	prefetch(xdp->data);
>> +
>> +	return true;
>> +}

[...]

>> +/**
>> + * libeth_xdp_run_prog - run XDP program and handle all verdicts
>> + * @xdp: XDP buffer to process
>> + * @bq: XDP Tx bulk to queue ``XDP_TX`` buffers
>> + * @fl: driver ``XDP_TX`` bulk flush callback
>> + *
>> + * Run the attached XDP program and handle all possible verdicts.
>> + * Prefer using it via LIBETH_XDP_DEFINE_RUN{,_PASS,_PROG}().
>> + *
>> + * Return: true if the buffer should be passed up the stack, false if the poll
>> + * should go to the next buffer.
>> + */
>> +#define libeth_xdp_run_prog(xdp, bq, fl)				      \
> 
> is this used in idpf in this patchset?

Sure. __LIBETH_XDP_DEFINE_RUN() builds two functions, one of which uses it.
Same for __LIBETH_XDP_DEFINE_RUN_PROG(). I know they are hard to read,
but otherwise I'd need to duplicate them for XDP and XSk separately.

> 
>> +	(__libeth_xdp_run_flush(xdp, bq, __libeth_xdp_run_prog,		      \
>> +				libeth_xdp_tx_queue_bulk,		      \
>> +				fl) == LIBETH_XDP_PASS)
>> +
>> +/**
>> + * __libeth_xdp_run_pass - helper to run XDP program and handle the result
>> + * @xdp: XDP buffer to process
>> + * @bq: XDP Tx bulk to queue ``XDP_TX`` frames
>> + * @napi: NAPI to build an skb and pass it up the stack
>> + * @rs: onstack libeth RQ stats
>> + * @md: metadata that should be filled to the XDP buffer
>> + * @prep: callback for filling the metadata
>> + * @run: driver wrapper to run XDP program
> 
> I see it's NULLed on idpf? why have this?

Only for singleq, which we don't support. splitq uses
LIBETH_XDP_DEFINE_RUN() to build idpf_xdp_run_prog() and
idpf_xdp_run_pass().
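
So on the driver side the splitq path boils down to something like
this (the last two argument names are approximated, see idpf/xdp.c in
the series for the exact invocation):

LIBETH_XDP_DEFINE_START();
LIBETH_XDP_DEFINE_RUN(static idpf_xdp_run_pass, idpf_xdp_run_prog,
		      idpf_xdp_tx_flush_bulk, idpf_xdp_populate_skb);
LIBETH_XDP_DEFINE_END();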

> 
>> + * @populate: driver callback to populate an skb with the HW descriptor data
>> + *
>> + * Inline abstraction that does the following:
>> + * 1) adds frame size and frag number (if needed) to the onstack stats;
>> + * 2) fills the descriptor metadata to the onstack &libeth_xdp_buff
>> + * 3) runs XDP program if present;
>> + * 4) handles all possible verdicts;
>> + * 5) on ``XDP_PASS``, builds an skb from the buffer;
>> + * 6) populates it with the descriptor metadata;
>> + * 7) passes it up the stack.

[...]

>> +void __cold libeth_xdp_tx_exception(struct libeth_xdp_tx_bulk *bq, u32 sent,
>> +				    u32 flags)
>> +{
>> +	const struct libeth_xdp_tx_frame *pos = &bq->bulk[sent];
>> +	u32 left = bq->count - sent;
>> +
>> +	if (!(flags & LIBETH_XDP_TX_NDO))
>> +		libeth_trace_xdp_exception(bq->dev, bq->prog, XDP_TX);
>> +
>> +	if (!(flags & LIBETH_XDP_TX_DROP)) {
>> +		memmove(bq->bulk, pos, left * sizeof(*bq->bulk));
> 
> can this overflow? if queue got stuck for some reason.

memmove() is safe to call even when src == dst. As for the XDP Tx logic,
if the queue is stuck, the bulk will never overflow; libeth will just
try to send it again and again. In the end, both XDP Tx and xmit call it
with DROP to make sure no memleaks or other issues can take place.

> 
>> +		bq->count = left;
>> +
>> +		return;
>> +	}
>> +
>> +	if (!(flags & LIBETH_XDP_TX_NDO))
>> +		libeth_xdp_tx_return_bulk(pos, left);
>> +	else
>> +		libeth_xdp_xmit_return_bulk(pos, left, bq->dev);
>> +
>> +	bq->count = 0;
>> +}
>> +EXPORT_SYMBOL_GPL(libeth_xdp_tx_exception);

Thanks,
Olek
