netdev - Re: [PATCH net-next v3 1/2] net: xsk: update tx queue consumer immediately after transmission

Message-ID: <CAL+tcoBfg-HfMxYHTnP6xb0ZWp68KiP4R0U-AdUt9UE=UJYCkw@mail.gmail.com>
Date: Wed, 25 Jun 2025 20:49:38 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org, 
	pabeni@...hat.com, bjorn@...nel.org, magnus.karlsson@...el.com, 
	jonathan.lemon@...il.com, sdf@...ichev.me, ast@...nel.org, 
	daniel@...earbox.net, hawk@...nel.org, john.fastabend@...il.com, joe@...a.to, 
	willemdebruijn.kernel@...il.com, bpf@...r.kernel.org, netdev@...r.kernel.org, 
	Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next v3 1/2] net: xsk: update tx queue consumer
 immediately after transmission

Hi Maciej,

On Wed, Jun 25, 2025 at 7:09 PM Maciej Fijalkowski
<maciej.fijalkowski@...el.com> wrote:
>
> On Wed, Jun 25, 2025 at 06:10:13PM +0800, Jason Xing wrote:
> > From: Jason Xing <kernelxing@...cent.com>
> >
> > For AF_XDP, the return value of the sendto() syscall doesn't reflect how
> > many descs were handled in the kernel. One use case is a user-space
> > application that wants to know the number of transmitted skbs in order to
> > decide whether to keep sending, e.g. did it stop due to the max tx budget?
> >
> > The following formula can be used after sending to learn how many
> > skbs/descs the kernel took care of:
> >
> >   tx_queue.consumers_after - tx_queue.consumers_before
> >
> > Prior to the current patch, in non-zc mode, the consumer of the tx queue
> > is not updated immediately at the end of each sendto() syscall when an
> > error occurs, which leaves the consumer value out of date from the
> > perspective of user space. So this patch adds a store operation that
> > passes the cached value to the shared value to handle the problem.
> >
> > Besides the explicit errors appearing in the while() loop in
> > __xsk_generic_xmit(), there are a few possible error cases that might
> > be overlooked in the following call trace:
> > __xsk_generic_xmit()
> >     xskq_cons_peek_desc()
> >         xskq_cons_read_desc()
> >           xskq_cons_is_valid_desc()
> > These also cause a premature exit from the while() loop even if not
> > all the descs are consumed.
> >
> > Based on the above analysis, using 'cached_prod != cached_cons' covers
> > all the possible cases because it indicates that there are remaining
> > descs that were not handled and that cached_cons has not yet been
> > published to the global state of the consumer at this time.
> >
> > Signed-off-by: Jason Xing <kernelxing@...cent.com>
> > ---
> > v3
> > Link: https://lore.kernel.org/all/20250623073129.23290-1-kerneljasonxing@gmail.com/
> > 1. use xskq_has_descs helper.
> > 2. add selftest
> >
> > V2
> > Link: https://lore.kernel.org/all/20250619093641.70700-1-kerneljasonxing@gmail.com/
> > 1. filter out those good cases because only those that return error need
> > updates.
> > Side note:
> > 1. in non-batched zero copy mode, at the end of every caller of
> > xsk_tx_peek_desc(), there is always an xsk_tx_release() call that is used
> > to publish the local consumer to the global state of the consumer. So for
> > the zero copy mode, there is no need to change anything.
> > 2. Actually I have no strong preference between v1 (see the above link)
> > and v2 because smp_store_release() shouldn't cause side effects.
> > For the sake of precision in the code, v2 is the preferable one.
> > ---
> >  net/xdp/xsk.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 5542675dffa9..ab6351b24ac8 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -856,6 +856,9 @@ static int __xsk_generic_xmit(struct sock *sk)
> >       }
> >
> >  out:
> > +     if (xskq_has_descs(xs->tx))
> > +             __xskq_cons_release(xs->tx);
> > +
> >       if (sent_frame)
> >               if (xsk_tx_writeable(xs))
> >                       sk->sk_write_space(sk);
>
> Hi Jason,
> IMHO below should be enough to address the issue:

Sure, it can.

Can I ask one more thing? Technically it's not considered a bug,
right? I'm not sure if it's worth asking the stable team to backport
it to older versions.

>
>         if (sent_frame) {

Using this condition means the consumer is updated in the majority of
cases, including the good ones [1]. The intention of the current patch
is to update the consumer only when an error occurs, because in the
other cases xskq_cons_peek_desc() already does it.

[1]: https://lore.kernel.org/all/aFVr60tw3QJopcOo@mini-arch/
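
To make the difference between the two conditions concrete, here is a
minimal user-space sketch. It is a toy model, not the kernel code: struct
toy_txq, toy_peek(), toy_xmit() and the policy enum are hypothetical names,
and the only behaviour assumed from the discussion above is that the peek
step publishes the cached consumer once the cached window is drained, while
an early exit from the loop leaves it unpublished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: field names echo the thread, the layout is hypothetical. */
struct toy_txq {
	uint32_t producer;    /* shared, advanced by user space */
	uint32_t consumer;    /* shared, the value user space reads */
	uint32_t cached_prod; /* kernel-private snapshot of producer */
	uint32_t cached_cons; /* kernel-private progress */
};

enum toy_policy { TOY_NO_FIXUP, TOY_IF_DESCS_LEFT, TOY_IF_SENT };

/* Publish private progress, in the spirit of __xskq_cons_release(). */
static void toy_release(struct toy_txq *q)
{
	q->consumer = q->cached_cons;
}

/* Peek: once the cached window is drained, publish and take a new snapshot. */
static bool toy_peek(struct toy_txq *q)
{
	if (q->cached_prod == q->cached_cons) {
		toy_release(q);
		q->cached_prod = q->producer;
	}
	return q->cached_prod != q->cached_cons;
}

/* Send until the ring is empty or a simulated error hits after fail_after descs. */
static uint32_t toy_xmit(struct toy_txq *q, uint32_t fail_after,
			 enum toy_policy policy)
{
	uint32_t sent = 0;

	assert(q);
	while (toy_peek(q)) {
		if (sent == fail_after)
			break; /* early exit: progress so far is not yet published */
		q->cached_cons++;
		sent++;
	}

	/* Models the code after the "out:" label under each policy. */
	if (policy == TOY_IF_DESCS_LEFT && q->cached_prod != q->cached_cons)
		toy_release(q);
	else if (policy == TOY_IF_SENT && sent)
		toy_release(q);
	return sent;
}

/* What user space would compute; unsigned arithmetic absorbs wraparound. */
static uint32_t toy_consumed(uint32_t before, uint32_t after)
{
	return after - before;
}
```

In this model, with 8 descs queued and an error after 3 of them, no fixup
leaves the shared consumer at 0, while either condition publishes 3. On the
error-free path every policy ends at 8 because the final peek publishes; the
sent-based condition just performs one extra store there.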

>                 __xskq_cons_release(xs->tx);
>                 if (xsk_tx_writeable(xs))
>                         sk->sk_write_space(sk);
>         }
>
> which basically is what xsk_tx_release() does for each tx socket in the
> list. zc drivers call it whenever at least a single descriptor was produced
> to the HW ring. So should we on the generic xmit side, based on @sent_frame.

As you said, they would be the same :)

>
> We could even wrap these 3 lines into an internal function, say
> __xsk_tx_release(), and use it in xsk_tx_release() as well.

I can do it in the next respin.

But I have no strong opinion on how to write it. If no one objects to
that style of patch, I will follow your advice. Thanks.
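
For illustration, the suggested refactoring could look roughly like the
sketch below. It is a self-contained toy model under stated assumptions, not
a proposed patch: struct toy_sock and all toy_* functions are hypothetical,
the wakeups counter stands in for the sk_write_space() call, and the
"at most half of the ring pending" writeable rule is a simplification
assumed for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a TX socket; not struct xdp_sock. */
struct toy_sock {
	uint32_t producer;    /* shared, advanced by user space */
	uint32_t consumer;    /* shared, the value user space reads */
	uint32_t cached_cons; /* private progress not yet published */
	uint32_t nentries;    /* ring size */
	unsigned int wakeups; /* counts modelled sk_write_space() calls */
};

/* Assumed writeable rule: at most half of the ring is still pending. */
static bool toy_tx_writeable(const struct toy_sock *s)
{
	return s->producer - s->consumer <= s->nentries / 2;
}

/* The shared helper: publish the consumer, then wake the writer if possible. */
static void toy_tx_release_one(struct toy_sock *s)
{
	s->consumer = s->cached_cons;
	if (toy_tx_writeable(s))
		s->wakeups++;
}

/* Zero-copy side: release every TX socket in the list. */
static void toy_tx_release_all(struct toy_sock *socks, size_t n)
{
	assert(socks || n == 0);
	for (size_t i = 0; i < n; i++)
		toy_tx_release_one(&socks[i]);
}

/* Generic xmit side: the tail of the function after the "out:" label. */
static void toy_generic_xmit_out(struct toy_sock *s, bool sent_frame)
{
	if (sent_frame)
		toy_tx_release_one(s);
}
```

Both paths then go through the same helper, so the publish-then-wake order
cannot drift apart between the zero-copy and generic sides.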

Thanks,
Jason

>
> > --
> > 2.41.3
> >