Message-ID: <CAL+tcoDu-h8crLBsxTVCy6D30vgcB6aarjOpdXE+f4kX1NM8_A@mail.gmail.com>
Date: Sat, 21 Jun 2025 00:26:07 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Stanislav Fomichev <stfomichev@...il.com>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, bjorn@...nel.org, magnus.karlsson@...el.com,
maciej.fijalkowski@...el.com, jonathan.lemon@...il.com, sdf@...ichev.me,
ast@...nel.org, daniel@...earbox.net, hawk@...nel.org,
john.fastabend@...il.com, joe@...a.to, willemdebruijn.kernel@...il.com,
bpf@...r.kernel.org, netdev@...r.kernel.org,
Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next] net: xsk: update tx queue consumer immdiately
after transmission
On Fri, Jun 20, 2025 at 11:58 PM Stanislav Fomichev
<stfomichev@...il.com> wrote:
>
> On 06/20, Jason Xing wrote:
> > On Fri, Jun 20, 2025 at 10:10 PM Stanislav Fomichev
> > <stfomichev@...il.com> wrote:
> > >
> > > On 06/19, Jason Xing wrote:
> > > > From: Jason Xing <kernelxing@...cent.com>
> > > >
> > > > > For AF_XDP, the return value of the sendto() syscall doesn't reflect how
> > > > > many descs were handled in the kernel. One use case is a user-space
> > > > > application that wants to know the number of transmitted skbs and then
> > > > > decide whether to continue sending, e.g. was it stopped due to the max
> > > > > tx budget?
> > > >
> > > > > The following formula can be used after sending to learn how many
> > > > > skbs/descs the kernel took care of:
> > > >
> > > > tx_queue.consumers_before - tx_queue.consumers_after
> > > >
> > > > > Prior to this patch, the consumer of the tx queue is not immediately
> > > > > updated at the end of each sendto syscall, which leaves the consumer
> > > > > value out of date from the perspective of user space. So this patch
> > > > > uses a store operation to pass the cached value to the shared value
> > > > > to handle the problem.
> > > >
> > > > Signed-off-by: Jason Xing <kernelxing@...cent.com>
> > > > ---
> > > > net/xdp/xsk.c | 2 ++
> > > > 1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > > index 7c47f665e9d1..3288ab2d67b4 100644
> > > > --- a/net/xdp/xsk.c
> > > > +++ b/net/xdp/xsk.c
> > > > @@ -856,6 +856,8 @@ static int __xsk_generic_xmit(struct sock *sk)
> > > > }
> > > >
> > > > out:
> > > > + __xskq_cons_release(xs->tx);
> > > > +
> > > > if (sent_frame)
> > > > if (xsk_tx_writeable(xs))
> > > > sk->sk_write_space(sk);
> > >
> > > So for the "good" case we are going to write the cons twice? From
> > > xskq_cons_peek_desc and from here? Maybe make this __xskq_cons_release
> > > conditional ('if (err)')?
> >
> > One unlikely exception:
> > xskq_cons_peek_desc()->xskq_cons_read_desc()->xskq_cons_is_valid_desc()->return
> > false;
> > ?
> >
> > > There are still two possible 'return false' paths in xskq_cons_peek_desc(),
> > > although so far I haven't seen either of them happen.
> >
> > Admittedly, your suggestion covers the majority of normal good ones. I
> > can adjust it as you said.
> >
> > >
> > > I also wonder whether we should add a test for that? Should be easy to
> > > verify by sending more than 32 packets. Is there a place in
> > > tools/testing/selftests/bpf/xskxceiver.c to add that?
> >
> > Well, sorry, if it's not required, please don't force me to do so :S
> > The patch is only one simple update of the consumer that is shared
> > between user-space and kernel.
>
> My suspicion is that the same issue exists for the zc case. So would
> be nice to test it and fix it as well :-p
Oh, well, I will take a look at how the selftest works in the next few days.

Allow me to ask the question that you asked me before: even though I
don't see the necessity of setting the max budget for zc mode (just
because I haven't spotted it happening), would it be better to separate
the two modes, because it's a uAPI interface? IIUC, once the setsockopt
is introduced, we cannot separate them any more in the future?

Or we can keep using the hardcoded value (32) in zc mode like
before and __only__ touch the copy mode? Then if someone or I find
it significant to make it tunable, another setsockopt parameter
can be added. Does that make sense?
Thanks,
Jason