Message-ID: <CAL+tcoDu-h8crLBsxTVCy6D30vgcB6aarjOpdXE+f4kX1NM8_A@mail.gmail.com>
Date: Sat, 21 Jun 2025 00:26:07 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Stanislav Fomichev <stfomichev@...il.com>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org, 
	pabeni@...hat.com, bjorn@...nel.org, magnus.karlsson@...el.com, 
	maciej.fijalkowski@...el.com, jonathan.lemon@...il.com, sdf@...ichev.me, 
	ast@...nel.org, daniel@...earbox.net, hawk@...nel.org, 
	john.fastabend@...il.com, joe@...a.to, willemdebruijn.kernel@...il.com, 
	bpf@...r.kernel.org, netdev@...r.kernel.org, 
	Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next] net: xsk: update tx queue consumer immdiately
 after transmission

On Fri, Jun 20, 2025 at 11:58 PM Stanislav Fomichev
<stfomichev@...il.com> wrote:
>
> On 06/20, Jason Xing wrote:
> > On Fri, Jun 20, 2025 at 10:10 PM Stanislav Fomichev
> > <stfomichev@...il.com> wrote:
> > >
> > > On 06/19, Jason Xing wrote:
> > > > From: Jason Xing <kernelxing@...cent.com>
> > > >
> > > > For AF_XDP, the return value of the sendto() syscall doesn't reflect how
> > > > many descs were handled in the kernel. One use case is a user-space
> > > > application that wants to know the number of transmitted skbs so it can
> > > > decide whether to keep sending, e.g. was it stopped due to the max tx budget?
> > > >
> > > > The following formula can be used after sending to learn how many
> > > > skbs/descs the kernel took care of:
> > > >
> > > >   tx_queue.consumers_before - tx_queue.consumers_after
> > > >
> > > > Prior to this patch, the consumer of the tx queue is not immediately
> > > > updated at the end of each sendto syscall, which leaves the consumer
> > > > value out of date from the perspective of user space. So this patch
> > > > adds a store operation that passes the cached value to the shared value
> > > > to handle the problem.
> > > >
> > > > Signed-off-by: Jason Xing <kernelxing@...cent.com>
> > > > ---
> > > >  net/xdp/xsk.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > > index 7c47f665e9d1..3288ab2d67b4 100644
> > > > --- a/net/xdp/xsk.c
> > > > +++ b/net/xdp/xsk.c
> > > > @@ -856,6 +856,8 @@ static int __xsk_generic_xmit(struct sock *sk)
> > > >       }
> > > >
> > > >  out:
> > > > +     __xskq_cons_release(xs->tx);
> > > > +
> > > >       if (sent_frame)
> > > >               if (xsk_tx_writeable(xs))
> > > >                       sk->sk_write_space(sk);
> > >
> > > So for the "good" case we are going to write the cons twice? From
> > > xskq_cons_peek_desc and from here? Maybe make this __xskq_cons_release
> > > conditional ('if (err)')?
> >
> > One unlikely exception:
> > xskq_cons_peek_desc()->xskq_cons_read_desc()->xskq_cons_is_valid_desc()->return
> > false;
> > ?
> >
> > There are still two possible 'return false' paths in xskq_cons_peek_desc(),
> > although so far I haven't seen either of them happen.
> >
> > Admittedly, your suggestion covers the majority of normal good ones. I
> > can adjust it as you said.
> >
> > >
> > > I also wonder whether we should add a test for that? Should be easy to
> > > verify by sending more than 32 packets. Is there a place in
> > > tools/testing/selftests/bpf/xskxceiver.c to add that?
> >
> > Well, sorry, if it's not required, please don't force me to do so :S
> > The patch is only one simple update of the consumer that is shared
> > between user-space and kernel.
>
> My suspicion is that the same issue exists for the zc case. So would
> be nice to test it and fix it as well :-p

Oh, well, I will take a look at how the selftest works in the next few days.

Allow me to ask the question that you asked me before: even though I
don't see the need to set the max budget for zc mode (simply because I
haven't seen it happen), would it be better to separate the two modes,
since this is a uAPI? IIUC, once the setsockopt is introduced, we won't
be able to separate them in the future?

Or we can keep using the hardcoded value (32) in zc mode like before
and __only__ touch the copy mode? Then, if someone (or I) finds it
worthwhile to make it tunable, another setsockopt parameter can be
added? Does that make sense?
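
To make the budget discussion concrete, here is a rough user-space toy
model (not kernel code) of how an application could compare two snapshots
of the shared tx consumer to tell whether a sendto() call stopped at the
budget. Everything prefixed with toy_/TOY_ is a hypothetical name made up
for this sketch; the value 32 is the hardcoded budget mentioned above, and
the delta is computed as after minus before on the assumption that the
consumer index only moves forward.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-call tx budget (32, as discussed above). */
#define TOY_TX_BUDGET 32u

/* Toy model of the tx ring indices; this is not the kernel's struct xsk_queue. */
struct toy_ring {
	uint32_t producer;    /* advanced by user space when it queues descs */
	uint32_t consumer;    /* shared value that user space can read back */
	uint32_t cached_cons; /* kernel-private progress, not visible to user space */
};

/*
 * Model one sendto() call: consume up to TOY_TX_BUDGET pending descs.
 * When publish_on_exit is non-zero, the cached value is stored to the shared
 * consumer before returning, which is the behaviour the patch aims for.
 */
static void toy_sendto(struct toy_ring *r, int publish_on_exit)
{
	uint32_t pending = r->producer - r->cached_cons;
	uint32_t n = pending < TOY_TX_BUDGET ? pending : TOY_TX_BUDGET;

	r->cached_cons += n;
	if (publish_on_exit)
		r->consumer = r->cached_cons;
}

/* Descs consumed between two snapshots; unsigned math tolerates wraparound. */
static uint32_t toy_consumed(uint32_t before, uint32_t after)
{
	return after - before;
}

/* Non-zero if the call appears to have stopped because of the budget. */
static int toy_budget_hit(uint32_t before, uint32_t after)
{
	return toy_consumed(before, after) >= TOY_TX_BUDGET;
}
```

With 40 descs queued, toy_sendto(&r, 1) leaves a delta of 32, so the
application knows it should call sendto() again. With publish_on_exit set
to 0, the delta stays at 0 even though 32 descs were consumed, which is
the stale view this thread is about.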

Thanks,
Jason
