Message-ID: <CAL+tcoBub4JpHrgWekK+OVCb0frXUaFYDGVd2XL3bvjHOTmFjQ@mail.gmail.com>
Date: Tue, 24 Jun 2025 07:54:21 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Stanislav Fomichev <stfomichev@...il.com>
Cc: Jakub Kicinski <kuba@...nel.org>, davem@...emloft.net, edumazet@...gle.com, 
	pabeni@...hat.com, bjorn@...nel.org, magnus.karlsson@...el.com, 
	maciej.fijalkowski@...el.com, jonathan.lemon@...il.com, sdf@...ichev.me, 
	ast@...nel.org, daniel@...earbox.net, hawk@...nel.org, 
	john.fastabend@...il.com, joe@...a.to, willemdebruijn.kernel@...il.com, 
	bpf@...r.kernel.org, netdev@...r.kernel.org, 
	Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next v3] net: xsk: introduce XDP_MAX_TX_BUDGET set/getsockopt

On Mon, Jun 23, 2025 at 10:18 PM Stanislav Fomichev
<stfomichev@...il.com> wrote:
>
> On 06/21, Jason Xing wrote:
> > On Sat, Jun 21, 2025 at 12:47 AM Stanislav Fomichev
> > <stfomichev@...il.com> wrote:
> > >
> > > On 06/21, Jason Xing wrote:
> > > > On Fri, Jun 20, 2025 at 10:25 PM Stanislav Fomichev
> > > > <stfomichev@...il.com> wrote:
> > > > >
> > > > > On 06/19, Jakub Kicinski wrote:
> > > > > > On Thu, 19 Jun 2025 17:04:40 +0800 Jason Xing wrote:
> > > > > > > @@ -424,7 +421,9 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
> > > > > > >     rcu_read_lock();
> > > > > > >  again:
> > > > > > >     list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
> > > > > > > -           if (xs->tx_budget_spent >= MAX_PER_SOCKET_BUDGET) {
> > > > > > > +           int max_budget = READ_ONCE(xs->max_tx_budget);
> > > > > > > +
> > > > > > > +           if (xs->tx_budget_spent >= max_budget) {
> > > > > > >                     budget_exhausted = true;
> > > > > > >                     continue;
> > > > > > >             }
> > > > > > > @@ -779,7 +778,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> > > > > > >  static int __xsk_generic_xmit(struct sock *sk)
> > > > > > >  {
> > > > > > >     struct xdp_sock *xs = xdp_sk(sk);
> > > > > > > -   u32 max_batch = TX_BATCH_SIZE;
> > > > > > > +   u32 max_budget = READ_ONCE(xs->max_tx_budget);
> > > > > >
> > > > > > Hm, maybe a question to Stan / Willem & other XSK experts but are these
> > > > > > two max values / code paths really related? Question 2 -- is generic
> > > > > > XSK a legit optimization target, legit enough to add uAPI?
> > > > >
> > > > > 1) xsk_tx_peek_desc is for zc case and xsk_build_skb is copy mode;
> > > > > whether we want to affect zc case given the fact that Jason seemingly
> > > > > cares about copy mode is a good question.
> > > >
> > > > Allow me to ask the similar question that you asked me before: even though I
> > > > didn't see the necessity to set the max budget for zc mode (just
> > > > because I didn't spot it happening), would it be better if we separate
> > > > both of them because it's an uAPI interface. IIUC, if the setsockopt
> > > > is set, we will not separate it any more in the future?
> > > >
> > > > We can keep using the hardcoded value (32) in the zc mode like
> > > > before and __only__ touch the copy mode? Later if someone or I found
> > > > the significance of making it tunable, then another parameter of
> > > > setsockopt can be added? Does it make sense?
> > >
> > > Related suggestion: maybe we don't need this limit at all for the copy mode?
> > > If the user, with a socket option, can arbitrarily change it, what is the
> > > point of this limit? Keep it on the zc side to make sure one socket doesn't
> > > starve the rest and drop from the copy mode.. Any reason not to do it?
> >
> > Thanks for bringing up the same question that I had in this thread. I
> > saw the commit[1] mentioned it is used to avoid the burst as DPDK
> > does, so my thought is that it might be used to prevent such a case
> > where multiple sockets try to send packets through a shared umem
> > nearly at the same time?
> >
> > Making it tunable is to provide a chance to let users seek for a good
> > solution that is the best fit for them. It doesn't mean we
> > allow/expect to see the burst situation.
>
> The users can choose to moderate their batches by submitting less
> with each sendmsg call. I see why having a batch limit might be useful for
> zerocopy to tx in batches to interleave multiple sockets, but not
> sure how this limit helps for the copy mode. Since we are not running
> qdisc layer on tx, we don't really have a good answer for multiple
> sockets sharing the same device/queue..

It's worth mentioning that xsk still holds the tx queue lock in
non-zc (copy) mode, so I assume getting rid of the limit might be
harmful for other, non-xsk flows. That is what I know about the burst
concern.

Thanks,
Jason
