[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL+tcoB-5Gt1_sJ_9-EjH5Nm_Ri+8+3QqFvapnLLpC5y4HW63g@mail.gmail.com>
Date: Sat, 21 Jun 2025 01:46:29 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Stanislav Fomichev <stfomichev@...il.com>
Cc: Jakub Kicinski <kuba@...nel.org>, davem@...emloft.net, edumazet@...gle.com,
pabeni@...hat.com, bjorn@...nel.org, magnus.karlsson@...el.com,
maciej.fijalkowski@...el.com, jonathan.lemon@...il.com, sdf@...ichev.me,
ast@...nel.org, daniel@...earbox.net, hawk@...nel.org,
john.fastabend@...il.com, joe@...a.to, willemdebruijn.kernel@...il.com,
bpf@...r.kernel.org, netdev@...r.kernel.org,
Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next v3] net: xsk: introduce XDP_MAX_TX_BUDGET set/getsockopt
On Sat, Jun 21, 2025 at 12:47 AM Stanislav Fomichev
<stfomichev@...il.com> wrote:
>
> On 06/21, Jason Xing wrote:
> > On Fri, Jun 20, 2025 at 10:25 PM Stanislav Fomichev
> > <stfomichev@...il.com> wrote:
> > >
> > > On 06/19, Jakub Kicinski wrote:
> > > > On Thu, 19 Jun 2025 17:04:40 +0800 Jason Xing wrote:
> > > > > @@ -424,7 +421,9 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
> > > > > rcu_read_lock();
> > > > > again:
> > > > > list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
> > > > > - if (xs->tx_budget_spent >= MAX_PER_SOCKET_BUDGET) {
> > > > > + int max_budget = READ_ONCE(xs->max_tx_budget);
> > > > > +
> > > > > + if (xs->tx_budget_spent >= max_budget) {
> > > > > budget_exhausted = true;
> > > > > continue;
> > > > > }
> > > > > @@ -779,7 +778,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> > > > > static int __xsk_generic_xmit(struct sock *sk)
> > > > > {
> > > > > struct xdp_sock *xs = xdp_sk(sk);
> > > > > - u32 max_batch = TX_BATCH_SIZE;
> > > > > + u32 max_budget = READ_ONCE(xs->max_tx_budget);
> > > >
> > > > Hm, maybe a question to Stan / Willem & other XSK experts but are these
> > > > two max values / code paths really related? Question 2 -- is generic
> > > > XSK a legit optimization target, legit enough to add uAPI?
> > >
> > > 1) xsk_tx_peek_desc is for zc case and xsk_build_skb is copy mode;
> > > whether we want to affect zc case given the fact that Jason seemingly
> > > cares about copy mode is a good question.
> >
> > Allow me to ask the similar question that you asked me before: even though I
> > didn't see the necessity to set the max budget for zc mode (just
> > because I didn't spot it happening), would it be better if we separate
> > both of them because it's an uAPI interface. IIUC, if the setsockopt
> > is set, we will not separate it any more in the future?
> >
> > We can keep using the hardcoded value (32) in the zc mode like
> > before and __only__ touch the copy mode? Later if someone or I found
> > the significance of making it tunable, then another parameter of
> > setsockopt can be added? Does it make sense?
>
> Related suggestion: maybe we don't need this limit at all for the copy mode?
> If the user, with a socket option, can arbitrarily change it, what is the
> point of this limit? Keep it on the zc side to make sure one socket doesn't
> starve the rest and drop from the copy mode.. Any reason not to do it?
Thanks for bringing up the same question that I had in this thread. I
saw the commit[1] mentioned it is used to avoid the burst as DPDK
does, so my thought is that it might be used to prevent such a case
where multiple sockets try to send packets through a shared umem
nearly at the same time?
Making it tunable is to provide a chance to let users seek for a good
solution that is the best fit for them. It doesn't mean we
allow/expect to see the burst situation.
[1]
commit e7a1c1300891d8f11d05b42665e299cc22a4b383
Author: Li RongQing <lirongqing@...du.com>
Date: Wed Apr 14 13:39:12 2021 +0800
xsk: Align XDP socket batch size with DPDK
DPDK default burst size is 32, however, kernel xsk sendto
syscall can not handle all 32 at one time, and return with
error.
So make kernel XDP socket batch size larger to avoid
unnecessary syscall fail and context switch which will help
to increase performance.
Thanks,
Jason
Powered by blists - more mailing lists