Message-ID: <CAL+tcoBub4JpHrgWekK+OVCb0frXUaFYDGVd2XL3bvjHOTmFjQ@mail.gmail.com>
Date: Tue, 24 Jun 2025 07:54:21 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Stanislav Fomichev <stfomichev@...il.com>
Cc: Jakub Kicinski <kuba@...nel.org>, davem@...emloft.net, edumazet@...gle.com,
pabeni@...hat.com, bjorn@...nel.org, magnus.karlsson@...el.com,
maciej.fijalkowski@...el.com, jonathan.lemon@...il.com, sdf@...ichev.me,
ast@...nel.org, daniel@...earbox.net, hawk@...nel.org,
john.fastabend@...il.com, joe@...a.to, willemdebruijn.kernel@...il.com,
bpf@...r.kernel.org, netdev@...r.kernel.org,
Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next v3] net: xsk: introduce XDP_MAX_TX_BUDGET set/getsockopt
On Mon, Jun 23, 2025 at 10:18 PM Stanislav Fomichev
<stfomichev@...il.com> wrote:
>
> On 06/21, Jason Xing wrote:
> > On Sat, Jun 21, 2025 at 12:47 AM Stanislav Fomichev
> > <stfomichev@...il.com> wrote:
> > >
> > > On 06/21, Jason Xing wrote:
> > > > On Fri, Jun 20, 2025 at 10:25 PM Stanislav Fomichev
> > > > <stfomichev@...il.com> wrote:
> > > > >
> > > > > On 06/19, Jakub Kicinski wrote:
> > > > > > On Thu, 19 Jun 2025 17:04:40 +0800 Jason Xing wrote:
> > > > > > > @@ -424,7 +421,9 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
> > > > > > > rcu_read_lock();
> > > > > > > again:
> > > > > > > list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
> > > > > > > - if (xs->tx_budget_spent >= MAX_PER_SOCKET_BUDGET) {
> > > > > > > + int max_budget = READ_ONCE(xs->max_tx_budget);
> > > > > > > +
> > > > > > > + if (xs->tx_budget_spent >= max_budget) {
> > > > > > > budget_exhausted = true;
> > > > > > > continue;
> > > > > > > }
> > > > > > > @@ -779,7 +778,7 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> > > > > > > static int __xsk_generic_xmit(struct sock *sk)
> > > > > > > {
> > > > > > > struct xdp_sock *xs = xdp_sk(sk);
> > > > > > > - u32 max_batch = TX_BATCH_SIZE;
> > > > > > > + u32 max_budget = READ_ONCE(xs->max_tx_budget);
> > > > > >
> > > > > > Hm, maybe a question to Stan / Willem & other XSK experts but are these
> > > > > > two max values / code paths really related? Question 2 -- is generic
> > > > > > XSK a legit optimization target, legit enough to add uAPI?
> > > > >
> > > > > 1) xsk_tx_peek_desc is for the zc case and xsk_build_skb is for copy
> > > > > mode; whether we want to affect the zc case, given that Jason
> > > > > seemingly cares about copy mode, is a good question.
> > > >
> > > > Allow me to ask a question similar to the one you asked me before:
> > > > even though I don't see the necessity of setting the max budget for
> > > > zc mode (just because I haven't spotted it being needed), would it
> > > > be better to keep the two modes separate, since this is a uAPI
> > > > interface? IIUC, once the setsockopt covers both, we won't be able
> > > > to separate them in the future?
> > > >
> > > > We could keep using the hardcoded value (32) in zc mode as before
> > > > and __only__ touch the copy mode. Later, if someone (or I) finds it
> > > > worthwhile to make the zc budget tunable as well, another setsockopt
> > > > parameter can be added. Does that make sense?
> > >
> > > Related suggestion: maybe we don't need this limit at all for the copy
> > > mode? If the user can arbitrarily change it with a socket option, what
> > > is the point of the limit? Keep it on the zc side to make sure one
> > > socket doesn't starve the rest, and drop it from the copy mode. Any
> > > reason not to do that?
> >
> > Thanks for bringing up the same question I had in this thread. I saw
> > that the commit[1] mentioned it is used to avoid bursts, as DPDK does,
> > so my thought is that it might be there to prevent the case where
> > multiple sockets try to send packets through a shared umem at nearly
> > the same time?
> >
> > Making it tunable gives users a chance to find the value that best
> > fits their workload. It doesn't mean we allow/expect to see the burst
> > situation.
>
> The users can choose to moderate their batches by submitting fewer
> descriptors with each sendmsg call. I see why having a batch limit
> might be useful for zerocopy, to tx in batches and interleave multiple
> sockets, but I'm not sure how this limit helps the copy mode. Since we
> are not running the qdisc layer on tx, we don't really have a good
> answer for multiple sockets sharing the same device/queue..
It's worth mentioning that the xsk path still holds the tx queue lock
while transmitting in non-zc mode. So I assume getting rid of the limit
might be harmful to other non-xsk flows sharing the same queue. That is
what I know about the burst concern.
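
For reference, copy mode ends up in __dev_direct_xmit(), which grabs
the per-queue lock around the driver xmit. A simplified sketch
(paraphrased from net/core/dev.c, not verbatim; error and recursion
handling omitted):

int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id)
{
	struct net_device *dev = skb->dev;
	struct netdev_queue *txq;
	int ret = NETDEV_TX_BUSY;

	skb_set_queue_mapping(skb, queue_id);
	txq = skb_get_tx_queue(dev, skb);

	local_bh_disable();
	/* This is the tx queue lock that other flows on the same
	 * queue contend on while an xsk keeps transmitting.
	 */
	HARD_TX_LOCK(dev, txq, smp_processor_id());
	if (!netif_xmit_frozen_or_drv_stopped(txq))
		ret = netdev_start_xmit(skb, dev, txq, false);
	HARD_TX_UNLOCK(dev, txq);
	local_bh_enable();

	return ret;
}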
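
And for completeness, an untested userspace sketch of how the proposed
knob would be used (this assumes the patch's XDP_MAX_TX_BUDGET
definition is visible via <linux/if_xdp.h>, xsk_fd is an AF_XDP
socket, and the helper name is made up):

#include <stdio.h>
#include <sys/socket.h>
#include <linux/if_xdp.h>

static int set_tx_budget(int xsk_fd, unsigned int budget)
{
	socklen_t len = sizeof(budget);

	/* Raise the per-socket tx budget. */
	if (setsockopt(xsk_fd, SOL_XDP, XDP_MAX_TX_BUDGET,
		       &budget, len))
		return -1;

	/* Read it back to confirm the kernel accepted the value. */
	if (getsockopt(xsk_fd, SOL_XDP, XDP_MAX_TX_BUDGET,
		       &budget, &len))
		return -1;

	printf("tx budget is now %u\n", budget);
	return 0;
}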
Thanks,
Jason