Message-ID: <CAL+tcoAtJJ4ZO464UaeLUi8jt1RQr7Lg7fk6vdKPPe6fdw_gZQ@mail.gmail.com>
Date: Wed, 18 Jun 2025 14:59:10 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Stanislav Fomichev <stfomichev@...il.com>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, bjorn@...nel.org, magnus.karlsson@...el.com,
maciej.fijalkowski@...el.com, jonathan.lemon@...il.com, sdf@...ichev.me,
ast@...nel.org, daniel@...earbox.net, hawk@...nel.org,
john.fastabend@...il.com, bpf@...r.kernel.org, netdev@...r.kernel.org,
Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next 0/2] net: xsk: add two sysctl knobs
On Wed, Jun 18, 2025 at 8:29 AM Jason Xing <kerneljasonxing@...il.com> wrote:
>
> On Wed, Jun 18, 2025 at 1:46 AM Stanislav Fomichev <stfomichev@...il.com> wrote:
> >
> > On 06/17, Jason Xing wrote:
> > > Hi Stanislav,
> > >
> > > On Tue, Jun 17, 2025 at 9:11 AM Stanislav Fomichev <stfomichev@...il.com> wrote:
> > > >
> > > > On 06/17, Jason Xing wrote:
> > > > > From: Jason Xing <kernelxing@...cent.com>
> > > > >
> > > > > Introduce a control method in the xsk path to let users have the chance
> > > > > to tune it manually.
> > > >
> > > > Can you expand more on why the defaults don't work for you?
> > >
> > > We use a user-level TCP stack with xsk to transmit packets that have
> > > higher priority than normal kernel TCP flows. It turns out that
> > > enlarging the batch size reduces the number of sendto() syscalls
> > > needed, which speeds up transmission: if you log the return value of
> > > sendto(), it's very easy to hit the current upper bound (namely, 32).
> > > I mentioned this briefly in the second patch, noting that a similar
> > > knob already exists in the qdisc layer. Furthermore, exposing
> > > important parameters like this lets applications auto-tune them to
> > > find the best fit for their production workload, which is also a
> > > promising direction :)
> > >
> > > >
> > > > Also, can we put these settings into the socket instead of (global/ns)
> > > > sysctl?
> > >
> > > As to MAX_PER_SOCKET_BUDGET, it doesn't seem easy to find the
> > > corresponding netns there. I have no strong opinion on this point for now.
>
> To add to that, after digging into this part, I realized that we can
> use sock_net(sk)->core.max_tx_budget directly for the per-namespace
> value, because xsk_create() calls sk_alloc(), which associates the
> netns with sk->sk_net. Sounds reasonable?
Updated patch here:
https://lore.kernel.org/all/20250618065553.96822-1-kerneljasonxing@gmail.com/
Please review :)
Thanks,
Jason
>
> >
> > I'm suggesting something along these lines (see below). And then add
> > some way to configure it (plus, obviously, set the default value
> > on init). There is also a question on whether you need separate
> > values for MAX_PER_SOCKET_BUDGET and TX_BATCH_SIZE, and if yes,
>
> For now I don't see a specific reason to separate them, so let me use
> a single value in V2. My use case only needs the TX_BATCH_SIZE
> adjustment.
>
> > then why.
> >
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 72c000c0ae5f..fb2caec9914d 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -424,7 +424,7 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
> > rcu_read_lock();
> > again:
> > list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
> > - if (xs->tx_budget_spent >= MAX_PER_SOCKET_BUDGET) {
> > + if (xs->tx_budget_spent >= xs->max_tx_budget) {
>
> If we implement it like this, xs->max_tx_budget has to be initialized
> from a per-netns value somewhere, so the core problem remains: where
> to store that per-netns value.
>
> Do you think using the aforementioned sock_net(sk)->core.max_tx_budget
> is possible?
>
> Thanks,
> Jason
>
> > budget_exhausted = true;
> > continue;
> > }
> > @@ -779,7 +779,6 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> > static int __xsk_generic_xmit(struct sock *sk)
> > {
> > struct xdp_sock *xs = xdp_sk(sk);
> > - u32 max_batch = TX_BATCH_SIZE;
> > bool sent_frame = false;
> > struct xdp_desc desc;
> > struct sk_buff *skb;
> > @@ -797,7 +796,7 @@ static int __xsk_generic_xmit(struct sock *sk)
> > goto out;
> >
> > while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
> > - if (max_batch-- == 0) {
> > + if (xs->max_tx_budget-- == 0) {
> > err = -EAGAIN;
> > goto out;
> > }