[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL+tcoAQ8xVXRmnjafgGWYDWy_FYuA=P4_Tzmh1zUkna2BF+nA@mail.gmail.com>
Date: Wed, 18 Jun 2025 08:29:37 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Stanislav Fomichev <stfomichev@...il.com>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, bjorn@...nel.org, magnus.karlsson@...el.com,
maciej.fijalkowski@...el.com, jonathan.lemon@...il.com, sdf@...ichev.me,
ast@...nel.org, daniel@...earbox.net, hawk@...nel.org,
john.fastabend@...il.com, bpf@...r.kernel.org, netdev@...r.kernel.org,
Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next 0/2] net: xsk: add two sysctl knobs
On Wed, Jun 18, 2025 at 1:46 AM Stanislav Fomichev <stfomichev@...il.com> wrote:
>
> On 06/17, Jason Xing wrote:
> > Hi Stanislav,
> >
> > On Tue, Jun 17, 2025 at 9:11 AM Stanislav Fomichev <stfomichev@...il.com> wrote:
> > >
> > > On 06/17, Jason Xing wrote:
> > > > From: Jason Xing <kernelxing@...cent.com>
> > > >
> > > > Introduce a control method in the xsk path to let users have the chance
> > > > to tune it manually.
> > >
> > > Can you expand more on why the defaults don't work for you?
> >
> > We use a user-level tcp stack with xsk to transmit packets that have
> > higher priorities than other normal kernel tcp flows. It turns out
> > that enlarging the number can minimize times of triggering sendto
> > sysctl, which contributes to faster transmission. it's very easy to
> > hit the upper bound (namely, 32) if you log the return value of
> > sendto. I mentioned a bit about this in the second patch, saying that
> > we can have a similar knob already appearing in the qdisc layer.
> > Furthermore, exposing important parameters can help applications
> > complete their AI/auto-tuning to judge which one is the best fit in
> > their production workload. That is also one of the promising
> > tendencies :)
> >
> > >
> > > Also, can we put these settings into the socket instead of (global/ns)
> > > sysctl?
> >
> > As to MAX_PER_SOCKET_BUDGET, it seems not easy to get its
> > corresponding netns? I have no strong opinion on this point for now.
To add to that, after digging into this part, I realized that we're
able to use sock_net(sk)->core.max_tx_budget directly to finish the
namespace stuff because xsk_create() calls sk_alloc() which correlates
its netns in the sk->sk_net. Sounds reasonable?
>
> I'm suggesting something along these lines (see below). And then add
> some way to configure it (plus, obviously, set the default value
> on init). There is also a question on whether you need separate
> values for MAX_PER_SOCKET_BUDGET and TX_BATCH_SIZE, and if yes,
For now, actually I don't see a specific reason to separate them, so
let me use a single one in V2. My use case only expects to see the
TX_BATCH_SIZE adjustment.
> then why.
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 72c000c0ae5f..fb2caec9914d 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -424,7 +424,7 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
> rcu_read_lock();
> again:
> list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
> - if (xs->tx_budget_spent >= MAX_PER_SOCKET_BUDGET) {
> + if (xs->tx_budget_spent >= xs->max_tx_budget) {
If we implement it like this, xs->max_tx_budget has to read a
per-netns from somewhere and then initialize it. The core problem
still remains: where to store the per netns value.
Do you think using the aforementioned sock_net(sk)->core.max_tx_budget
is possible?
Thanks,
Jason
> budget_exhausted = true;
> continue;
> }
> @@ -779,7 +779,6 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
> static int __xsk_generic_xmit(struct sock *sk)
> {
> struct xdp_sock *xs = xdp_sk(sk);
> - u32 max_batch = TX_BATCH_SIZE;
> bool sent_frame = false;
> struct xdp_desc desc;
> struct sk_buff *skb;
> @@ -797,7 +796,7 @@ static int __xsk_generic_xmit(struct sock *sk)
> goto out;
>
> while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
> - if (max_batch-- == 0) {
> + if (xs->max_tx_budget-- == 0) {
> err = -EAGAIN;
> goto out;
> }
Powered by blists - more mailing lists