Message-ID: <CAL+tcoAQ8xVXRmnjafgGWYDWy_FYuA=P4_Tzmh1zUkna2BF+nA@mail.gmail.com>
Date: Wed, 18 Jun 2025 08:29:37 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Stanislav Fomichev <stfomichev@...il.com>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org, 
	pabeni@...hat.com, bjorn@...nel.org, magnus.karlsson@...el.com, 
	maciej.fijalkowski@...el.com, jonathan.lemon@...il.com, sdf@...ichev.me, 
	ast@...nel.org, daniel@...earbox.net, hawk@...nel.org, 
	john.fastabend@...il.com, bpf@...r.kernel.org, netdev@...r.kernel.org, 
	Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH net-next 0/2] net: xsk: add two sysctl knobs

On Wed, Jun 18, 2025 at 1:46 AM Stanislav Fomichev <stfomichev@...il.com> wrote:
>
> On 06/17, Jason Xing wrote:
> > Hi Stanislav,
> >
> > On Tue, Jun 17, 2025 at 9:11 AM Stanislav Fomichev <stfomichev@...il.com> wrote:
> > >
> > > On 06/17, Jason Xing wrote:
> > > > From: Jason Xing <kernelxing@...cent.com>
> > > >
> > > > Introduce a control method in the xsk path to let users have the chance
> > > > to tune it manually.
> > >
> > > Can you expand more on why the defaults don't work for you?
> >
> > We use a user-level TCP stack on top of xsk to transmit packets that
> > have higher priority than normal kernel TCP flows. It turns out that
> > enlarging the batch size minimizes the number of sendto syscalls,
> > which contributes to faster transmission. It is very easy to hit the
> > upper bound (namely, 32) if you log the return value of sendto. I
> > mentioned a bit about this in the second patch, noting that a similar
> > knob already exists in the qdisc layer. Furthermore, exposing
> > important parameters helps applications do AI/auto-tuning to judge
> > which value best fits their production workload. That is also one of
> > the promising trends :)
> >
> > >
> > > Also, can we put these settings into the socket instead of (global/ns)
> > > sysctl?
> >
> > As for MAX_PER_SOCKET_BUDGET, it does not seem easy to get the
> > corresponding netns. I have no strong opinion on this point for now.

To add to that: after digging into this part, I realized that we could
use sock_net(sk)->core.max_tx_budget directly to handle the namespace
part, because xsk_create() calls sk_alloc(), which records the socket's
netns in sk->sk_net. Does that sound reasonable?

>
> I'm suggesting something along these lines (see below). And then add
> some way to configure it (plus, obviously, set the default value
> on init). There is also a question on whether you need separate
> values for MAX_PER_SOCKET_BUDGET and TX_BATCH_SIZE, and if yes,

For now, I don't see a specific reason to keep them separate, so let me
use a single value in V2. My use case only needs to adjust
TX_BATCH_SIZE.

> then why.
>
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index 72c000c0ae5f..fb2caec9914d 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -424,7 +424,7 @@ bool xsk_tx_peek_desc(struct xsk_buff_pool *pool, struct xdp_desc *desc)
>         rcu_read_lock();
>  again:
>         list_for_each_entry_rcu(xs, &pool->xsk_tx_list, tx_list) {
> -               if (xs->tx_budget_spent >= MAX_PER_SOCKET_BUDGET) {
> +               if (xs->tx_budget_spent >= xs->max_tx_budget) {

If we implement it like this, xs->max_tx_budget has to be initialized
from a per-netns value somewhere. The core problem still remains: where
to store that per-netns value.

Do you think using the aforementioned sock_net(sk)->core.max_tx_budget
is feasible?

Thanks,
Jason

>                         budget_exhausted = true;
>                         continue;
>                 }
> @@ -779,7 +779,6 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
>  static int __xsk_generic_xmit(struct sock *sk)
>  {
>         struct xdp_sock *xs = xdp_sk(sk);
> -       u32 max_batch = TX_BATCH_SIZE;
>         bool sent_frame = false;
>         struct xdp_desc desc;
>         struct sk_buff *skb;
> @@ -797,7 +796,7 @@ static int __xsk_generic_xmit(struct sock *sk)
>                 goto out;
>
>         while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
> -               if (max_batch-- == 0) {
> +               if (xs->max_tx_budget-- == 0) {
>                         err = -EAGAIN;
>                         goto out;
>                 }
