lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAAywjhT7R9F+i7ANVy5n=4iTBYWOpS8VtUUgQdW1xEeS+8afyw@mail.gmail.com>
Date: Fri, 25 Apr 2025 16:06:27 -0700
From: Samiullah Khawaja <skhawaja@...gle.com>
To: Joe Damato <jdamato@...tly.com>, Samiullah Khawaja <skhawaja@...gle.com>, 
	Jakub Kicinski <kuba@...nel.org>, "David S . Miller" <davem@...emloft.net>, 
	Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, almasrymina@...gle.com, 
	willemb@...gle.com, mkarsten@...terloo.ca, netdev@...r.kernel.org
Subject: Re: [PATCH net-next v5] Add support to set napi threaded for
 individual napi

On Fri, Apr 25, 2025 at 11:28 AM Samiullah Khawaja <skhawaja@...gle.com> wrote:
>
> On Thu, Apr 24, 2025 at 4:13 PM Joe Damato <jdamato@...tly.com> wrote:
> >
> > On Wed, Apr 23, 2025 at 08:14:13PM +0000, Samiullah Khawaja wrote:
> > > A net device has a threaded sysctl that can be used to enable threaded
> > > napi polling on all of the NAPI contexts under that device. Allow
> > > enabling threaded napi polling at individual napi level using netlink.
> > >
> > > Extend the netlink operation `napi-set` and allow setting the threaded
> > > attribute of a NAPI. This will enable the threaded polling on a napi
> > > context.
> > >
> > > Add a test in `nl_netdev.py` that verifies various cases of threaded
> > > napi being set at napi and at device level.
> > >
> > > Tested
> > >  ./tools/testing/selftests/net/nl_netdev.py
> > >  TAP version 13
> > >  1..6
> > >  ok 1 nl_netdev.empty_check
> > >  ok 2 nl_netdev.lo_check
> > >  ok 3 nl_netdev.page_pool_check
> > >  ok 4 nl_netdev.napi_list_check
> > >  ok 5 nl_netdev.napi_set_threaded
> > >  ok 6 nl_netdev.nsim_rxq_reset_down
> > >  # Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0
> > >
> > > v5:
> > >  - This patch was part of:
> > >  https://lore.kernel.org/netdev/Z92e2kCYXQ_RsrJh@LQ3V64L9R2/T/
> > >  It is being sent separately for the first time.
> >
> > Thanks; I think this is a good change on its own separate from the
> > rest of the series and, IMO, I think it makes it easier to get
> > reviewed and merged.
> Thanks for the review and suggestion to split this out.
> >
> > Probably a matter of personal preference, which you can feel free to
> > ignore, but IMO I think splitting this into 3 patches might make it
> > easier to get Reviewed-bys and make changes ?
> >
> >   - core infrastructure changes
> >   - netlink related changes (and docs)
> >   - selftest
> >
> > Then if feedback comes in for some parts, but not others, it makes
> > it easier for reviewers in the future to know what was already
> > reviewed and what was changed ?
> >
> > Just my 2 cents.
> >
> > The above said, my high level feedback is that I don't think this
> > addresses the concerns we discussed in the v4 about device-wide
> > setting vs per-NAPI settings.
> >
> > See below.
> >
> > [...]
> >
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index d0563ddff6ca..ea3c8a30bb97 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -6888,6 +6888,27 @@ static enum hrtimer_restart napi_watchdog(struct hrtimer *timer)
> > >       return HRTIMER_NORESTART;
> > >  }
> > >
> > > +int napi_set_threaded(struct napi_struct *napi, bool threaded)
> > > +{
> > > +     if (threaded) {
> > > +             if (!napi->thread) {
> > > +                     int err = napi_kthread_create(napi);
> > > +
> > > +                     if (err)
> > > +                             return err;
> > > +             }
> > > +     }
> > > +
> > > +     if (napi->config)
> > > +             napi->config->threaded = threaded;
> > > +
> > > +     /* Make sure kthread is created before THREADED bit is set. */
> > > +     smp_mb__before_atomic();
> > > +     assign_bit(NAPI_STATE_THREADED, &napi->state, threaded);
> > > +
> > > +     return 0;
> > > +}
> >
> > Hm, I wonder if dev_set_threaded can be cleaned up to use this
> > instead of repeating similar code in two places?
I wanted to add something about this part and forgot. The reason
dev_set_thread handling is a little different from individual NAPIs is
because the dev->threaded state on a device needs to be set only if
all underlying NAPIs are guaranteed to have the same state. So that is
why the dev_set_threaded does this in 3 steps,

1- Create kthread for each NAPI. This can fail and if it fails the
threaded state is set to disable/0 since at least one NAPI kthread
failed to create.
2- Set the threaded state on dev->threaded.
3- Set the threaded state flag on each NAPI, this cannot fail.

If we use the napi_set_threaded in dev_set_threaded implementation
then we will be doing something like following,

1- Do napi_set_threaded for each NAPI.
2- If a NAPI fails then revert to disable state and reset NAPI state
to disable for all NAPIs in dev
3- set threaded state on dev->threaded (or maybe set in the start and
set here again on failure).

I think this will complicate the logic and also not preserve the
current behaviour and order of operations. So keeping the
dev_set_threaded implementation separate.
> >
> > > +
> > >  int dev_set_threaded(struct net_device *dev, bool threaded)
> > >  {
> > >       struct napi_struct *napi;
> > > @@ -7144,6 +7165,8 @@ static void napi_restore_config(struct napi_struct *n)
> > >               napi_hash_add(n);
> > >               n->config->napi_id = n->napi_id;
> > >       }
> > > +
> > > +     napi_set_threaded(n, n->config->threaded);
> >
> > It makes sense to me that when restoring the config, the kthread is
> > kicked off again (assuming config->thread > 0), but does the
> > napi_save_config path need to stop the thread?
>
> >
> > Not sure if kthread_stop is hit via some other path when
> > napi_disable is called? Can you clarify?
> NAPI kthread are not stopped when napi is disabled. When napis are
> disabled, the NAPI_STATE_DISABLED state is set on them and the
> associated thread goes to sleep after unsetting the STATE_SCHED. The
> kthread_stop is only called on them when NAPI is deleted. This is the
> existing behaviour. Please see napi_disable implementation for
> reference. Also napi_enable doesn't create a new kthread and just sets
> the napi STATE appropriately and once the NAPI is scheduled again the
> thread is woken up. Please seem implementation of napi_schedule for
> reference.
> >
> > From my testing, it seems like setting threaded: 0 using the CLI
> > leaves the thread running. Should it work that way? From a high
> > level that behavior seems somewhat unexpected, don't you think?
> Please see the comment above. The thread goes to sleep and is woken up
> when NAPI is enabled and scheduled again.
> >
> > [...]
> >
> > >  static void napi_save_config(struct napi_struct *n)
> > > diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> > > index b64c614a00c4..f7d000a600cf 100644
> > > --- a/net/core/netdev-genl.c
> > > +++ b/net/core/netdev-genl.c
> > > @@ -184,6 +184,10 @@ netdev_nl_napi_fill_one(struct sk_buff *rsp, struct napi_struct *napi,
> > >       if (napi->irq >= 0 && nla_put_u32(rsp, NETDEV_A_NAPI_IRQ, napi->irq))
> > >               goto nla_put_failure;
> > >
> > > +     if (nla_put_uint(rsp, NETDEV_A_NAPI_THREADED,
> > > +                      napi_get_threaded(napi)))
> > > +             goto nla_put_failure;
> > > +
> > >       if (napi->thread) {
> > >               pid = task_pid_nr(napi->thread);
> > >               if (nla_put_u32(rsp, NETDEV_A_NAPI_PID, pid))
> > > @@ -322,8 +326,14 @@ netdev_nl_napi_set_config(struct napi_struct *napi, struct genl_info *info)
> > >  {
> > >       u64 irq_suspend_timeout = 0;
> > >       u64 gro_flush_timeout = 0;
> > > +     uint threaded = 0;
> >
> > I think I'd probably make this a u8 or something? uint is not a
> > commonly used type, AFAICT, and seems out of place here.
> Agreed. Will change
> >
> > > diff --git a/tools/testing/selftests/net/nl_netdev.py b/tools/testing/selftests/net/nl_netdev.py
> > > index beaee5e4e2aa..505564818fa8 100755
> > > --- a/tools/testing/selftests/net/nl_netdev.py
> > > +++ b/tools/testing/selftests/net/nl_netdev.py
> > > @@ -2,6 +2,7 @@
> > >  # SPDX-License-Identifier: GPL-2.0
> > >
> > >  import time
> > > +from os import system
> > >  from lib.py import ksft_run, ksft_exit, ksft_pr
> > >  from lib.py import ksft_eq, ksft_ge, ksft_busy_wait
> > >  from lib.py import NetdevFamily, NetdevSimDev, ip
> > > @@ -34,6 +35,70 @@ def napi_list_check(nf) -> None:
> > >                  ksft_eq(len(napis), 100,
> > >                          comment=f"queue count after reset queue {q} mode {i}")
> > >
> > > +def napi_set_threaded(nf) -> None:
> > > +    """
> > > +    Test that verifies various cases of napi threaded
> > > +    set and unset at napi and device level.
> > > +    """
> > > +    with NetdevSimDev(queue_count=2) as nsimdev:
> > > +        nsim = nsimdev.nsims[0]
> > > +
> > > +        ip(f"link set dev {nsim.ifname} up")
> > > +
> > > +        napis = nf.napi_get({'ifindex': nsim.ifindex}, dump=True)
> > > +        ksft_eq(len(napis), 2)
> > > +
> > > +        napi0_id = napis[0]['id']
> > > +        napi1_id = napis[1]['id']
> > > +
> > > +        # set napi threaded and verify
> > > +        nf.napi_set({'id': napi0_id, 'threaded': 1})
> > > +        napi0 = nf.napi_get({'id': napi0_id})
> > > +        ksft_eq(napi0['threaded'], 1)
> > > +
> > > +        ip(f"link set dev {nsim.ifname} down")
> > > +        ip(f"link set dev {nsim.ifname} up")
> > > +
> > > +        # verify if napi threaded is still set
> > > +        napi0 = nf.napi_get({'id': napi0_id})
> > > +        ksft_eq(napi0['threaded'], 1)
> >
> > I feel like the test needs to be expanded?
> >
> > It's not just the attribute that matters (although that bit is
> > important), but I think it probably is important that the kthread is
> > still running and that should be checked ?
> >
> > > +        # unset napi threaded and verify
> > > +        nf.napi_set({'id': napi0_id, 'threaded': 0})
> > > +        napi0 = nf.napi_get({'id': napi0_id})
> > > +        ksft_eq(napi0['threaded'], 0)
> > > +
> > > +        # set napi threaded for napi0
> > > +        nf.napi_set({'id': napi0_id, 'threaded': 1})
> > > +        napi0 = nf.napi_get({'id': napi0_id})
> > > +        ksft_eq(napi0['threaded'], 1)
> > > +
> > > +        # check it is not set for napi1
> > > +        napi1 = nf.napi_get({'id': napi1_id})
> > > +        ksft_eq(napi1['threaded'], 0)
> > > +
> > > +        # set threaded at device level
> > > +        system(f"echo 1 > /sys/class/net/{nsim.ifname}/threaded")
> > > +
> > > +        # check napi threaded is set for both napis
> > > +        napi0 = nf.napi_get({'id': napi0_id})
> > > +        ksft_eq(napi0['threaded'], 1)
> > > +        napi1 = nf.napi_get({'id': napi1_id})
> > > +        ksft_eq(napi1['threaded'], 1)
> > > +
> > > +        # set napi threaded for napi0
> > > +        nf.napi_set({'id': napi0_id, 'threaded': 1})
> > > +        napi0 = nf.napi_get({'id': napi0_id})
> > > +        ksft_eq(napi0['threaded'], 1)
> >
> > If napi0 is already set to threaded in the previous block for the
> > device level, why set it explicitly again ?
> Will remove
> >
> > > +
> > > +        # unset threaded at device level
> > > +        system(f"echo 0 > /sys/class/net/{nsim.ifname}/threaded")
> > > +
> > > +        # check napi threaded is unset for both napis
> > > +        napi0 = nf.napi_get({'id': napi0_id})
> > > +        ksft_eq(napi0['threaded'], 0)
> > > +        napi1 = nf.napi_get({'id': napi1_id})
> > > +        ksft_eq(napi1['threaded'], 0)
> >
> > I ran the test and it passes for me.
> >
> > That said, the test is incomplete or buggy because I've manually
> > identified 1 case that is incorrect which we discussed in the v4 and
> > a second case that seems buggy from a user perspective.
> >
> > Case 1 (we discussed this in the v4, but seems like it was missed
> > here?):
> >
> > Threaded set to 1 and then to 0 at the device level
> >
> >   echo 1 > /sys/class/net/eni28160np1/threaded
> >   echo 0 > /sys/class/net/eni28160np1/threaded
> >
> > Check the setting:
> >
> >   cat /sys/class/net/eni28160np1/threaded
> >   0
> >
> > Dump the settings for netdevsim, noting that threaded is 0, but pid
> > is set (again, should it be?):
> >
> >   ./tools/net/ynl/pyynl/cli.py \
> >        --spec Documentation/netlink/specs/netdev.yaml \
> >        --dump napi-get --json='{"ifindex": 20}'
> >
> >   [{'defer-hard-irqs': 0,
> >     'gro-flush-timeout': 0,
> >     'id': 612,
> >     'ifindex': 20,
> >     'irq-suspend-timeout': 0,
> >     'pid': 15728,
> >     'threaded': 0},
> >    {'defer-hard-irqs': 0,
> >     'gro-flush-timeout': 0,
> >     'id': 611,
> >     'ifindex': 20,
> >     'irq-suspend-timeout': 0,
> >     'pid': 15729,
> >     'threaded': 0}]
> As explained in the comment earlier, since the kthread is still valid
> and associated with the napi, the PID is valid. I just verified that
> this is the existing behaviour. Not sure whether the pid should be
> hidden if the threaded state is not enabled? Do you think we should
> change this behaviour?
> >
> > Now set NAPI 612 to threaded 1:
> >
> >   ./tools/net/ynl/pyynl/cli.py \
> >       --spec Documentation/netlink/specs/netdev.yaml \
> >       --do napi-set --json='{"id": 612, "threaded": 1}'
> >
> > Dump the settings:
> >
> >   ./tools/net/ynl/pyynl/cli.py \
> >       --spec Documentation/netlink/specs/netdev.yaml \
> >       --dump napi-get --json='{"ifindex": 20}'
> >
> >   [{'defer-hard-irqs': 0,
> >     'gro-flush-timeout': 0,
> >     'id': 612,
> >     'ifindex': 20,
> >     'irq-suspend-timeout': 0,
> >     'pid': 15728,
> >     'threaded': 1},
> >    {'defer-hard-irqs': 0,
> >     'gro-flush-timeout': 0,
> >     'id': 611,
> >     'ifindex': 20,
> >     'irq-suspend-timeout': 0,
> >     'pid': 15729,
> >     'threaded': 0}]
> >
> > So far, so good.
> >
> > Now set device-wide threaded to 0, which should set NAPI 612 to threaded 0:
> >
> >    echo 0 > /sys/class/net/eni28160np1/threaded
> >
> > Dump settings:
> >
> >   ./tools/net/ynl/pyynl/cli.py \
> >      --spec Documentation/netlink/specs/netdev.yaml \
> >      --dump napi-get --json='{"ifindex": 20}'
> >
> >   [{'defer-hard-irqs': 0,
> >     'gro-flush-timeout': 0,
> >     'id': 612,
> >     'ifindex': 20,
> >     'irq-suspend-timeout': 0,
> >     'pid': 15728,
> >     'threaded': 1},
> >    {'defer-hard-irqs': 0,
> >     'gro-flush-timeout': 0,
> >     'id': 611,
> >     'ifindex': 20,
> >     'irq-suspend-timeout': 0,
> >     'pid': 15729,
> >     'threaded': 0}]
> >
> > Note that NAPI 612 is still set to threaded 1, but we set the
> > device-wide setting to 0.
> >
> > Shouldn't NAPI 612 be threaded = 0 ?
> Yes it should be. I will add a test for this and also remove the
> initial check in dev_set_threaded to fix this behaviour.
> >
> > Second case:
> >   - netdevsim is brought up with 2 queues and 2 napis
> >   - Threaded NAPI is enabled for napi1
> >   - The queue count on the device is reduced from 2 to 1 (therefore
> >     napi1 is disabled) via ethtool -L
> >
> > Then:
> >
> >   - Is the thread still active? Should it be?
> >
> > IMO: if the napi is disabled the thread should stop because there is
> > no NAPI for it to poll. When the napi is enabled, the thread can
> > start back up.
> Please see the comment above for the explanation of this., but to
> reiterate the thread is there but not actively being scheduled and
> sleeps until napi is enabled again and scheduled.
> >
> > It does not seem that this is currently the case from manual
> > testing.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ