[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAAywjhT7R9F+i7ANVy5n=4iTBYWOpS8VtUUgQdW1xEeS+8afyw@mail.gmail.com>
Date: Fri, 25 Apr 2025 16:06:27 -0700
From: Samiullah Khawaja <skhawaja@...gle.com>
To: Joe Damato <jdamato@...tly.com>, Samiullah Khawaja <skhawaja@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, "David S . Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>, almasrymina@...gle.com,
willemb@...gle.com, mkarsten@...terloo.ca, netdev@...r.kernel.org
Subject: Re: [PATCH net-next v5] Add support to set napi threaded for
individual napi
On Fri, Apr 25, 2025 at 11:28 AM Samiullah Khawaja <skhawaja@...gle.com> wrote:
>
> On Thu, Apr 24, 2025 at 4:13 PM Joe Damato <jdamato@...tly.com> wrote:
> >
> > On Wed, Apr 23, 2025 at 08:14:13PM +0000, Samiullah Khawaja wrote:
> > > A net device has a threaded sysctl that can be used to enable threaded
> > > napi polling on all of the NAPI contexts under that device. Allow
> > > enabling threaded napi polling at individual napi level using netlink.
> > >
> > > Extend the netlink operation `napi-set` and allow setting the threaded
> > > attribute of a NAPI. This will enable the threaded polling on a napi
> > > context.
> > >
> > > Add a test in `nl_netdev.py` that verifies various cases of threaded
> > > napi being set at napi and at device level.
> > >
> > > Tested
> > > ./tools/testing/selftests/net/nl_netdev.py
> > > TAP version 13
> > > 1..6
> > > ok 1 nl_netdev.empty_check
> > > ok 2 nl_netdev.lo_check
> > > ok 3 nl_netdev.page_pool_check
> > > ok 4 nl_netdev.napi_list_check
> > > ok 5 nl_netdev.napi_set_threaded
> > > ok 6 nl_netdev.nsim_rxq_reset_down
> > > # Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0
> > >
> > > v5:
> > > - This patch was part of:
> > > https://lore.kernel.org/netdev/Z92e2kCYXQ_RsrJh@LQ3V64L9R2/T/
> > > It is being sent separately for the first time.
> >
> > Thanks; I think this is a good change on its own separate from the
> > rest of the series and, IMO, I think it makes it easier to get
> > reviewed and merged.
> Thanks for the review and suggestion to split this out.
> >
> > Probably a matter of personal preference, which you can feel free to
> > ignore, but IMO I think splitting this into 3 patches might make it
> > easier to get Reviewed-bys and make changes ?
> >
> > - core infrastructure changes
> > - netlink related changes (and docs)
> > - selftest
> >
> > Then if feedback comes in for some parts, but not others, it makes
> > it easier for reviewers in the future to know what was already
> > reviewed and what was changed ?
> >
> > Just my 2 cents.
> >
> > The above said, my high level feedback is that I don't think this
> > addresses the concerns we discussed in the v4 about device-wide
> > setting vs per-NAPI settings.
> >
> > See below.
> >
> > [...]
> >
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index d0563ddff6ca..ea3c8a30bb97 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -6888,6 +6888,27 @@ static enum hrtimer_restart napi_watchdog(struct hrtimer *timer)
> > > return HRTIMER_NORESTART;
> > > }
> > >
> > > +int napi_set_threaded(struct napi_struct *napi, bool threaded)
> > > +{
> > > + if (threaded) {
> > > + if (!napi->thread) {
> > > + int err = napi_kthread_create(napi);
> > > +
> > > + if (err)
> > > + return err;
> > > + }
> > > + }
> > > +
> > > + if (napi->config)
> > > + napi->config->threaded = threaded;
> > > +
> > > + /* Make sure kthread is created before THREADED bit is set. */
> > > + smp_mb__before_atomic();
> > > + assign_bit(NAPI_STATE_THREADED, &napi->state, threaded);
> > > +
> > > + return 0;
> > > +}
> >
> > Hm, I wonder if dev_set_threaded can be cleaned up to use this
> > instead of repeating similar code in two places?
I wanted to add something about this part and forgot. The reason
dev_set_thread handling is a little different from individual NAPIs is
because the dev->threaded state on a device needs to be set only if
all underlying NAPIs are guaranteed to have the same state. So that is
why the dev_set_threaded does this in 3 steps,
1- Create kthread for each NAPI. This can fail and if it fails the
threaded state is set to disable/0 since at least one NAPI kthread
failed to create.
2- Set the threaded state on dev->threaded.
3- Set the threaded state flag on each NAPI, this cannot fail.
If we use the napi_set_threaded in dev_set_threaded implementation
then we will be doing something like following,
1- Do napi_set_threaded for each NAPI.
2- If a NAPI fails then revert to disable state and reset NAPI state
to disable for all NAPIs in dev
3- set threaded state on dev->threaded (or maybe set in the start and
set here again on failure).
I think this will complicate the logic and also not preserve the
current behaviour and order of operations. So keeping the
dev_set_threaded implementation separate.
> >
> > > +
> > > int dev_set_threaded(struct net_device *dev, bool threaded)
> > > {
> > > struct napi_struct *napi;
> > > @@ -7144,6 +7165,8 @@ static void napi_restore_config(struct napi_struct *n)
> > > napi_hash_add(n);
> > > n->config->napi_id = n->napi_id;
> > > }
> > > +
> > > + napi_set_threaded(n, n->config->threaded);
> >
> > It makes sense to me that when restoring the config, the kthread is
> > kicked off again (assuming config->thread > 0), but does the
> > napi_save_config path need to stop the thread?
>
> >
> > Not sure if kthread_stop is hit via some other path when
> > napi_disable is called? Can you clarify?
> NAPI kthread are not stopped when napi is disabled. When napis are
> disabled, the NAPI_STATE_DISABLED state is set on them and the
> associated thread goes to sleep after unsetting the STATE_SCHED. The
> kthread_stop is only called on them when NAPI is deleted. This is the
> existing behaviour. Please see napi_disable implementation for
> reference. Also napi_enable doesn't create a new kthread and just sets
> the napi STATE appropriately and once the NAPI is scheduled again the
> thread is woken up. Please seem implementation of napi_schedule for
> reference.
> >
> > From my testing, it seems like setting threaded: 0 using the CLI
> > leaves the thread running. Should it work that way? From a high
> > level that behavior seems somewhat unexpected, don't you think?
> Please see the comment above. The thread goes to sleep and is woken up
> when NAPI is enabled and scheduled again.
> >
> > [...]
> >
> > > static void napi_save_config(struct napi_struct *n)
> > > diff --git a/net/core/netdev-genl.c b/net/core/netdev-genl.c
> > > index b64c614a00c4..f7d000a600cf 100644
> > > --- a/net/core/netdev-genl.c
> > > +++ b/net/core/netdev-genl.c
> > > @@ -184,6 +184,10 @@ netdev_nl_napi_fill_one(struct sk_buff *rsp, struct napi_struct *napi,
> > > if (napi->irq >= 0 && nla_put_u32(rsp, NETDEV_A_NAPI_IRQ, napi->irq))
> > > goto nla_put_failure;
> > >
> > > + if (nla_put_uint(rsp, NETDEV_A_NAPI_THREADED,
> > > + napi_get_threaded(napi)))
> > > + goto nla_put_failure;
> > > +
> > > if (napi->thread) {
> > > pid = task_pid_nr(napi->thread);
> > > if (nla_put_u32(rsp, NETDEV_A_NAPI_PID, pid))
> > > @@ -322,8 +326,14 @@ netdev_nl_napi_set_config(struct napi_struct *napi, struct genl_info *info)
> > > {
> > > u64 irq_suspend_timeout = 0;
> > > u64 gro_flush_timeout = 0;
> > > + uint threaded = 0;
> >
> > I think I'd probably make this a u8 or something? uint is not a
> > commonly used type, AFAICT, and seems out of place here.
> Agreed. Will change
> >
> > > diff --git a/tools/testing/selftests/net/nl_netdev.py b/tools/testing/selftests/net/nl_netdev.py
> > > index beaee5e4e2aa..505564818fa8 100755
> > > --- a/tools/testing/selftests/net/nl_netdev.py
> > > +++ b/tools/testing/selftests/net/nl_netdev.py
> > > @@ -2,6 +2,7 @@
> > > # SPDX-License-Identifier: GPL-2.0
> > >
> > > import time
> > > +from os import system
> > > from lib.py import ksft_run, ksft_exit, ksft_pr
> > > from lib.py import ksft_eq, ksft_ge, ksft_busy_wait
> > > from lib.py import NetdevFamily, NetdevSimDev, ip
> > > @@ -34,6 +35,70 @@ def napi_list_check(nf) -> None:
> > > ksft_eq(len(napis), 100,
> > > comment=f"queue count after reset queue {q} mode {i}")
> > >
> > > +def napi_set_threaded(nf) -> None:
> > > + """
> > > + Test that verifies various cases of napi threaded
> > > + set and unset at napi and device level.
> > > + """
> > > + with NetdevSimDev(queue_count=2) as nsimdev:
> > > + nsim = nsimdev.nsims[0]
> > > +
> > > + ip(f"link set dev {nsim.ifname} up")
> > > +
> > > + napis = nf.napi_get({'ifindex': nsim.ifindex}, dump=True)
> > > + ksft_eq(len(napis), 2)
> > > +
> > > + napi0_id = napis[0]['id']
> > > + napi1_id = napis[1]['id']
> > > +
> > > + # set napi threaded and verify
> > > + nf.napi_set({'id': napi0_id, 'threaded': 1})
> > > + napi0 = nf.napi_get({'id': napi0_id})
> > > + ksft_eq(napi0['threaded'], 1)
> > > +
> > > + ip(f"link set dev {nsim.ifname} down")
> > > + ip(f"link set dev {nsim.ifname} up")
> > > +
> > > + # verify if napi threaded is still set
> > > + napi0 = nf.napi_get({'id': napi0_id})
> > > + ksft_eq(napi0['threaded'], 1)
> >
> > I feel like the test needs to be expanded?
> >
> > It's not just the attribute that matters (although that bit is
> > important), but I think it probably is important that the kthread is
> > still running and that should be checked ?
> >
> > > + # unset napi threaded and verify
> > > + nf.napi_set({'id': napi0_id, 'threaded': 0})
> > > + napi0 = nf.napi_get({'id': napi0_id})
> > > + ksft_eq(napi0['threaded'], 0)
> > > +
> > > + # set napi threaded for napi0
> > > + nf.napi_set({'id': napi0_id, 'threaded': 1})
> > > + napi0 = nf.napi_get({'id': napi0_id})
> > > + ksft_eq(napi0['threaded'], 1)
> > > +
> > > + # check it is not set for napi1
> > > + napi1 = nf.napi_get({'id': napi1_id})
> > > + ksft_eq(napi1['threaded'], 0)
> > > +
> > > + # set threaded at device level
> > > + system(f"echo 1 > /sys/class/net/{nsim.ifname}/threaded")
> > > +
> > > + # check napi threaded is set for both napis
> > > + napi0 = nf.napi_get({'id': napi0_id})
> > > + ksft_eq(napi0['threaded'], 1)
> > > + napi1 = nf.napi_get({'id': napi1_id})
> > > + ksft_eq(napi1['threaded'], 1)
> > > +
> > > + # set napi threaded for napi0
> > > + nf.napi_set({'id': napi0_id, 'threaded': 1})
> > > + napi0 = nf.napi_get({'id': napi0_id})
> > > + ksft_eq(napi0['threaded'], 1)
> >
> > If napi0 is already set to threaded in the previous block for the
> > device level, why set it explicitly again ?
> Will remove
> >
> > > +
> > > + # unset threaded at device level
> > > + system(f"echo 0 > /sys/class/net/{nsim.ifname}/threaded")
> > > +
> > > + # check napi threaded is unset for both napis
> > > + napi0 = nf.napi_get({'id': napi0_id})
> > > + ksft_eq(napi0['threaded'], 0)
> > > + napi1 = nf.napi_get({'id': napi1_id})
> > > + ksft_eq(napi1['threaded'], 0)
> >
> > I ran the test and it passes for me.
> >
> > That said, the test is incomplete or buggy because I've manually
> > identified 1 case that is incorrect which we discussed in the v4 and
> > a second case that seems buggy from a user perspective.
> >
> > Case 1 (we discussed this in the v4, but seems like it was missed
> > here?):
> >
> > Threaded set to 1 and then to 0 at the device level
> >
> > echo 1 > /sys/class/net/eni28160np1/threaded
> > echo 0 > /sys/class/net/eni28160np1/threaded
> >
> > Check the setting:
> >
> > cat /sys/class/net/eni28160np1/threaded
> > 0
> >
> > Dump the settings for netdevsim, noting that threaded is 0, but pid
> > is set (again, should it be?):
> >
> > ./tools/net/ynl/pyynl/cli.py \
> > --spec Documentation/netlink/specs/netdev.yaml \
> > --dump napi-get --json='{"ifindex": 20}'
> >
> > [{'defer-hard-irqs': 0,
> > 'gro-flush-timeout': 0,
> > 'id': 612,
> > 'ifindex': 20,
> > 'irq-suspend-timeout': 0,
> > 'pid': 15728,
> > 'threaded': 0},
> > {'defer-hard-irqs': 0,
> > 'gro-flush-timeout': 0,
> > 'id': 611,
> > 'ifindex': 20,
> > 'irq-suspend-timeout': 0,
> > 'pid': 15729,
> > 'threaded': 0}]
> As explained in the comment earlier, since the kthread is still valid
> and associated with the napi, the PID is valid. I just verified that
> this is the existing behaviour. Not sure whether the pid should be
> hidden if the threaded state is not enabled? Do you think we should
> change this behaviour?
> >
> > Now set NAPI 612 to threaded 1:
> >
> > ./tools/net/ynl/pyynl/cli.py \
> > --spec Documentation/netlink/specs/netdev.yaml \
> > --do napi-set --json='{"id": 612, "threaded": 1}'
> >
> > Dump the settings:
> >
> > ./tools/net/ynl/pyynl/cli.py \
> > --spec Documentation/netlink/specs/netdev.yaml \
> > --dump napi-get --json='{"ifindex": 20}'
> >
> > [{'defer-hard-irqs': 0,
> > 'gro-flush-timeout': 0,
> > 'id': 612,
> > 'ifindex': 20,
> > 'irq-suspend-timeout': 0,
> > 'pid': 15728,
> > 'threaded': 1},
> > {'defer-hard-irqs': 0,
> > 'gro-flush-timeout': 0,
> > 'id': 611,
> > 'ifindex': 20,
> > 'irq-suspend-timeout': 0,
> > 'pid': 15729,
> > 'threaded': 0}]
> >
> > So far, so good.
> >
> > Now set device-wide threaded to 0, which should set NAPI 612 to threaded 0:
> >
> > echo 0 > /sys/class/net/eni28160np1/threaded
> >
> > Dump settings:
> >
> > ./tools/net/ynl/pyynl/cli.py \
> > --spec Documentation/netlink/specs/netdev.yaml \
> > --dump napi-get --json='{"ifindex": 20}'
> >
> > [{'defer-hard-irqs': 0,
> > 'gro-flush-timeout': 0,
> > 'id': 612,
> > 'ifindex': 20,
> > 'irq-suspend-timeout': 0,
> > 'pid': 15728,
> > 'threaded': 1},
> > {'defer-hard-irqs': 0,
> > 'gro-flush-timeout': 0,
> > 'id': 611,
> > 'ifindex': 20,
> > 'irq-suspend-timeout': 0,
> > 'pid': 15729,
> > 'threaded': 0}]
> >
> > Note that NAPI 612 is still set to threaded 1, but we set the
> > device-wide setting to 0.
> >
> > Shouldn't NAPI 612 be threaded = 0 ?
> Yes it should be. I will add a test for this and also remove the
> initial check in dev_set_threaded to fix this behaviour.
> >
> > Second case:
> > - netdevsim is brought up with 2 queues and 2 napis
> > - Threaded NAPI is enabled for napi1
> > - The queue count on the device is reduced from 2 to 1 (therefore
> > napi1 is disabled) via ethtool -L
> >
> > Then:
> >
> > - Is the thread still active? Should it be?
> >
> > IMO: if the napi is disabled the thread should stop because there is
> > no NAPI for it to poll. When the napi is enabled, the thread can
> > start back up.
> Please see the comment above for the explanation of this., but to
> reiterate the thread is there but not actively being scheduled and
> sleeps until napi is enabled again and scheduled.
> >
> > It does not seem that this is currently the case from manual
> > testing.
Powered by blists - more mailing lists