[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAAywjhS1_oXvF21nyZiNstahxR7dkkPPPPS8PrwTC8fxPos5aA@mail.gmail.com>
Date: Tue, 29 Jul 2025 12:30:48 -0700
From: Samiullah Khawaja <skhawaja@...gle.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: "David S . Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>, almasrymina@...gle.com, willemb@...gle.com,
netdev@...r.kernel.org, Joe Damato <joe@...a.to>
Subject: Re: [PATCH net-next] net: Restore napi threaded state only when it is enabled
On Wed, Jul 23, 2025 at 7:56 AM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Tue, 22 Jul 2025 19:36:31 -0700 Samiullah Khawaja wrote:
> > On Tue, Jul 22, 2025 at 6:36 PM Jakub Kicinski <kuba@...nel.org> wrote:
> > >
> > > On Mon, 21 Jul 2025 23:36:08 +0000 Samiullah Khawaja wrote:
> > > > Commit 2677010e7793 ("Add support to set NAPI threaded for individual
> > > > NAPI") added support to enable/disable threaded napi using netlink. This
> > > > also extended the napi config save/restore functionality to set the napi
> > > > threaded state. This breaks the netdev reset when threaded napi is
> > >
> > > "This breaks the netdev reset" is very vague.
> > Basically on netdev reset inside napi_enable when it calls
> > napi_restore_config it tries to stop the NAPI kthread. Since during
> > napi_enable, the NAPI has STATE_SCHED set on it, the stop_kthread gets
> > stuck waiting for the STATE_SCHED to be unset. It should not be
> > destroying the kthread since threaded is enabled at device level. But
> > I think your point below is valid, we should probably set
> > napi->config->threaded in netif_set_threaded.
> >
> > I should add this to the commit message.
>
> Ah, I see! The impermanence of the DISABLED bit strikes again :(
>
> > > > enabled at device level as the napi_restore_config tries to stop the
> > > > kthreads as napi->config->thread is false when threaded is enabled at
> > > > device level.
> > >
> > > My reading of the commit message is that the WARN triggers, but
> > > looking at the code I think you mean that we fail to update the
> > > config when we set at the device level?
> > >
> > > > The napi_restore_config should only restore the napi threaded state when
> > > > threaded is enabled at NAPI level.
> > > >
> > > > The issue can be reproduced on virtio-net device using qemu. To
> > > > reproduce the issue run following,
> > > >
> > > > echo 1 > /sys/class/net/threaded
> > > > ethtool -L eth0 combined 1
> > >
> > > Maybe we should add that as a test under tools/testing/drivers/net -
> > > it will run against netdevsim but also all drivers we test which
> > > currently means virtio and fbnic, but hopefully soon more. Up to you.
> > +1
> >
> > I do want to add a test for it. I was thinking of extending
> > nl_netdev.py but that doesn't seem suitable as it only looks at the
> > state. I will look at the driver directory. Should I send a test with
> > this fix or should I send that later after the next reopen?
>
> We recommend sending the patch and the fix in one series, but up to you.
>
> > > I'm not sure I agree with the semantics, tho. IIUC you're basically
> > > making the code prefer threaded option. If user enables threading
> > > for the device, then disables it for a NAPI - reconfiguration will
> > > lose the fact that specific NAPI instance was supposed to have threading
> > > disabled. Why not update napi->config in netif_set_threaded() instead?
I will do this.
> > I think this discrepancy is orthogonal to the stopping of threads in
> > restore. This is because the napi_enable is calling restore_config and
> > then setting the STATE_THREADED bit on napis based on dev->threaded.
> > Even if restore unsets the THREADED bit (which would have already been
> > unset during napi_disable), it will be set again based on
> > dev->threaded.
Looking at this again, it seems the current mechanism should be fine
as if the napi thread was disabled for a napi then the kthread
associated with it will be destroyed also. This means that the
STATE_THREADED will not be enabled for it. I will set the threaded
config for each napi in netif_set_threaded.
> >
> > I think napi_restore_config should be called after setting up STATE bits?
This is not needed.
>
> Yes, I think that would work. In fact I think it makes more sense in
> the first place to "list" the NAPI in the hash table accessible by user
> space only after it's actually enabled.
>
> I'd also be tempted to move some of the config restoration to
> netif_napi_add(), so that we don't start the thread just to stop it.
> But we'd need it in both places, IIRC netlink only holds the instance
> lock, so theoretically the config may change between "add" and "enable"
> :(
>
> cc: Joe, who's probably offline, but note his non-Fastly address
Powered by blists - more mailing lists