lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250723075605.16ec3756@kernel.org>
Date: Wed, 23 Jul 2025 07:56:05 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Samiullah Khawaja <skhawaja@...gle.com>
Cc: "David S . Miller" <davem@...emloft.net>, Eric Dumazet
 <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
 almasrymina@...gle.com, willemb@...gle.com, netdev@...r.kernel.org, Joe
 Damato <joe@...a.to>
Subject: Re: [PATCH net-next] net: Restore napi threaded state only when it
 is enabled

On Tue, 22 Jul 2025 19:36:31 -0700 Samiullah Khawaja wrote:
> On Tue, Jul 22, 2025 at 6:36 PM Jakub Kicinski <kuba@...nel.org> wrote:
> >
> > On Mon, 21 Jul 2025 23:36:08 +0000 Samiullah Khawaja wrote:  
> > > Commit 2677010e7793 ("Add support to set NAPI threaded for individual
> > > NAPI") added support to enable/disable threaded napi using netlink. This
> > > also extended the napi config save/restore functionality to set the napi
> > > threaded state. This breaks the netdev reset when threaded napi is  
> >
> > "This breaks the netdev reset" is very vague.  
> Basically on netdev reset inside napi_enable when it calls
> napi_restore_config it tries to stop the NAPI kthread. Since during
> napi_enable, the NAPI has STATE_SCHED set on it, the stop_kthread gets
> stuck waiting for the STATE_SCHED to be unset. It should not be
> destroying the kthread since threaded is enabled at device level. But
> I think your point below is valid, we should probably set
> napi->config->threaded in netif_set_threaded.
> 
> I should add this to the commit message.

Ah, I see! The impermanence of the DISABLED bit strikes again :(

> > > enabled at device level as the napi_restore_config tries to stop the
> > > kthreads as napi->config->thread is false when threaded is enabled at
> > > device level.  
> >
> > My reading of the commit message is that the WARN triggers, but
> > looking at the code I think you mean that we fail to update the
> > config when we set at the device level?
> >  
> > > The napi_restore_config should only restore the napi threaded state when
> > > threaded is enabled at NAPI level.
> > >
> > > The issue can be reproduced on virtio-net device using qemu. To
> > > reproduce the issue run following,
> > >
> > >   echo 1 > /sys/class/net/threaded
> > >   ethtool -L eth0 combined 1  
> >
> > Maybe we should add that as a test under tools/testing/drivers/net -
> > it will run against netdevsim but also all drivers we test which
> > currently means virtio and fbnic, but hopefully soon more. Up to you.  
> +1
> 
> I do want to add a test for it. I was thinking of extending
> nl_netdev.py but that doesn't seem suitable as it only looks at the
> state. I will look at the driver directory. Should I send a test with
> this fix or should I send that later after the next reopen?

We recommend sending the patch and the fix in one series, but up to you.

> > I'm not sure I agree with the semantics, tho. IIUC you're basically
> > making the code prefer threaded option. If user enables threading
> > for the device, then disables it for a NAPI - reconfiguration will
> > lose the fact that specific NAPI instance was supposed to have threading
> > disabled. Why not update napi->config in netif_set_threaded() instead?  
> I think this discrepancy is orthogonal to the stopping of threads in
> restore. This is because the napi_enable is calling restore_config and
> then setting the STATE_THREADED bit on napis based on dev->threaded.
> Even if restore unsets the THREADED bit (which would have already been
> unset during napi_disable), it will be set again based on
> dev->threaded.
> 
> I think napi_restore_config should be called after setting up STATE bits?

Yes, I think that would work. In fact I think it makes more sense in
the first place to "list" the NAPI in the hash table accessible by user
space only after it's actually enabled.

I'd also be tempted to move some of the config restoration to
netif_napi_add(), so that we don't start the thread just to stop it.
But we'd need it in both places, IIRC netlink only holds the instance
lock, so theoretically the config may change between "add" and "enable"
:(

cc: Joe, who's probably offline, but note his non-Fastly address

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ