lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <dlzyk2xvkdqk75urljgazy7e4hxn4jwozpb3skpgitb3oihmxp@54dgi77ndjvk>
Date: Fri, 25 Jul 2025 07:57:56 -0700
From: Breno Leitao <leitao@...ian.org>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Paolo Abeni <pabeni@...hat.com>, davem@...emloft.net, 
	netdev@...r.kernel.org, edumazet@...gle.com, andrew+netdev@...n.ch, horms@...nel.org, 
	Jason Wang <jasowang@...hat.com>, Zigit Zo <zuozhijie@...edance.com>, 
	"Michael S. Tsirkin" <mst@...hat.com>, Xuan Zhuo <xuanzhuo@...ux.alibaba.com>, 
	Eugenio Pérez <eperezma@...hat.com>, sdf@...ichev.me
Subject: Re: [PATCH net] netpoll: prevent hanging NAPI when netcons gets
 enabled

Hello Jakub,

On Fri, Jul 25, 2025 at 07:14:34AM -0700, Jakub Kicinski wrote:
> > > +static int netconsole_setup_and_enable(struct netconsole_target *nt)
> > > +{
> > > +	int ret;
> > > +
> > > +	ret = netpoll_setup(&nt->np);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	/* Make sure all NAPI polls which started before dev->npinfo
> > > +	 * was visible have exited before we start calling NAPI poll.
> > > +	 * NAPI skips locking if dev->npinfo is NULL.
> > > +	 */
> > > +	synchronize_rcu();  
> > 
> > I'm wondering if it would make any sense to move the above in
> > netpoll_setup(), make the exposed symbol safe.
> 
> Fair, somehow I convinced myself that the nt->enabled = true;
> is more relevant.

Why not doing it in __netpoll_setup() side instead of netconsole?

If I understand the problem clearly, this is a problem on netpoll
initialization, and netconsole is just one of the users.

I am wondering if calling synchronize_rcu() after rcu_assign_pointer()
might not be enought.

	diff --git a/net/core/netpoll.c b/net/core/netpoll.c
	index a1da97b5b30b6..a20b8cf261306 100644
	--- a/net/core/netpoll.c
	+++ b/net/core/netpoll.c
	@@ -600,6 +600,8 @@ int __netpoll_setup(struct netpoll *np, struct net_device *ndev)
		/* last thing to do is link it to the net device structure */
		rcu_assign_pointer(ndev->npinfo, npinfo);

	+       synchronize_rcu();
		return 0;

	free_npinfo:

	
Thanks for investigating it, this is a fascinating bug.
--breno

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ