lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <c5e0e67400816d68e6bf90b4a999bfa28c59043b.camel@siemens.com>
Date: Tue, 10 Sep 2024 04:49:37 +0000
From: "Sverdlin, Alexander" <alexander.sverdlin@...mens.com>
To: "vladimir.oltean@....com" <vladimir.oltean@....com>
CC: "andrew@...n.ch" <andrew@...n.ch>, "olteanv@...il.com"
	<olteanv@...il.com>, "LinoSanfilippo@....de" <LinoSanfilippo@....de>,
	"daniel.klauer@....de" <daniel.klauer@....de>, "davem@...emloft.net"
	<davem@...emloft.net>, "vivien.didelot@...il.com" <vivien.didelot@...il.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>, "f.fainelli@...il.com"
	<f.fainelli@...il.com>, "kuba@...nel.org" <kuba@...nel.org>,
	"rafael.richter@....de" <rafael.richter@....de>
Subject: Re: [PATCH net] net: dsa: fix panic when DSA master device unbinds on
 shutdown

Hello Vladimir!

On Mon, 2024-09-09 at 16:23 +0200, Alexander Sverdlin wrote:
> On Mon, 2024-09-09 at 17:16 +0300, Vladimir Oltean wrote:
> > > after my first attempts to put a band aid on this failed, I concluded
> > > that both assignments "dsa_ptr = NULL;" in kernel are broken. Or, being more
> > > precise, they break widely spread pattern
> > > 
> > > CPU0					CPU1
> > > if (netdev_uses_dsa())
> > >  					dev->dsa_ptr = NULL;
> > >          dev->dsa_ptr->...
> > > 
> > > because there is no synchronization whatsoever, so tearing down DSA is actually
> > > broken in many places...
> > > 
> > > Seems that something lock-free is required for dsa_ptr, maybe RCU or refcounting,
> > > I'll try to come up with some rework, but any hints are welcome!
> > 
> > I'm trying to understand if this rework still leads to NULL dereferences
> > of conduit->dsa_ptr in the receive path? Could you please test?
> 
> I didn't test yet (I can do it though), but I belive dsa_conduit_teardown()
> will trigger the same crash eventually.

I've realized, I've tested similar patch already (where I replaced lan9303_shutdown()
with lan9303_remove()), but this only fixed the race with MDIO accesses in lan9303.

However I've applied your below patch (well, onto v6.1.99, but I don't think it matters
for now) and got the same crashes once per hour (on back-to-back reboots under 100MBit
traffic):

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
...
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O       6.1.99+git6b4dd8ce06dc #1
...
pc : lan9303_rcv+0x68/0x220
lr : lan9303_rcv+0x14c/0x220
Call trace:
 lan9303_rcv+0x68/0x220
 dsa_switch_rcv+0x1d0/0x334
 __netif_receive_skb_list_core+0x1f4/0x220
 netif_receive_skb_list_internal+0x1e0/0x2d0
 napi_complete_done+0x70/0x1cc
 fec_enet_rx_napi+0x4fc/0xe50
 __napi_poll+0x40/0x200
 net_rx_action+0x140/0x2e0
 handle_softirqs+0x120/0x360
...

> We can probably trigger a NULL pointer dereference in tagging_show() vs shutdown,
> etc...

I wasn't able to trigger these even though I thought it will be even easier...

> I'm actually close to publishing my RCU rework of dsa_ptr, but I would need to
> test it as well...

Hopefully I will be able to test an RCU conversion today with some LOCKDEP and
RCU debugging, otherwise I can just send a compile-tested RFC...

> > diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> > index 668c729946ea..f1ce6d8dc499 100644
> > --- a/net/dsa/dsa.c
> > +++ b/net/dsa/dsa.c
> > @@ -1576,32 +1576,7 @@ EXPORT_SYMBOL_GPL(dsa_unregister_switch);
> >   */
> >  void dsa_switch_shutdown(struct dsa_switch *ds)
> >  {
> > -	struct net_device *conduit, *user_dev;
> > -	struct dsa_port *dp;
> > -
> > -	mutex_lock(&dsa2_mutex);
> > -
> > -	if (!ds->setup)
> > -		goto out;
> > -
> > -	rtnl_lock();
> > -
> > -	dsa_switch_for_each_user_port(dp, ds) {
> > -		conduit = dsa_port_to_conduit(dp);
> > -		user_dev = dp->user;
> > -
> > -		netdev_upper_dev_unlink(conduit, user_dev);
> > -	}
> > -
> > -	/* Disconnect from further netdevice notifiers on the conduit,
> > -	 * since netdev_uses_dsa() will now return false.
> > -	 */
> > -	dsa_switch_for_each_cpu_port(dp, ds)
> > -		dp->conduit->dsa_ptr = NULL;
> > -
> > -	rtnl_unlock();
> > -out:
> > -	mutex_unlock(&dsa2_mutex);
> > +	dsa_unregister_switch(ds);
> >  }
> >  EXPORT_SYMBOL_GPL(dsa_switch_shutdown);

-- 
Alexander Sverdlin
Siemens AG
www.siemens.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ