[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <c5e0e67400816d68e6bf90b4a999bfa28c59043b.camel@siemens.com>
Date: Tue, 10 Sep 2024 04:49:37 +0000
From: "Sverdlin, Alexander" <alexander.sverdlin@...mens.com>
To: "vladimir.oltean@....com" <vladimir.oltean@....com>
CC: "andrew@...n.ch" <andrew@...n.ch>, "olteanv@...il.com"
<olteanv@...il.com>, "LinoSanfilippo@....de" <LinoSanfilippo@....de>,
"daniel.klauer@....de" <daniel.klauer@....de>, "davem@...emloft.net"
<davem@...emloft.net>, "vivien.didelot@...il.com" <vivien.didelot@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>, "f.fainelli@...il.com"
<f.fainelli@...il.com>, "kuba@...nel.org" <kuba@...nel.org>,
"rafael.richter@....de" <rafael.richter@....de>
Subject: Re: [PATCH net] net: dsa: fix panic when DSA master device unbinds on
shutdown
Hello Vladimir!
On Mon, 2024-09-09 at 16:23 +0200, Alexander Sverdlin wrote:
> On Mon, 2024-09-09 at 17:16 +0300, Vladimir Oltean wrote:
> > > after my first attempts to put a band aid on this failed, I concluded
> > > that both assignments "dsa_ptr = NULL;" in kernel are broken. Or, being more
> > > precise, they break widely spread pattern
> > >
> > > CPU0 CPU1
> > > if (netdev_uses_dsa())
> > > dev->dsa_ptr = NULL;
> > > dev->dsa_ptr->...
> > >
> > > because there is no synchronization whatsoever, so tearing down DSA is actually
> > > broken in many places...
> > >
> > > Seems that something lock-free is required for dsa_ptr, maybe RCU or refcounting,
> > > I'll try to come up with some rework, but any hints are welcome!
> >
> > I'm trying to understand if this rework still leads to NULL dereferences
> > of conduit->dsa_ptr in the receive path? Could you please test?
>
> I didn't test yet (I can do it though), but I belive dsa_conduit_teardown()
> will trigger the same crash eventually.
I've realized, I've tested similar patch already (where I replaced lan9303_shutdown()
with lan9303_remove()), but this only fixed the race with MDIO accesses in lan9303.
However I've applied your below patch (well, onto v6.1.99, but I don't think it matters
for now) and got the same crashes once per hour (on back-to-back reboots under 100MBit
traffic):
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
...
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 6.1.99+git6b4dd8ce06dc #1
...
pc : lan9303_rcv+0x68/0x220
lr : lan9303_rcv+0x14c/0x220
Call trace:
lan9303_rcv+0x68/0x220
dsa_switch_rcv+0x1d0/0x334
__netif_receive_skb_list_core+0x1f4/0x220
netif_receive_skb_list_internal+0x1e0/0x2d0
napi_complete_done+0x70/0x1cc
fec_enet_rx_napi+0x4fc/0xe50
__napi_poll+0x40/0x200
net_rx_action+0x140/0x2e0
handle_softirqs+0x120/0x360
...
> We can probably trigger a NULL pointer dereference in tagging_show() vs shutdown,
> etc...
I wasn't able to trigger these even though I thought it will be even easier...
> I'm actually close to publishing my RCU rework of dsa_ptr, but I would need to
> test it as well...
Hopefully I will be able to test an RCU conversion today with some LOCKDEP and
RCU debugging, otherwise I can just send a compile-tested RFC...
> > diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> > index 668c729946ea..f1ce6d8dc499 100644
> > --- a/net/dsa/dsa.c
> > +++ b/net/dsa/dsa.c
> > @@ -1576,32 +1576,7 @@ EXPORT_SYMBOL_GPL(dsa_unregister_switch);
> > */
> > void dsa_switch_shutdown(struct dsa_switch *ds)
> > {
> > - struct net_device *conduit, *user_dev;
> > - struct dsa_port *dp;
> > -
> > - mutex_lock(&dsa2_mutex);
> > -
> > - if (!ds->setup)
> > - goto out;
> > -
> > - rtnl_lock();
> > -
> > - dsa_switch_for_each_user_port(dp, ds) {
> > - conduit = dsa_port_to_conduit(dp);
> > - user_dev = dp->user;
> > -
> > - netdev_upper_dev_unlink(conduit, user_dev);
> > - }
> > -
> > - /* Disconnect from further netdevice notifiers on the conduit,
> > - * since netdev_uses_dsa() will now return false.
> > - */
> > - dsa_switch_for_each_cpu_port(dp, ds)
> > - dp->conduit->dsa_ptr = NULL;
> > -
> > - rtnl_unlock();
> > -out:
> > - mutex_unlock(&dsa2_mutex);
> > + dsa_unregister_switch(ds);
> > }
> > EXPORT_SYMBOL_GPL(dsa_switch_shutdown);
--
Alexander Sverdlin
Siemens AG
www.siemens.com
Powered by blists - more mailing lists