Message-ID: <7db5996ef488f8ca1b9fdc0d39b9e4dd1189b34b.camel@siemens.com>
Date: Thu, 5 Sep 2024 07:11:44 +0000
From: "Sverdlin, Alexander" <alexander.sverdlin@...mens.com>
To: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"vladimir.oltean@....com" <vladimir.oltean@....com>
CC: "andrew@...n.ch" <andrew@...n.ch>, "olteanv@...il.com"
<olteanv@...il.com>, "daniel.klauer@....de" <daniel.klauer@....de>,
"davem@...emloft.net" <davem@...emloft.net>, "vivien.didelot@...il.com"
<vivien.didelot@...il.com>, "LinoSanfilippo@....de" <LinoSanfilippo@....de>,
"f.fainelli@...il.com" <f.fainelli@...il.com>, "kuba@...nel.org"
<kuba@...nel.org>, "rafael.richter@....de" <rafael.richter@....de>
Subject: Re: [PATCH net] net: dsa: fix panic when DSA master device unbinds on
shutdown
Hello Vladimir,
On Wed, 2024-09-04 at 10:03 +0200, Alexander Sverdlin wrote:
> > +	/* Disconnect from further netdevice notifiers on the master,
> > +	 * since netdev_uses_dsa() will now return false.
> > +	 */
> > +	dsa_switch_for_each_cpu_port(dp, ds)
> > +		dp->master->dsa_ptr = NULL;
>
> This is unfortunately racy and leads to other panics:
>
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
> CPU: 0 PID: 12 Comm: ksoftirqd/0 Tainted: G O 6.1.99+gitb7793b7d9b35 #1
> pc : lan9303_rcv+0x64/0x210
> lr : lan9303_rcv+0x148/0x210
> Call trace:
> lan9303_rcv+0x64/0x210
> dsa_switch_rcv+0x1d8/0x350
> __netif_receive_skb_list_core+0x1f8/0x220
> netif_receive_skb_list_internal+0x18c/0x2a4
> napi_gro_receive+0x238/0x254
> fec_enet_rx_napi+0x830/0xe60
> __napi_poll+0x40/0x210
> net_rx_action+0x138/0x2d0
>
> Even though dsa_switch_rcv() checks
>
> 	if (unlikely(!cpu_dp)) {
> 		kfree_skb(skb);
> 		return 0;
> 	}
>
> if dsa_switch_shutdown() happens to zero dsa_ptr before
> dsa_conduit_find_user(dev, 0, port) call, the latter will dereference dsa_ptr==NULL:
>
> static inline struct net_device *dsa_conduit_find_user(struct net_device *dev,
> 							int device, int port)
> {
> 	struct dsa_port *cpu_dp = dev->dsa_ptr;
> 	struct dsa_switch_tree *dst = cpu_dp->dst;
>
> I believe there are other race patterns as well if we consider all possible
>
> static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
> 			  struct packet_type *pt, struct net_device *unused)
> {
> 	struct metadata_dst *md_dst = skb_metadata_dst(skb);
> 	struct dsa_port *cpu_dp = dev->dsa_ptr;
>
> 	...
>
> 	nskb = cpu_dp->rcv(skb, dev);
>
> >
> > rtnl_unlock();
> > mutex_unlock(&dsa2_mutex);
>
> I'm not sure there is a safe way to zero dsa_ptr without ensuring the port
> is down and there is no ongoing receive in parallel.
After my first attempts to put a band-aid on this failed, I concluded
that both "dsa_ptr = NULL;" assignments in the kernel are broken. Or, to be
more precise, they break this widespread pattern:

CPU0                                    CPU1
if (netdev_uses_dsa())
                                        dev->dsa_ptr = NULL;
        dev->dsa_ptr->...

because there is no synchronization whatsoever, so tearing down DSA is actually
broken in many places...

It seems that something lock-free is required for dsa_ptr, maybe RCU or
refcounting. I'll try to come up with a rework, but any hints are welcome!
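
For illustration only, the race can be reduced to a small user-space C sketch.
It is not kernel code and not a proposed patch: struct fake_port,
struct fake_netdev, port_index_snapshot() and detach() are hypothetical
stand-ins, and C11 atomics take the place of whatever primitive a real rework
would use (with RCU that would presumably be rcu_dereference() on the read
side and rcu_assign_pointer() followed by synchronize_rcu() on teardown). The
point it shows is that the reader loads dsa_ptr exactly once and then uses only
its local copy, so a concurrent store of NULL cannot land between the check and
the dereference.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dsa_port. */
struct fake_port {
	int index;
};

/* Hypothetical stand-in for struct net_device and its dsa_ptr field. */
struct fake_netdev {
	_Atomic(struct fake_port *) dsa_ptr;
};

/*
 * Reader side: take one snapshot of the pointer, test the snapshot and
 * dereference only the snapshot. The racy pattern instead reads
 * dev->dsa_ptr twice (once for the check, once for the use), and the
 * second read may observe NULL.
 *
 * Returns the port index, or -1 if the device is no longer attached.
 */
static int port_index_snapshot(struct fake_netdev *dev)
{
	struct fake_port *dp =
		atomic_load_explicit(&dev->dsa_ptr, memory_order_acquire);

	if (!dp)
		return -1;

	return dp->index;
}

/*
 * Teardown side: unpublish the pointer. This sketch never frees the
 * port; real code would also have to wait until no reader still holds
 * an old snapshot before freeing it (the grace-period or refcount part
 * that this sketch leaves out).
 */
static void detach(struct fake_netdev *dev)
{
	atomic_store_explicit(&dev->dsa_ptr, (struct fake_port *)NULL,
			      memory_order_release);
}
```

The single load only closes the check-then-use window. The lifetime of the
object behind the pointer still needs RCU or a refcount, which is the harder
part of any rework.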
--
Alexander Sverdlin
Siemens AG
www.siemens.com