netdev - Re: [PATCH net] net: dsa: fix panic when DSA master device unbinds on shutdown


Message-ID: <7db5996ef488f8ca1b9fdc0d39b9e4dd1189b34b.camel@siemens.com>
Date: Thu, 5 Sep 2024 07:11:44 +0000
From: "Sverdlin, Alexander" <alexander.sverdlin@...mens.com>
To: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"vladimir.oltean@....com" <vladimir.oltean@....com>
CC: "andrew@...n.ch" <andrew@...n.ch>, "olteanv@...il.com"
	<olteanv@...il.com>, "daniel.klauer@....de" <daniel.klauer@....de>,
	"davem@...emloft.net" <davem@...emloft.net>, "vivien.didelot@...il.com"
	<vivien.didelot@...il.com>, "LinoSanfilippo@....de" <LinoSanfilippo@....de>,
	"f.fainelli@...il.com" <f.fainelli@...il.com>, "kuba@...nel.org"
	<kuba@...nel.org>, "rafael.richter@....de" <rafael.richter@....de>
Subject: Re: [PATCH net] net: dsa: fix panic when DSA master device unbinds on
 shutdown

Hello Vladimir,

On Wed, 2024-09-04 at 10:03 +0200, Alexander Sverdlin wrote:
> > +	/* Disconnect from further netdevice notifiers on the master,
> > +	 * since netdev_uses_dsa() will now return false.
> > +	 */
> > +	dsa_switch_for_each_cpu_port(dp, ds)
> > +		dp->master->dsa_ptr = NULL;
> 
> This is unfortunately racy and leads to other panics:
> 
> Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
> CPU: 0 PID: 12 Comm: ksoftirqd/0 Tainted: G           O       6.1.99+gitb7793b7d9b35 #1
> pc : lan9303_rcv+0x64/0x210
> lr : lan9303_rcv+0x148/0x210
> Call trace:
>  lan9303_rcv+0x64/0x210
>  dsa_switch_rcv+0x1d8/0x350
>  __netif_receive_skb_list_core+0x1f8/0x220
>  netif_receive_skb_list_internal+0x18c/0x2a4
>  napi_gro_receive+0x238/0x254
>  fec_enet_rx_napi+0x830/0xe60
>  __napi_poll+0x40/0x210
>  net_rx_action+0x138/0x2d0
> 
> Even though dsa_switch_rcv() checks 
> 
>         if (unlikely(!cpu_dp)) {
>                 kfree_skb(skb);
>                 return 0;
>         }
> 
> if dsa_switch_shutdown() happens to zero dsa_ptr before the
> dsa_conduit_find_user(dev, 0, port) call, the latter will dereference dsa_ptr == NULL:
> 
> static inline struct net_device *dsa_conduit_find_user(struct net_device *dev,
>                                                        int device, int port)
> {
>         struct dsa_port *cpu_dp = dev->dsa_ptr;
>         struct dsa_switch_tree *dst = cpu_dp->dst;
> 
> I believe there are other race patterns as well if we consider all possible
> 
> static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
>                           struct packet_type *pt, struct net_device *unused)
> {
>         struct metadata_dst *md_dst = skb_metadata_dst(skb);
>         struct dsa_port *cpu_dp = dev->dsa_ptr;
> 
> ...
> 
>                 nskb = cpu_dp->rcv(skb, dev);
> 
> >  
> >  	rtnl_unlock();
> >  	mutex_unlock(&dsa2_mutex);
> 
> I'm not sure there is a safe way to zero dsa_ptr without ensuring the port
> is down and there is no ongoing receive in parallel.

After my first attempts to put a band-aid on this failed, I concluded
that both "dsa_ptr = NULL;" assignments in the kernel are broken. Or, to be more
precise, they break this widespread pattern:

CPU0					CPU1
if (netdev_uses_dsa())
					dev->dsa_ptr = NULL;
        dev->dsa_ptr->...

because there is no synchronization whatsoever, so tearing down DSA is actually
broken in many places...
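
To make the window concrete, here is a minimal user-space sketch of the
check-then-use pattern, with C11 atomics standing in for the kernel's
accessors. struct cpu_port, struct conduit and both function names are
hypothetical stand-ins, not kernel code. Loading the pointer once and
checking the local copy removes the NULL dereference, but it says nothing
about whether the object is still alive, so it is not a complete fix.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct dsa_port and struct net_device. */
struct cpu_port {
	int index;
};

struct conduit {
	_Atomic(struct cpu_port *) dsa_ptr;
};

/* Racy shape: the check and the use are two separate loads. */
static int port_index_racy(struct conduit *dev)
{
	if (atomic_load(&dev->dsa_ptr) == NULL)	/* like netdev_uses_dsa() */
		return -1;
	/* Another CPU may store NULL right here. */
	return atomic_load(&dev->dsa_ptr)->index;
}

/* Single load: the pointer that was checked is the one dereferenced. */
static int port_index_snapshot(struct conduit *dev)
{
	struct cpu_port *dp =
		atomic_load_explicit(&dev->dsa_ptr, memory_order_acquire);

	if (!dp)
		return -1;
	return dp->index;
}
```

The snapshot variant only closes the gap between the check and the
dereference; teardown still needs a way to wait for readers or keep the
object alive.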

It seems that something lock-free is required for dsa_ptr, maybe RCU or refcounting.
I'll try to come up with a rework, but any hints are welcome!
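
As a starting point for discussion, a rough user-space sketch of the
refcounting variant follows. All names are hypothetical, and a tiny
spinlock protects the "read pointer and take reference" step, which is
exactly the part the kernel would more likely cover with RCU. It only
illustrates the lifetime rule: teardown unpublishes the pointer and drops
its own reference, and the object is released when the last reader is done.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins; "released" marks where the kernel would free. */
struct cpu_port {
	int refs;
	int index;
	int released;
};

struct conduit {
	atomic_flag lock;		/* protects dsa_ptr and refs */
	struct cpu_port *dsa_ptr;
};

static void conduit_lock(struct conduit *dev)
{
	while (atomic_flag_test_and_set_explicit(&dev->lock,
						 memory_order_acquire))
		;			/* spin */
}

static void conduit_unlock(struct conduit *dev)
{
	atomic_flag_clear_explicit(&dev->lock, memory_order_release);
}

/* Reader: returns a referenced port, or NULL once it is unpublished. */
static struct cpu_port *port_get(struct conduit *dev)
{
	struct cpu_port *dp;

	conduit_lock(dev);
	dp = dev->dsa_ptr;
	if (dp)
		dp->refs++;
	conduit_unlock(dev);
	return dp;
}

/* Drop one reference; the last one releases the object. */
static void port_put(struct conduit *dev, struct cpu_port *dp)
{
	int last;

	conduit_lock(dev);
	last = (--dp->refs == 0);
	conduit_unlock(dev);
	if (last)
		dp->released = 1;
}

/* Teardown: hide the pointer first, then drop the publisher's reference. */
static void port_unpublish(struct conduit *dev)
{
	struct cpu_port *dp;

	conduit_lock(dev);
	dp = dev->dsa_ptr;
	dev->dsa_ptr = NULL;
	conduit_unlock(dev);
	if (dp)
		port_put(dev, dp);
}
```

A receive path would call port_get() once, use the returned pointer, and
call port_put() when finished, so a concurrent port_unpublish() can no
longer pull the object out from under it.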

-- 
Alexander Sverdlin
Siemens AG
www.siemens.com