lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240924084343.syhmwim5swcgppha@skbuf>
Date: Tue, 24 Sep 2024 11:43:43 +0300
From: Vladimir Oltean <vladimir.oltean@....com>
To: Andrew Lunn <andrew@...n.ch>, Florian Fainelli <f.fainelli@...il.com>
Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Alexander Sverdlin <alexander.sverdlin@...mens.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH net] net: dsa: improve shutdown sequence

On Fri, Sep 13, 2024 at 11:35:49PM +0300, Vladimir Oltean wrote:
> Alexander Sverdlin presents 2 problems during shutdown with the
> lan9303 driver. One is specific to lan9303 and the other just happens
> to reproduce there.
> 
> The first problem is that lan9303 is unique among DSA drivers in that it
> calls dev_get_drvdata() at "arbitrary runtime" (not probe, not shutdown,
> not remove):
> 
> phy_state_machine()
> -> ...
>    -> dsa_user_phy_read()
>       -> ds->ops->phy_read()
>          -> lan9303_phy_read()
>             -> chip->ops->phy_read()
>                -> lan9303_mdio_phy_read()
>                   -> dev_get_drvdata()
> 
> But we never stop the phy_state_machine(), so it may continue to run
> after dsa_switch_shutdown(). Our common pattern in all DSA drivers is
> to set drvdata to NULL to suppress the remove() method that may come
> afterwards. But in this case it will result in an NPD.
> 
> The second problem is that the way in which we set
> dp->conduit->dsa_ptr = NULL; is concurrent with receive packet
> processing. dsa_switch_rcv() checks once whether dev->dsa_ptr is NULL,
> but afterwards, rather than continuing to use that non-NULL value,
> dev->dsa_ptr is dereferenced again and again without NULL checks:
> dsa_conduit_find_user() and many other places. In between dereferences,
> there is no locking to ensure that what was valid once continues to be
> valid.
> 
> Both problems have the common aspect that closing the conduit interface
> solves them.
> 
> In the first case, dev_close(conduit) triggers the NETDEV_GOING_DOWN
> event in dsa_user_netdevice_event() which closes user ports as well.
> dsa_port_disable_rt() calls phylink_stop(), which synchronously stops
> the phylink state machine, and ds->ops->phy_read() will thus no longer
> call into the driver after this point.
> 
> In the second case, dev_close(conduit) should do this, as per
> Documentation/networking/driver.rst:
> 
> | Quiescence
> | ----------
> |
> | After the ndo_stop routine has been called, the hardware must
> | not receive or transmit any data.  All in flight packets must
> | be aborted. If necessary, poll or wait for completion of
> | any reset commands.
> 
> So it should be sufficient to ensure that later, when we zeroize
> conduit->dsa_ptr, there will be no concurrent dsa_switch_rcv() call
> on this conduit.
> 
> The addition of the netif_device_detach() function is to ensure that
> ioctls, rtnetlinks and ethtool requests on the user ports no longer
> propagate down to the driver - we're no longer prepared to handle them.
> 
> The race condition actually did not exist when commit 0650bf52b31f
> ("net: dsa: be compatible with masters which unregister on shutdown")
> first introduced dsa_switch_shutdown(). It was created later, when we
> stopped unregistering the user interfaces from a bad spot, and we just
> replaced that sequence with a racy zeroization of conduit->dsa_ptr
> (one which doesn't ensure that the interfaces aren't up).
> 
> Reported-by: Alexander Sverdlin <alexander.sverdlin@...mens.com>
> Closes: https://lore.kernel.org/netdev/2d2e3bba17203c14a5ffdabc174e3b6bbb9ad438.camel@siemens.com/
> Closes: https://lore.kernel.org/netdev/c1bf4de54e829111e0e4a70e7bd1cf523c9550ff.camel@siemens.com/
> Fixes: ee534378f005 ("net: dsa: fix panic when DSA master device unbinds on shutdown")
> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@...mens.com>
> Tested-by: Alexander Sverdlin <alexander.sverdlin@...mens.com>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
> ---

Andrew, Florian, FYI: this is marked as "Needs ACK" in patchwork.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ