lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <34366741-a472-4ebd-90cc-07c8447f06b9@redhat.com>
Date: Tue, 24 Sep 2024 10:52:29 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Vladimir Oltean <vladimir.oltean@....com>, Andrew Lunn <andrew@...n.ch>,
 Florian Fainelli <f.fainelli@...il.com>
Cc: netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
 Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
 Alexander Sverdlin <alexander.sverdlin@...mens.com>,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH net] net: dsa: improve shutdown sequence

On 9/24/24 10:43, Vladimir Oltean wrote:
> On Fri, Sep 13, 2024 at 11:35:49PM +0300, Vladimir Oltean wrote:
>> Alexander Sverdlin presents 2 problems during shutdown with the
>> lan9303 driver. One is specific to lan9303 and the other just happens
>> to reproduce there.
>>
>> The first problem is that lan9303 is unique among DSA drivers in that it
>> calls dev_get_drvdata() at "arbitrary runtime" (not probe, not shutdown,
>> not remove):
>>
>> phy_state_machine()
>> -> ...
>>     -> dsa_user_phy_read()
>>        -> ds->ops->phy_read()
>>           -> lan9303_phy_read()
>>              -> chip->ops->phy_read()
>>                 -> lan9303_mdio_phy_read()
>>                    -> dev_get_drvdata()
>>
>> But we never stop the phy_state_machine(), so it may continue to run
>> after dsa_switch_shutdown(). Our common pattern in all DSA drivers is
>> to set drvdata to NULL to suppress the remove() method that may come
>> afterwards. But in this case it will result in an NPD.
>>
>> The second problem is that the way in which we set
>> dp->conduit->dsa_ptr = NULL; is concurrent with receive packet
>> processing. dsa_switch_rcv() checks once whether dev->dsa_ptr is NULL,
>> but afterwards, rather than continuing to use that non-NULL value,
>> dev->dsa_ptr is dereferenced again and again without NULL checks:
>> dsa_conduit_find_user() and many other places. In between dereferences,
>> there is no locking to ensure that what was valid once continues to be
>> valid.
>>
>> Both problems have the common aspect that closing the conduit interface
>> solves them.
>>
>> In the first case, dev_close(conduit) triggers the NETDEV_GOING_DOWN
>> event in dsa_user_netdevice_event() which closes user ports as well.
>> dsa_port_disable_rt() calls phylink_stop(), which synchronously stops
>> the phylink state machine, and ds->ops->phy_read() will thus no longer
>> call into the driver after this point.
>>
>> In the second case, dev_close(conduit) should do this, as per
>> Documentation/networking/driver.rst:
>>
>> | Quiescence
>> | ----------
>> |
>> | After the ndo_stop routine has been called, the hardware must
>> | not receive or transmit any data.  All in flight packets must
>> | be aborted. If necessary, poll or wait for completion of
>> | any reset commands.
>>
>> So it should be sufficient to ensure that later, when we zeroize
>> conduit->dsa_ptr, there will be no concurrent dsa_switch_rcv() call
>> on this conduit.
>>
>> The addition of the netif_device_detach() function is to ensure that
>> ioctls, rtnetlinks and ethtool requests on the user ports no longer
>> propagate down to the driver - we're no longer prepared to handle them.
>>
>> The race condition actually did not exist when commit 0650bf52b31f
>> ("net: dsa: be compatible with masters which unregister on shutdown")
>> first introduced dsa_switch_shutdown(). It was created later, when we
>> stopped unregistering the user interfaces from a bad spot, and we just
>> replaced that sequence with a racy zeroization of conduit->dsa_ptr
>> (one which doesn't ensure that the interfaces aren't up).
>>
>> Reported-by: Alexander Sverdlin <alexander.sverdlin@...mens.com>
>> Closes: https://lore.kernel.org/netdev/2d2e3bba17203c14a5ffdabc174e3b6bbb9ad438.camel@siemens.com/
>> Closes: https://lore.kernel.org/netdev/c1bf4de54e829111e0e4a70e7bd1cf523c9550ff.camel@siemens.com/
>> Fixes: ee534378f005 ("net: dsa: fix panic when DSA master device unbinds on shutdown")
>> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@...mens.com>
>> Tested-by: Alexander Sverdlin <alexander.sverdlin@...mens.com>
>> Signed-off-by: Vladimir Oltean <vladimir.oltean@....com>
>> ---
> 
> Andrew, Florian, FYI: this is marked as "Needs ACK" in patchwork.

FTR, some additional slack was guaranteed for reviews, given we are in 
the conferences season ;) (and some/most people are traveling)

Cheers,

Paolo




Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ