netdev - Re: [PATCH v2] net: dsa: mv88e6xxx: propperly shutdown PPU re-enable timer on destroy

Message-ID: <8c882807-c8c1-4d44-ac13-19d12b2d976f@lunn.ch>
Date: Wed, 4 Dec 2024 03:31:19 +0100
From: Andrew Lunn <andrew@...n.ch>
To: David Oberhollenzer <david.oberhollenzer@...ma-star.at>
Cc: netdev@...r.kernel.org, Julian.FRIEDRICH@...quentis.com,
	f.fainelli@...il.com, olteanv@...il.com, davem@...emloft.net,
	edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com,
	linux-kernel@...r.kernel.org, upstream+netdev@...ma-star.at
Subject: Re: [PATCH v2] net: dsa: mv88e6xxx: propperly shutdown PPU re-enable
 timer on destroy

On Tue, Dec 03, 2024 at 03:43:40PM +0100, David Oberhollenzer wrote:
> The mv88e6xxx has an internal PPU that polls PHY state. If we want to
> access the internal PHYs, we need to disable it. Because enable/disable
> of the PPU is a slow operation, a 10ms timer is used to re-enable it,
> canceled with every access, so bulk operations effectively only disable
> it once and re-enable it some 10ms after the last access.
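The re-arm behaviour described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the driver's code: `struct ppu_model`, the `ppu_model_*()` helpers and the virtual millisecond clock are all hypothetical, and only the 10ms delay comes from the patch description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model driven by a virtual millisecond clock; not driver code. */
#define REENABLE_DELAY_MS 10

struct ppu_model {
	bool enabled;		/* PPU is currently polling the PHYs */
	bool timer_armed;	/* a re-enable timer is pending */
	long deadline_ms;	/* expiry time of the pending timer */
	int disable_ops;	/* number of slow disable operations */
	int enable_ops;		/* number of slow enable operations */
};

static void ppu_model_init(struct ppu_model *m)
{
	m->enabled = true;
	m->timer_armed = false;
	m->deadline_ms = 0;
	m->disable_ops = 0;
	m->enable_ops = 0;
}

/* Advance virtual time: fire the timer if its deadline has passed. */
static void ppu_model_tick(struct ppu_model *m, long now_ms)
{
	if (m->timer_armed && now_ms >= m->deadline_ms) {
		m->timer_armed = false;
		m->enabled = true;
		m->enable_ops++;
	}
}

/* One PHY access: cancel the timer, disable the PPU if needed, re-arm. */
static void ppu_model_phy_access(struct ppu_model *m, long now_ms)
{
	ppu_model_tick(m, now_ms);
	m->timer_armed = false;
	if (m->enabled) {
		m->enabled = false;
		m->disable_ops++;
	}
	m->deadline_ms = now_ms + REENABLE_DELAY_MS;
	m->timer_armed = true;
}
```

In this model, three accesses at 0, 3 and 6 ms cost one disable, and the single re-enable happens at 16 ms, i.e. 10ms after the last access.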
> 
> If a PHY is accessed and then the mv88e6xxx module is removed before
> the 10ms are up, the PPU re-enable ends up accessing a dangling pointer.
> 
> This especially affects probing during bootup. The MDIO bus and PHY
> registration may succeed, but registration with the DSA framework
> may fail later on (e.g. because the CPU port depends on another,
> very slow device that isn't done probing yet, returning -EPROBE_DEFER).
> In this case, probe() fails, but the MDIO subsystem may already have
> accessed the MDIO bus or PHYs, arming the timer.
> 
> This is fixed as follows:
>  - If probe fails after mv88e6xxx_phy_init(), make sure we also call
>    mv88e6xxx_phy_destroy() before returning
>  - In mv88e6xxx_phy_destroy(), grab the ppu_mutex to make sure the work
>    function either has already exited, or (should it run) cannot do
>    anything, fails to grab the mutex and returns.

On first reading this, I did not realise the code is using
mutex_trylock(), which made me think it could deadlock. Maybe change
this to "mutex_trylock() fails to get the mutex and returns".

But I'm not actually sure this is needed. There are plenty of other
examples of destroying a work which does not take a mutex.
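The reason a trylock in the work function cannot deadlock against a teardown path that holds the mutex can be sketched in userspace with pthreads. This is a rough analogy only: `struct fake_chip` and `fake_ppu_reenable_work()` are hypothetical stand-ins, and `pthread_mutex_trylock()` stands in for the kernel's `mutex_trylock()`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the chip state; not the driver's struct. */
struct fake_chip {
	pthread_mutex_t ppu_mutex;
	bool ppu_disabled;
	int reenable_count;
};

static void fake_chip_init(struct fake_chip *chip)
{
	pthread_mutex_init(&chip->ppu_mutex, NULL);
	chip->ppu_disabled = true;
	chip->reenable_count = 0;
}

/*
 * Models the deferred re-enable work. Returns true if it got the mutex,
 * false if the mutex was held and it backed off without touching state.
 */
static bool fake_ppu_reenable_work(struct fake_chip *chip)
{
	/* Never blocks: a held mutex makes trylock fail immediately. */
	if (pthread_mutex_trylock(&chip->ppu_mutex) != 0)
		return false;

	if (chip->ppu_disabled) {
		chip->ppu_disabled = false;
		chip->reenable_count++;
	}
	pthread_mutex_unlock(&chip->ppu_mutex);
	return true;
}
```

While another path holds `ppu_mutex`, the work function returns at once and leaves the state alone, so waiting for it to finish from under the mutex does not deadlock in this model.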

>  - In addition to destroying the timer, also destroy the work item, in
>    case the timer has already fired.
>  - Do all of this synchronously, to make sure timer & work item are
>    destroyed and none of the callbacks are running.

This is the important part, doing it synchronously. cancel_work_sync()
should be enough.
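Why the work item has to be cancelled as well as the timer can be shown with a single-threaded model. It is a sketch under assumptions: the `teardown_model` struct and `model_*()` helpers are hypothetical, and it assumes the timer callback only queues the work item, as the "in case the timer has already fired" bullet implies. It does not model the waiting that the real _sync calls do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of the timer -> work chain. */
struct teardown_model {
	bool timer_pending;
	bool work_pending;
	bool chip_freed;
	int use_after_free;	/* times the work ran on a freed chip */
};

/* Timer callback: assumed to do nothing except queue the work item. */
static void model_timer_fires(struct teardown_model *m)
{
	if (m->timer_pending) {
		m->timer_pending = false;
		m->work_pending = true;
	}
}

/* Work function: touches the chip, which is a bug once it is freed. */
static void model_work_runs(struct teardown_model *m)
{
	if (!m->work_pending)
		return;
	m->work_pending = false;
	if (m->chip_freed)
		m->use_after_free++;
}

/* Teardown: always drops the timer, optionally drops the work too. */
static void model_destroy(struct teardown_model *m, bool cancel_work)
{
	m->timer_pending = false;		/* stands in for del_timer_sync() */
	if (cancel_work)
		m->work_pending = false;	/* stands in for cancel_work_sync() */
	m->chip_freed = true;
}
```

If the timer fires before `model_destroy()` and only the timer is cancelled, the queued work still runs against the freed chip; cancelling the work as well leaves `use_after_free` at zero.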

>  static void mv88e6xxx_phy_ppu_state_destroy(struct mv88e6xxx_chip *chip)
>  {
> +	mutex_lock(&chip->ppu_mutex);
>  	del_timer_sync(&chip->ppu_timer);
> +	cancel_work_sync(&chip->ppu_work);
> +	mutex_unlock(&chip->ppu_mutex);
>  }

/**
 * del_timer_sync - Delete a pending timer and wait for a running callback
 * @timer:	The timer to be deleted
 *
 * See timer_delete_sync() for detailed explanation.
 *
 * Do not use in new code. Use timer_delete_sync() instead.


    Andrew

---
pw-bot: cr