Message-ID: <fdeb3a11-416a-4043-9eb4-fff225e448d9@lunn.ch>
Date: Mon, 15 Sep 2025 15:26:55 +0200
From: Andrew Lunn <andrew@...n.ch>
To: Zhang Jian <zhangjian.3032@...edance.com>
Cc: Jacky Chou <jacky_chou@...eedtech.com>, netdev@...r.kernel.org,
	davem@...emloft.net, andrew+netdev@...n.ch,
	guoheyi@...ux.alibaba.com, Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Simon Horman <horms@...nel.org>,
	Heiner Kallweit <hkallweit1@...il.com>,
	Uwe Kleine-König <u.kleine-koenig@...libre.com>,
	Bjorn Helgaas <bhelgaas@...gle.com>, linux-kernel@...r.kernel.org
Subject: Re: [External] Re: [PATCH 1/1] Revert "drivers/net/ftgmac100: fix
 DHCP potential failure with systemd"

> > > This reverts commit 1baf2e50e48f10f0ea07d53e13381fd0da1546d2.

    DHCP failures were observed with systemd 247.6. The issue could be
    reproduced by rebooting an Aspeed 2600 and then running ifconfig ethX
    down/up.

    It is caused by the following sequence in the driver:

    1. ftgmac100_open() enables the net interface and calls phy_start()
    2. When the PHY link comes up, netif_carrier_on() is called, followed
       by the adjust_link callback
    3. ftgmac100_adjust_link() schedules the reset task
    4. ftgmac100_reset_task() then resets the MAC from a separately
       scheduled work item

    After step 2, systemd is notified and sends a DHCP discover packet,
    but that packet may be corrupted by the MAC reset in step 4.

We might be able to solve this issue in a different way.

> > > * the PHY state_queue is triggered and calls ftgmac100_adjust_link
> > > -     /* Release phy lock to allow ftgmac100_reset to acquire it, keeping lock
> > > -      * order consistent to prevent dead lock.
> > > -      */
> > > -     if (netdev->phydev)
> > > -             mutex_unlock(&netdev->phydev->lock);
> > > -
> > > -     ftgmac100_reset(priv);
> > > -
> > > -     if (netdev->phydev)
> > > -             mutex_lock(&netdev->phydev->lock);
> > > -
> > > +     /* Reset the adapter asynchronously */
> > > +     schedule_work(&priv->reset_task);

Before scheduling the work, call netif_carrier_off().  At the end of
ftgmac100_reset(), turn the carrier back on again.

That carrier off/on will probably trigger systemd to restart the DHCP
client. Not great, but better than nothing.

I think the real fix, however, is to minimise ftgmac100_reset() to just
what is needed, and to check whether rtnl is really required. The
problem is the race created by unlocking phydev->lock and then relocking
it after taking rtnl.

	Andrew
