Message-ID: <91e2d4ad-7544-784b-defe-3a76577462f1@gmail.com>
Date: Wed, 23 Feb 2022 09:55:29 -0800
From: Florian Fainelli <f.fainelli@...il.com>
To: Heyi Guo <guoheyi@...ux.alibaba.com>, linux-kernel@...r.kernel.org
Cc: Andrew Lunn <andrew@...n.ch>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Joel Stanley <joel@....id.au>,
Guangbin Huang <huangguangbin2@...wei.com>,
Hao Chen <chenhao288@...ilicon.com>,
Arnd Bergmann <arnd@...db.de>,
Dylan Hung <dylan_hung@...eedtech.com>, netdev@...r.kernel.org
Subject: Re: [PATCH 3/3] drivers/net/ftgmac100: fix DHCP potential failure
with systemd
On 2/23/2022 3:39 AM, Heyi Guo wrote:
> Hi Florian,
>
> On 2022/2/23 at 1:00 PM, Florian Fainelli wrote:
>>
>>
>> On 2/22/2022 7:14 PM, Heyi Guo wrote:
>>> DHCP failures were observed with systemd 247.6. The issue could be
>>> reproduced by rebooting Aspeed 2600 and then running ifconfig ethX
>>> down/up.
>>>
>>> It is caused by below procedures in the driver:
>>>
>>> 1. ftgmac100_open() enables net interface and call phy_start()
>>> 2. When PHY is link up, it calls netif_carrier_on() and then
>>> adjust_link callback
>>> 3. ftgmac100_adjust_link() will schedule the reset task
>>> 4. ftgmac100_reset_task() will then reset the MAC in another schedule
>>>
>>> After step 2, systemd will be notified to send DHCP discover packet,
>>> while the packet might be corrupted by MAC reset operation in step 4.
>>>
>>> Call ftgmac100_reset() directly instead of scheduling task to fix the
>>> issue.
>>>
>>> Signed-off-by: Heyi Guo <guoheyi@...ux.alibaba.com>
>>> ---
>>> Cc: Andrew Lunn <andrew@...n.ch>
>>> Cc: "David S. Miller" <davem@...emloft.net>
>>> Cc: Jakub Kicinski <kuba@...nel.org>
>>> Cc: Joel Stanley <joel@....id.au>
>>> Cc: Guangbin Huang <huangguangbin2@...wei.com>
>>> Cc: Hao Chen <chenhao288@...ilicon.com>
>>> Cc: Arnd Bergmann <arnd@...db.de>
>>> Cc: Dylan Hung <dylan_hung@...eedtech.com>
>>> Cc: netdev@...r.kernel.org
>>>
>>>
>>> ---
>>> drivers/net/ethernet/faraday/ftgmac100.c | 13 +++++++++++--
>>> 1 file changed, 11 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/faraday/ftgmac100.c
>>> b/drivers/net/ethernet/faraday/ftgmac100.c
>>> index c1deb6e5d26c5..d5356db7539a4 100644
>>> --- a/drivers/net/ethernet/faraday/ftgmac100.c
>>> +++ b/drivers/net/ethernet/faraday/ftgmac100.c
>>> @@ -1402,8 +1402,17 @@ static void ftgmac100_adjust_link(struct
>>> net_device *netdev)
>>> /* Disable all interrupts */
>>> iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
>>> - /* Reset the adapter asynchronously */
>>> - schedule_work(&priv->reset_task);
>>> + /* Release phy lock to allow ftgmac100_reset to aquire it,
>>> keeping lock
>>
>> typo: acquire
>>
> Thanks for the catch :)
>>> + * order consistent to prevent dead lock.
>>> + */
>>> + if (netdev->phydev)
>>> + mutex_unlock(&netdev->phydev->lock);
>>> +
>>> + ftgmac100_reset(priv);
>>> +
>>> + if (netdev->phydev)
>>> + mutex_lock(&netdev->phydev->lock);
>>
>> Do you really need to perform a full MAC reset whenever the link goes
>> up or down? Instead cannot you just extract the maccr configuration
>> which adjusts the speed and be done with it?
>
> This is the original behavior and not changed in this patch set, and I'm
> not familiar with the hardware design of ftgmac100, so I'd like to limit
> the changes to the code which really causes practical issues.

This unlocking and re-locking seems superfluous when you could introduce
a version of ftgmac100_reset() which does not acquire the PHY device
mutex, and have that version called from ftgmac100_adjust_link(). For
every other call site, you would acquire it. Something like this for
instance:

diff --git a/drivers/net/ethernet/faraday/ftgmac100.c
b/drivers/net/ethernet/faraday/ftgmac100.c
index 691605c15265..98179c3fd9ee 100644
--- a/drivers/net/ethernet/faraday/ftgmac100.c
+++ b/drivers/net/ethernet/faraday/ftgmac100.c
@@ -1038,7 +1038,7 @@ static void ftgmac100_adjust_link(struct
net_device *netdev)
iowrite32(0, priv->base + FTGMAC100_OFFSET_IER);
/* Reset the adapter asynchronously */
- schedule_work(&priv->reset_task);
+ ftgmac100_reset(priv, false);
}
static int ftgmac100_mii_probe(struct net_device *netdev)
@@ -1410,10 +1410,8 @@ static int ftgmac100_init_all(struct ftgmac100
*priv, bool ignore_alloc_err)
return err;
}
-static void ftgmac100_reset_task(struct work_struct *work)
+static void ftgmac100_reset(struct ftgmac100 *priv, bool lock_phy)
{
- struct ftgmac100 *priv = container_of(work, struct ftgmac100,
- reset_task);
struct net_device *netdev = priv->netdev;
int err;
@@ -1421,7 +1419,7 @@ static void ftgmac100_reset_task(struct
work_struct *work)
/* Lock the world */
rtnl_lock();
- if (netdev->phydev)
+ if (netdev->phydev && lock_phy)
mutex_lock(&netdev->phydev->lock);
if (priv->mii_bus)
mutex_lock(&priv->mii_bus->mdio_lock);
@@ -1454,11 +1452,19 @@ static void ftgmac100_reset_task(struct
work_struct *work)
bail:
if (priv->mii_bus)
mutex_unlock(&priv->mii_bus->mdio_lock);
- if (netdev->phydev)
+ if (netdev->phydev && lock_phy)
mutex_unlock(&netdev->phydev->lock);
rtnl_unlock();
}
+static void ftgmac100_reset_task(struct work_struct *work)
+{
+ struct ftgmac100 *priv = container_of(work, struct ftgmac100,
+ reset_task);
+
+ ftgmac100_reset(priv, true);
+}
+
static int ftgmac100_open(struct net_device *netdev)
{
struct ftgmac100 *priv = netdev_priv(netdev);
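
To make the idea above easier to try outside the kernel, here is a minimal user-space sketch of the same pattern: one core reset function with a lock_phy flag, a deferred-work style wrapper that passes true, and a link-change path that passes false because its caller already holds the lock. Everything here is a hypothetical stand-in and not driver code: struct fake_priv, fake_reset(), fake_reset_task() and fake_adjust_link() are invented for illustration, a pthread mutex plays the role of phydev->lock, and the counter stands in for the actual MAC reset. The rtnl and MDIO bus locks are left out for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's private state. */
struct fake_priv {
	pthread_mutex_t phy_lock;	/* plays the role of phydev->lock */
	int reset_count;		/* placeholder for the MAC reset */
};

/*
 * Core reset. Takes phy_lock only when lock_phy is true; otherwise the
 * caller must already hold it.
 */
static void fake_reset(struct fake_priv *priv, bool lock_phy)
{
	if (lock_phy)
		pthread_mutex_lock(&priv->phy_lock);
	else
		/* Sanity check: a held non-recursive mutex reports EBUSY. */
		assert(pthread_mutex_trylock(&priv->phy_lock) == EBUSY);

	priv->reset_count++;

	if (lock_phy)
		pthread_mutex_unlock(&priv->phy_lock);
}

/* Deferred path: runs without the PHY lock held, so it must take it. */
static void fake_reset_task(struct fake_priv *priv)
{
	fake_reset(priv, true);
}

/* Link-change path: invoked with phy_lock already held by the caller. */
static void fake_adjust_link(struct fake_priv *priv)
{
	fake_reset(priv, false);
}
```

A caller would take phy_lock, call fake_adjust_link(), and release the lock afterwards, while fake_reset_task() is called with the lock not held. The flag keeps a single copy of the reset logic and avoids the unlock/re-lock window in the link-change path.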
--
Florian