[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMeyCbiYwB=SK3vvqdTEWhbnHwee8U6rfxzNs9B8-hyr45GhOw@mail.gmail.com>
Date: Wed, 22 Dec 2021 12:05:38 +0100
From: Kegl Rohit <keglrohit@...il.com>
To: Joakim Zhang <qiangqing.zhang@....com>
Cc: netdev <netdev@...r.kernel.org>, Andrew Lunn <andrew@...n.ch>
Subject: Re: net: fec: memory corruption caused by err_enet_mii_probe error path
On Wed, Dec 22, 2021 at 9:01 AM Joakim Zhang <qiangqing.zhang@....com> wrote:
> > This error path frees the DMA buffers, BUT as far I could see it does not stop
> > the DMA engines.
> > => open() fails => frees buffers => DMA still active => MAC receives network
> > packet => DMA starts => random memory corruption (use after
> > free) => random kernel panics
>
> A question here, why receive path still active? MAC has not connected to PHY when this failure happened, should not see network activities.
It is a imx.28 platform using the fec for dual ethernet eth0 & eth1.
One of our devices (out of 10) eth1 did not detect a phy for eth1 on
ifup ( fec_open() ) => mdio error path => random system crashes
But the phy is there and the link is good, only MDIO access failed.
phys have autoneg activated after reset. So the RX path is active,
even if the fec driver says that there is no phy.
Without attached ethernet cable the phy was also not detected, BUT the
system did not crash. => Because no packets will arrive without
attached cable.
So the main issue on our side is the not detected phy. And the other
issue is the use after free in the error path.
I think the main issue has something to do with phy reset handling.
On a cold boot the eth1 phy is detected successfully. A warm restart (
reboot -f ) will always lead to a not detected eth1 phy. The eth0 phy
is always detected.
>From past experience the dual ethernet implementation in the driver
was not the most stable one. Maybe because of a smaller user base.
In our setup both phy reset lines are connected to gpios with the
correct entry in the mac0 & mac1 DT entry.
Revert of https://github.com/torvalds/linux/commit/7705b5ed8adccd921423019e870df672fa423279#diff-655f306656e7bccbec8fe6ebff2adf466bb8133f5bcb551112d3fe37e50b1a15
seems to get the phys in a correct state before the fec driver is
probed and both phys are detected.
> > So maybe fec_stop() as counterpart to fec_restart() is missing before freeing
> > the buffers?
> > err_enet_mii_probe:
> > fec_stop(ndev);
> > fec_enet_free_buffers(ndev);
> >
> > Issue happend with 5.10.83 and should also also happen with current master.
>
> It's fine for me, please see if anyone else has some comments. If not, please cook a formal patch, thanks.
So fec_stop is the right guess to stop the rx/tx dma rings.
Other paths use netif_tx_lock_bh before calling fec_stop().
I don't think this is necessary because netif_tx_start_all_queues()
was not called before in this error path case?
https://github.com/torvalds/linux/blob/master/drivers/net/ethernet/freescale/fec_main.c#L3207
index 1b1f7f2a6..8f208b4a9 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3211,8 +3211,9 @@ fec_enet_open(struct net_device *ndev)
return 0;
err_enet_mii_probe:
+ fec_stop(ndev);
fec_enet_free_buffers(ndev);
err_enet_alloc:
fec_enet_clk_enable(ndev, false);
clk_enable:
Powered by blists - more mailing lists