[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <435b2a9d-c3c6-a162-331f-9f47f69be5ac@gmail.com>
Date: Sun, 6 Mar 2022 13:56:18 +0100
From: Heiner Kallweit <hkallweit1@...il.com>
To: Erico Nunes <nunes.erico@...il.com>
Cc: Jerome Brunet <jbrunet@...libre.com>,
Martin Blumenstingl <martin.blumenstingl@...glemail.com>,
Alexandre Torgue <alexandre.torgue@...s.st.com>,
Giuseppe Cavallaro <peppe.cavallaro@...com>,
Jose Abreu <joabreu@...opsys.com>,
Kevin Hilman <khilman@...libre.com>,
Neil Armstrong <narmstrong@...libre.com>,
linux-amlogic@...ts.infradead.org, netdev@...r.kernel.org,
"open list:ARM/Rockchip SoC..." <linux-rockchip@...ts.infradead.org>,
linux-sunxi@...ts.linux.dev
Subject: Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up
at boot
On 06.03.2022 10:40, Erico Nunes wrote:
> On Wed, Mar 2, 2022 at 5:35 PM Heiner Kallweit <hkallweit1@...il.com> wrote:
>> When using polling the time difference between aneg complete and
>> PHY state machine run is random in the interval 0 .. 1s.
>> Hence there's a certain chance that the difference is too small
>> to avoid the issue.
>>
>>> If I understand the proposed patch correctly, it is mostly about the phy
>>> IRQ. Since I reproduce without the IRQ, I suppose it is not the
>>> problem we where looking for (might still be a problem worth fixing -
>>> the phy is not "rock-solid" when it comes to aneg - I already tried
>>> stabilising it a few years ago)
>>
>> Below is a slightly improved version of the test patch. It doesn't sleep
>> in the (threaded) interrupt handler and lets the workqueue do it.
>>
>> Maybe Amlogic is aware of a potentially related silicon issue?
>>
>>>
>>> TBH, It bothers me that I reproduced w/o the IRQ. The idea makes
>>> sense :/
>>>
>>>>
>> [...]
>>>
>>
>>
>> diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c
>> index 7e7904fee..a3318ae01 100644
>> --- a/drivers/net/phy/meson-gxl.c
>> +++ b/drivers/net/phy/meson-gxl.c
>> @@ -209,12 +209,7 @@ static int meson_gxl_config_intr(struct phy_device *phydev)
>> if (ret)
>> return ret;
>>
>> - val = INTSRC_ANEG_PR
>> - | INTSRC_PARALLEL_FAULT
>> - | INTSRC_ANEG_LP_ACK
>> - | INTSRC_LINK_DOWN
>> - | INTSRC_REMOTE_FAULT
>> - | INTSRC_ANEG_COMPLETE;
>> + val = INTSRC_LINK_DOWN | INTSRC_ANEG_COMPLETE;
>> ret = phy_write(phydev, INTSRC_MASK, val);
>> } else {
>> val = 0;
>> @@ -240,7 +235,10 @@ static irqreturn_t meson_gxl_handle_interrupt(struct phy_device *phydev)
>> if (irq_status == 0)
>> return IRQ_NONE;
>>
>> - phy_trigger_machine(phydev);
>> + if (irq_status & INTSRC_ANEG_COMPLETE)
>> + phy_queue_state_machine(phydev, msecs_to_jiffies(100));
>> + else
>> + phy_trigger_machine(phydev);
>>
>> return IRQ_HANDLED;
>> }
>> --
>> 2.35.1
>
> I did a lot of testing with this patch, and it seems to improve things.
> To me it completely resolves the original issue which was more easily
> reproducible where I would see "Link is Up" but the interface did not
> really work.
> At least in over a thousand jobs, that never reproduced again with this patch.
>
> I do see a different issue now, but it is even less frequent and
> harder to reproduce. In those over a thousand jobs, I have seen it
> only about 4 times.
> The difference is that now when the issue happens, the link is not
> even reported as Up. The output is a bit different than the original
> one, but it is consistently the same output in all instances where it
> reproduced. Looks like this (note that there is no longer Link is
> Down/Link is Up):
>
> [ 2.186151] meson8b-dwmac c9410000.ethernet eth0: PHY
> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
> [ 2.191582] meson8b-dwmac c9410000.ethernet eth0: Register
> MEM_TYPE_PAGE_POOL RxQ-0
> [ 2.208713] meson8b-dwmac c9410000.ethernet eth0: No Safety
> Features support found
> [ 2.210673] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
> [ 2.218083] meson8b-dwmac c9410000.ethernet eth0: configuring for
> phy/rmii link mode
> [ 22.227444] Waiting up to 100 more seconds for network.
> [ 42.231440] Waiting up to 80 more seconds for network.
> [ 62.235437] Waiting up to 60 more seconds for network.
> [ 82.239437] Waiting up to 40 more seconds for network.
> [ 102.243439] Waiting up to 20 more seconds for network.
> [ 122.243446] Sending DHCP requests ...
> [ 130.113944] random: fast init done
> [ 134.219441] ... timed out!
> [ 194.559562] IP-Config: Retrying forever (NFS root)...
> [ 194.624630] meson8b-dwmac c9410000.ethernet eth0: PHY
> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
> [ 194.630739] meson8b-dwmac c9410000.ethernet eth0: Register
> MEM_TYPE_PAGE_POOL RxQ-0
> [ 194.649138] meson8b-dwmac c9410000.ethernet eth0: No Safety
> Features support found
> [ 194.651113] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
> [ 194.657931] meson8b-dwmac c9410000.ethernet eth0: configuring for
> phy/rmii link mode
> [ 196.313602] meson8b-dwmac c9410000.ethernet eth0: Link is Up -
> 100Mbps/Full - flow control off
> [ 196.339463] Sending DHCP requests ., OK
> ...
>
>
> I don't remember seeing an output like this one in the previous tests.
> Is there any further improvement we can do to the patch based on this?
>
> Thanks
>
> Erico
Thanks a lot for your testing efforts, much appreciated.
You could try the following (quick and dirty) test patch that fully mimics
the vendor driver as found here:
https://github.com/khadas/linux/blob/buildroot-aml-4.9/drivers/amlogic/ethernet/phy/amlogic.c
First apply
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=a502a8f04097e038c3daa16c5202a9538116d563
This patch is in the net tree currently and should show up in linux-next
beginning of the week.
On top please apply the following (it includes the test patch your working with).
diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c
index c49062ad7..92f94c8be 100644
--- a/drivers/net/phy/meson-gxl.c
+++ b/drivers/net/phy/meson-gxl.c
@@ -68,32 +68,19 @@ static int meson_gxl_open_banks(struct phy_device *phydev)
return phy_write(phydev, TSTCNTL, TSTCNTL_TEST_MODE);
}
-static void meson_gxl_close_banks(struct phy_device *phydev)
-{
- phy_write(phydev, TSTCNTL, 0);
-}
-
static int meson_gxl_read_reg(struct phy_device *phydev,
unsigned int bank, unsigned int reg)
{
int ret;
- ret = meson_gxl_open_banks(phydev);
- if (ret)
- goto out;
-
ret = phy_write(phydev, TSTCNTL, TSTCNTL_READ |
FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) |
TSTCNTL_TEST_MODE |
FIELD_PREP(TSTCNTL_READ_ADDRESS, reg));
if (ret)
- goto out;
+ return ret;
- ret = phy_read(phydev, TSTREAD1);
-out:
- /* Close the bank access on our way out */
- meson_gxl_close_banks(phydev);
- return ret;
+ return phy_read(phydev, TSTREAD1);
}
static int meson_gxl_write_reg(struct phy_device *phydev,
@@ -102,29 +89,28 @@ static int meson_gxl_write_reg(struct phy_device *phydev,
{
int ret;
- ret = meson_gxl_open_banks(phydev);
- if (ret)
- goto out;
-
ret = phy_write(phydev, TSTWRITE, value);
if (ret)
- goto out;
+ return ret;
- ret = phy_write(phydev, TSTCNTL, TSTCNTL_WRITE |
- FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) |
- TSTCNTL_TEST_MODE |
- FIELD_PREP(TSTCNTL_WRITE_ADDRESS, reg));
+ return phy_write(phydev, TSTCNTL, TSTCNTL_WRITE |
+ FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) |
+ TSTCNTL_TEST_MODE |
+ FIELD_PREP(TSTCNTL_WRITE_ADDRESS, reg));
-out:
- /* Close the bank access on our way out */
- meson_gxl_close_banks(phydev);
- return ret;
}
static int meson_gxl_config_init(struct phy_device *phydev)
{
int ret;
+ phy_set_bits(phydev, 0x1b, BIT(12));
+ phy_write(phydev, 0x11, 0x0080);
+
+ meson_gxl_open_banks(phydev);
+
+ ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x17, 0x8e0d);
+
/* Enable fractional PLL */
ret = meson_gxl_write_reg(phydev, BANK_BIST, FR_PLL_CONTROL, 0x5);
if (ret)
@@ -140,6 +126,10 @@ static int meson_gxl_config_init(struct phy_device *phydev)
if (ret)
return ret;
+ ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x18, 0x000c);
+ ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x17, 0x1a0c);
+ ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x1a, 0x6400);
+
return 0;
}
@@ -186,7 +176,7 @@ static int meson_gxl_read_status(struct phy_device *phydev)
if (!(wol & LPI_STATUS_RSV12) ||
((exp & EXPANSION_NWAY) && !(lpa & LPA_LPACK))) {
/* Looks like aneg failed after all */
- phydev_dbg(phydev, "LPA corruption - aneg restart\n");
+ phydev_warn(phydev, "LPA corruption - aneg restart\n");
return genphy_restart_aneg(phydev);
}
}
@@ -243,11 +233,23 @@ static irqreturn_t meson_gxl_handle_interrupt(struct phy_device *phydev)
irq_status == INTSRC_ENERGY_DETECT)
return IRQ_HANDLED;
- phy_trigger_machine(phydev);
+ /* Give PHY some time before MAC starts sending data. This works
+ * around an issue where network doesn't come up properly.
+ */
+ if (irq_status & INTSRC_ANEG_COMPLETE)
+ phy_queue_state_machine(phydev, msecs_to_jiffies(100));
+ else
+ phy_trigger_machine(phydev);
return IRQ_HANDLED;
}
+static void meson_gxl_link_change_notify(struct phy_device *phydev)
+{
+ if (phydev->state == PHY_RUNNING && phydev->speed == SPEED_100)
+ meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x14, 0xa900);
+}
+
static struct phy_driver meson_gxl_phy[] = {
{
PHY_ID_MATCH_EXACT(0x01814400),
@@ -259,6 +261,7 @@ static struct phy_driver meson_gxl_phy[] = {
.read_status = meson_gxl_read_status,
.config_intr = meson_gxl_config_intr,
.handle_interrupt = meson_gxl_handle_interrupt,
+ .link_change_notify = meson_gxl_link_change_notify,
.suspend = genphy_suspend,
.resume = genphy_resume,
}, {
--
2.35.1
Powered by blists - more mailing lists