lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <435b2a9d-c3c6-a162-331f-9f47f69be5ac@gmail.com>
Date:   Sun, 6 Mar 2022 13:56:18 +0100
From:   Heiner Kallweit <hkallweit1@...il.com>
To:     Erico Nunes <nunes.erico@...il.com>
Cc:     Jerome Brunet <jbrunet@...libre.com>,
        Martin Blumenstingl <martin.blumenstingl@...glemail.com>,
        Alexandre Torgue <alexandre.torgue@...s.st.com>,
        Giuseppe Cavallaro <peppe.cavallaro@...com>,
        Jose Abreu <joabreu@...opsys.com>,
        Kevin Hilman <khilman@...libre.com>,
        Neil Armstrong <narmstrong@...libre.com>,
        linux-amlogic@...ts.infradead.org, netdev@...r.kernel.org,
        "open list:ARM/Rockchip SoC..." <linux-rockchip@...ts.infradead.org>,
        linux-sunxi@...ts.linux.dev
Subject: Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up
 at boot

On 06.03.2022 10:40, Erico Nunes wrote:
> On Wed, Mar 2, 2022 at 5:35 PM Heiner Kallweit <hkallweit1@...il.com> wrote:
>> When using polling the time difference between aneg complete and
>> PHY state machine run is random in the interval 0 .. 1s.
>> Hence there's a certain chance that the difference is too small
>> to avoid the issue.
>>
>>> If I understand the proposed patch correctly, it is mostly about the phy
>>> IRQ. Since I reproduce without the IRQ, I suppose it is not the
>>> problem we where looking for (might still be a problem worth fixing -
>>> the phy is not "rock-solid" when it comes to aneg - I already tried
>>> stabilising it a few years ago)
>>
>> Below is a slightly improved version of the test patch. It doesn't sleep
>> in the (threaded) interrupt handler and lets the workqueue do it.
>>
>> Maybe Amlogic is aware of a potentially related silicon issue?
>>
>>>
>>> TBH, It bothers me that I reproduced w/o the IRQ. The idea makes
>>> sense :/
>>>
>>>>
>> [...]
>>>
>>
>>
>> diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c
>> index 7e7904fee..a3318ae01 100644
>> --- a/drivers/net/phy/meson-gxl.c
>> +++ b/drivers/net/phy/meson-gxl.c
>> @@ -209,12 +209,7 @@ static int meson_gxl_config_intr(struct phy_device *phydev)
>>                 if (ret)
>>                         return ret;
>>
>> -               val = INTSRC_ANEG_PR
>> -                       | INTSRC_PARALLEL_FAULT
>> -                       | INTSRC_ANEG_LP_ACK
>> -                       | INTSRC_LINK_DOWN
>> -                       | INTSRC_REMOTE_FAULT
>> -                       | INTSRC_ANEG_COMPLETE;
>> +               val = INTSRC_LINK_DOWN | INTSRC_ANEG_COMPLETE;
>>                 ret = phy_write(phydev, INTSRC_MASK, val);
>>         } else {
>>                 val = 0;
>> @@ -240,7 +235,10 @@ static irqreturn_t meson_gxl_handle_interrupt(struct phy_device *phydev)
>>         if (irq_status == 0)
>>                 return IRQ_NONE;
>>
>> -       phy_trigger_machine(phydev);
>> +       if (irq_status & INTSRC_ANEG_COMPLETE)
>> +               phy_queue_state_machine(phydev, msecs_to_jiffies(100));
>> +       else
>> +               phy_trigger_machine(phydev);
>>
>>         return IRQ_HANDLED;
>>  }
>> --
>> 2.35.1
> 
> I did a lot of testing with this patch, and it seems to improve things.
> To me it completely resolves the original issue which was more easily
> reproducible where I would see "Link is Up" but the interface did not
> really work.
> At least in over a thousand jobs, that never reproduced again with this patch.
> 
> I do see a different issue now, but it is even less frequent and
> harder to reproduce. In those over a thousand jobs, I have seen it
> only about 4 times.
> The difference is that now when the issue happens, the link is not
> even reported as Up. The output is a bit different than the original
> one, but it is consistently the same output in all instances where it
> reproduced. Looks like this (note that there is no longer Link is
> Down/Link is Up):
> 
> [    2.186151] meson8b-dwmac c9410000.ethernet eth0: PHY
> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
> [    2.191582] meson8b-dwmac c9410000.ethernet eth0: Register
> MEM_TYPE_PAGE_POOL RxQ-0
> [    2.208713] meson8b-dwmac c9410000.ethernet eth0: No Safety
> Features support found
> [    2.210673] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
> [    2.218083] meson8b-dwmac c9410000.ethernet eth0: configuring for
> phy/rmii link mode
> [   22.227444] Waiting up to 100 more seconds for network.
> [   42.231440] Waiting up to 80 more seconds for network.
> [   62.235437] Waiting up to 60 more seconds for network.
> [   82.239437] Waiting up to 40 more seconds for network.
> [  102.243439] Waiting up to 20 more seconds for network.
> [  122.243446] Sending DHCP requests ...
> [  130.113944] random: fast init done
> [  134.219441] ... timed out!
> [  194.559562] IP-Config: Retrying forever (NFS root)...
> [  194.624630] meson8b-dwmac c9410000.ethernet eth0: PHY
> [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48)
> [  194.630739] meson8b-dwmac c9410000.ethernet eth0: Register
> MEM_TYPE_PAGE_POOL RxQ-0
> [  194.649138] meson8b-dwmac c9410000.ethernet eth0: No Safety
> Features support found
> [  194.651113] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
> [  194.657931] meson8b-dwmac c9410000.ethernet eth0: configuring for
> phy/rmii link mode
> [  196.313602] meson8b-dwmac c9410000.ethernet eth0: Link is Up -
> 100Mbps/Full - flow control off
> [  196.339463] Sending DHCP requests ., OK
> ...
> 
> 
> I don't remember seeing an output like this one in the previous tests.
> Is there any further improvement we can do to the patch based on this?
> 
> Thanks
> 
> Erico

Thanks a lot for your testing efforts, much appreciated.
You could try the following (quick and dirty) test patch that fully mimics
the vendor driver as found here:
https://github.com/khadas/linux/blob/buildroot-aml-4.9/drivers/amlogic/ethernet/phy/amlogic.c

First apply
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=a502a8f04097e038c3daa16c5202a9538116d563
This patch is in the net tree currently and should show up in linux-next
beginning of the week.

On top please apply the following (it includes the test patch your working with).


diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c
index c49062ad7..92f94c8be 100644
--- a/drivers/net/phy/meson-gxl.c
+++ b/drivers/net/phy/meson-gxl.c
@@ -68,32 +68,19 @@ static int meson_gxl_open_banks(struct phy_device *phydev)
 	return phy_write(phydev, TSTCNTL, TSTCNTL_TEST_MODE);
 }
 
-static void meson_gxl_close_banks(struct phy_device *phydev)
-{
-	phy_write(phydev, TSTCNTL, 0);
-}
-
 static int meson_gxl_read_reg(struct phy_device *phydev,
 			      unsigned int bank, unsigned int reg)
 {
 	int ret;
 
-	ret = meson_gxl_open_banks(phydev);
-	if (ret)
-		goto out;
-
 	ret = phy_write(phydev, TSTCNTL, TSTCNTL_READ |
 			FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) |
 			TSTCNTL_TEST_MODE |
 			FIELD_PREP(TSTCNTL_READ_ADDRESS, reg));
 	if (ret)
-		goto out;
+		return ret;
 
-	ret = phy_read(phydev, TSTREAD1);
-out:
-	/* Close the bank access on our way out */
-	meson_gxl_close_banks(phydev);
-	return ret;
+	return phy_read(phydev, TSTREAD1);
 }
 
 static int meson_gxl_write_reg(struct phy_device *phydev,
@@ -102,29 +89,28 @@ static int meson_gxl_write_reg(struct phy_device *phydev,
 {
 	int ret;
 
-	ret = meson_gxl_open_banks(phydev);
-	if (ret)
-		goto out;
-
 	ret = phy_write(phydev, TSTWRITE, value);
 	if (ret)
-		goto out;
+		return ret;
 
-	ret = phy_write(phydev, TSTCNTL, TSTCNTL_WRITE |
-			FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) |
-			TSTCNTL_TEST_MODE |
-			FIELD_PREP(TSTCNTL_WRITE_ADDRESS, reg));
+	return phy_write(phydev, TSTCNTL, TSTCNTL_WRITE |
+			 FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) |
+			 TSTCNTL_TEST_MODE |
+			 FIELD_PREP(TSTCNTL_WRITE_ADDRESS, reg));
 
-out:
-	/* Close the bank access on our way out */
-	meson_gxl_close_banks(phydev);
-	return ret;
 }
 
 static int meson_gxl_config_init(struct phy_device *phydev)
 {
 	int ret;
 
+	phy_set_bits(phydev, 0x1b, BIT(12));
+	phy_write(phydev, 0x11, 0x0080);
+
+	meson_gxl_open_banks(phydev);
+
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x17, 0x8e0d);
+
 	/* Enable fractional PLL */
 	ret = meson_gxl_write_reg(phydev, BANK_BIST, FR_PLL_CONTROL, 0x5);
 	if (ret)
@@ -140,6 +126,10 @@ static int meson_gxl_config_init(struct phy_device *phydev)
 	if (ret)
 		return ret;
 
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x18, 0x000c);
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x17, 0x1a0c);
+	ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x1a, 0x6400);
+
 	return 0;
 }
 
@@ -186,7 +176,7 @@ static int meson_gxl_read_status(struct phy_device *phydev)
 		if (!(wol & LPI_STATUS_RSV12) ||
 		    ((exp & EXPANSION_NWAY) && !(lpa & LPA_LPACK))) {
 			/* Looks like aneg failed after all */
-			phydev_dbg(phydev, "LPA corruption - aneg restart\n");
+			phydev_warn(phydev, "LPA corruption - aneg restart\n");
 			return genphy_restart_aneg(phydev);
 		}
 	}
@@ -243,11 +233,23 @@ static irqreturn_t meson_gxl_handle_interrupt(struct phy_device *phydev)
 	    irq_status == INTSRC_ENERGY_DETECT)
 		return IRQ_HANDLED;
 
-	phy_trigger_machine(phydev);
+	/* Give PHY some time before MAC starts sending data. This works
+	 * around an issue where network doesn't come up properly.
+	 */
+	if (irq_status & INTSRC_ANEG_COMPLETE)
+		phy_queue_state_machine(phydev, msecs_to_jiffies(100));
+	else
+		phy_trigger_machine(phydev);
 
 	return IRQ_HANDLED;
 }
 
+static void meson_gxl_link_change_notify(struct phy_device *phydev)
+{
+	if (phydev->state == PHY_RUNNING && phydev->speed == SPEED_100)
+		meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x14, 0xa900);
+}
+
 static struct phy_driver meson_gxl_phy[] = {
 	{
 		PHY_ID_MATCH_EXACT(0x01814400),
@@ -259,6 +261,7 @@ static struct phy_driver meson_gxl_phy[] = {
 		.read_status	= meson_gxl_read_status,
 		.config_intr	= meson_gxl_config_intr,
 		.handle_interrupt = meson_gxl_handle_interrupt,
+		.link_change_notify = meson_gxl_link_change_notify,
 		.suspend        = genphy_suspend,
 		.resume         = genphy_resume,
 	}, {
-- 
2.35.1


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ