lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Sun, 6 Mar 2022 13:56:18 +0100 From: Heiner Kallweit <hkallweit1@...il.com> To: Erico Nunes <nunes.erico@...il.com> Cc: Jerome Brunet <jbrunet@...libre.com>, Martin Blumenstingl <martin.blumenstingl@...glemail.com>, Alexandre Torgue <alexandre.torgue@...s.st.com>, Giuseppe Cavallaro <peppe.cavallaro@...com>, Jose Abreu <joabreu@...opsys.com>, Kevin Hilman <khilman@...libre.com>, Neil Armstrong <narmstrong@...libre.com>, linux-amlogic@...ts.infradead.org, netdev@...r.kernel.org, "open list:ARM/Rockchip SoC..." <linux-rockchip@...ts.infradead.org>, linux-sunxi@...ts.linux.dev Subject: Re: net: stmmac: dwmac-meson8b: interface sometimes does not come up at boot On 06.03.2022 10:40, Erico Nunes wrote: > On Wed, Mar 2, 2022 at 5:35 PM Heiner Kallweit <hkallweit1@...il.com> wrote: >> When using polling the time difference between aneg complete and >> PHY state machine run is random in the interval 0 .. 1s. >> Hence there's a certain chance that the difference is too small >> to avoid the issue. >> >>> If I understand the proposed patch correctly, it is mostly about the phy >>> IRQ. Since I reproduce without the IRQ, I suppose it is not the >>> problem we where looking for (might still be a problem worth fixing - >>> the phy is not "rock-solid" when it comes to aneg - I already tried >>> stabilising it a few years ago) >> >> Below is a slightly improved version of the test patch. It doesn't sleep >> in the (threaded) interrupt handler and lets the workqueue do it. >> >> Maybe Amlogic is aware of a potentially related silicon issue? >> >>> >>> TBH, It bothers me that I reproduced w/o the IRQ. The idea makes >>> sense :/ >>> >>>> >> [...] >>> >> >> >> diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c >> index 7e7904fee..a3318ae01 100644 >> --- a/drivers/net/phy/meson-gxl.c >> +++ b/drivers/net/phy/meson-gxl.c >> @@ -209,12 +209,7 @@ static int meson_gxl_config_intr(struct phy_device *phydev) >> if (ret) >> return ret; >> >> - val = INTSRC_ANEG_PR >> - | INTSRC_PARALLEL_FAULT >> - | INTSRC_ANEG_LP_ACK >> - | INTSRC_LINK_DOWN >> - | INTSRC_REMOTE_FAULT >> - | INTSRC_ANEG_COMPLETE; >> + val = INTSRC_LINK_DOWN | INTSRC_ANEG_COMPLETE; >> ret = phy_write(phydev, INTSRC_MASK, val); >> } else { >> val = 0; >> @@ -240,7 +235,10 @@ static irqreturn_t meson_gxl_handle_interrupt(struct phy_device *phydev) >> if (irq_status == 0) >> return IRQ_NONE; >> >> - phy_trigger_machine(phydev); >> + if (irq_status & INTSRC_ANEG_COMPLETE) >> + phy_queue_state_machine(phydev, msecs_to_jiffies(100)); >> + else >> + phy_trigger_machine(phydev); >> >> return IRQ_HANDLED; >> } >> -- >> 2.35.1 > > I did a lot of testing with this patch, and it seems to improve things. > To me it completely resolves the original issue which was more easily > reproducible where I would see "Link is Up" but the interface did not > really work. > At least in over a thousand jobs, that never reproduced again with this patch. > > I do see a different issue now, but it is even less frequent and > harder to reproduce. In those over a thousand jobs, I have seen it > only about 4 times. > The difference is that now when the issue happens, the link is not > even reported as Up. The output is a bit different than the original > one, but it is consistently the same output in all instances where it > reproduced. Looks like this (note that there is no longer Link is > Down/Link is Up): > > [ 2.186151] meson8b-dwmac c9410000.ethernet eth0: PHY > [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48) > [ 2.191582] meson8b-dwmac c9410000.ethernet eth0: Register > MEM_TYPE_PAGE_POOL RxQ-0 > [ 2.208713] meson8b-dwmac c9410000.ethernet eth0: No Safety > Features support found > [ 2.210673] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW > [ 2.218083] meson8b-dwmac c9410000.ethernet eth0: configuring for > phy/rmii link mode > [ 22.227444] Waiting up to 100 more seconds for network. > [ 42.231440] Waiting up to 80 more seconds for network. > [ 62.235437] Waiting up to 60 more seconds for network. > [ 82.239437] Waiting up to 40 more seconds for network. > [ 102.243439] Waiting up to 20 more seconds for network. > [ 122.243446] Sending DHCP requests ... > [ 130.113944] random: fast init done > [ 134.219441] ... timed out! > [ 194.559562] IP-Config: Retrying forever (NFS root)... > [ 194.624630] meson8b-dwmac c9410000.ethernet eth0: PHY > [0.e40908ff:08] driver [Meson GXL Internal PHY] (irq=48) > [ 194.630739] meson8b-dwmac c9410000.ethernet eth0: Register > MEM_TYPE_PAGE_POOL RxQ-0 > [ 194.649138] meson8b-dwmac c9410000.ethernet eth0: No Safety > Features support found > [ 194.651113] meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW > [ 194.657931] meson8b-dwmac c9410000.ethernet eth0: configuring for > phy/rmii link mode > [ 196.313602] meson8b-dwmac c9410000.ethernet eth0: Link is Up - > 100Mbps/Full - flow control off > [ 196.339463] Sending DHCP requests ., OK > ... > > > I don't remember seeing an output like this one in the previous tests. > Is there any further improvement we can do to the patch based on this? > > Thanks > > Erico Thanks a lot for your testing efforts, much appreciated. You could try the following (quick and dirty) test patch that fully mimics the vendor driver as found here: https://github.com/khadas/linux/blob/buildroot-aml-4.9/drivers/amlogic/ethernet/phy/amlogic.c First apply https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=a502a8f04097e038c3daa16c5202a9538116d563 This patch is in the net tree currently and should show up in linux-next beginning of the week. On top please apply the following (it includes the test patch your working with). diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c index c49062ad7..92f94c8be 100644 --- a/drivers/net/phy/meson-gxl.c +++ b/drivers/net/phy/meson-gxl.c @@ -68,32 +68,19 @@ static int meson_gxl_open_banks(struct phy_device *phydev) return phy_write(phydev, TSTCNTL, TSTCNTL_TEST_MODE); } -static void meson_gxl_close_banks(struct phy_device *phydev) -{ - phy_write(phydev, TSTCNTL, 0); -} - static int meson_gxl_read_reg(struct phy_device *phydev, unsigned int bank, unsigned int reg) { int ret; - ret = meson_gxl_open_banks(phydev); - if (ret) - goto out; - ret = phy_write(phydev, TSTCNTL, TSTCNTL_READ | FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) | TSTCNTL_TEST_MODE | FIELD_PREP(TSTCNTL_READ_ADDRESS, reg)); if (ret) - goto out; + return ret; - ret = phy_read(phydev, TSTREAD1); -out: - /* Close the bank access on our way out */ - meson_gxl_close_banks(phydev); - return ret; + return phy_read(phydev, TSTREAD1); } static int meson_gxl_write_reg(struct phy_device *phydev, @@ -102,29 +89,28 @@ static int meson_gxl_write_reg(struct phy_device *phydev, { int ret; - ret = meson_gxl_open_banks(phydev); - if (ret) - goto out; - ret = phy_write(phydev, TSTWRITE, value); if (ret) - goto out; + return ret; - ret = phy_write(phydev, TSTCNTL, TSTCNTL_WRITE | - FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) | - TSTCNTL_TEST_MODE | - FIELD_PREP(TSTCNTL_WRITE_ADDRESS, reg)); + return phy_write(phydev, TSTCNTL, TSTCNTL_WRITE | + FIELD_PREP(TSTCNTL_REG_BANK_SEL, bank) | + TSTCNTL_TEST_MODE | + FIELD_PREP(TSTCNTL_WRITE_ADDRESS, reg)); -out: - /* Close the bank access on our way out */ - meson_gxl_close_banks(phydev); - return ret; } static int meson_gxl_config_init(struct phy_device *phydev) { int ret; + phy_set_bits(phydev, 0x1b, BIT(12)); + phy_write(phydev, 0x11, 0x0080); + + meson_gxl_open_banks(phydev); + + ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x17, 0x8e0d); + /* Enable fractional PLL */ ret = meson_gxl_write_reg(phydev, BANK_BIST, FR_PLL_CONTROL, 0x5); if (ret) @@ -140,6 +126,10 @@ static int meson_gxl_config_init(struct phy_device *phydev) if (ret) return ret; + ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x18, 0x000c); + ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x17, 0x1a0c); + ret = meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x1a, 0x6400); + return 0; } @@ -186,7 +176,7 @@ static int meson_gxl_read_status(struct phy_device *phydev) if (!(wol & LPI_STATUS_RSV12) || ((exp & EXPANSION_NWAY) && !(lpa & LPA_LPACK))) { /* Looks like aneg failed after all */ - phydev_dbg(phydev, "LPA corruption - aneg restart\n"); + phydev_warn(phydev, "LPA corruption - aneg restart\n"); return genphy_restart_aneg(phydev); } } @@ -243,11 +233,23 @@ static irqreturn_t meson_gxl_handle_interrupt(struct phy_device *phydev) irq_status == INTSRC_ENERGY_DETECT) return IRQ_HANDLED; - phy_trigger_machine(phydev); + /* Give PHY some time before MAC starts sending data. This works + * around an issue where network doesn't come up properly. + */ + if (irq_status & INTSRC_ANEG_COMPLETE) + phy_queue_state_machine(phydev, msecs_to_jiffies(100)); + else + phy_trigger_machine(phydev); return IRQ_HANDLED; } +static void meson_gxl_link_change_notify(struct phy_device *phydev) +{ + if (phydev->state == PHY_RUNNING && phydev->speed == SPEED_100) + meson_gxl_write_reg(phydev, BANK_ANALOG_DSP, 0x14, 0xa900); +} + static struct phy_driver meson_gxl_phy[] = { { PHY_ID_MATCH_EXACT(0x01814400), @@ -259,6 +261,7 @@ static struct phy_driver meson_gxl_phy[] = { .read_status = meson_gxl_read_status, .config_intr = meson_gxl_config_intr, .handle_interrupt = meson_gxl_handle_interrupt, + .link_change_notify = meson_gxl_link_change_notify, .suspend = genphy_suspend, .resume = genphy_resume, }, { -- 2.35.1
Powered by blists - more mailing lists