[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cba540dc-66c9-8c3a-dc46-d526181dbe8b@pp.inet.fi>
Date: Mon, 20 Apr 2020 22:56:46 +0300
From: Lauri Jakku <lauri.jakku@...inet.fi>
To: Heiner Kallweit <hkallweit1@...il.com>
Cc: Leon Romanovsky <leon@...nel.org>, netdev@...r.kernel.org,
nic_swsd@...ltek.com
Subject: Re: NET: r8168/r8169 identifying fix
Hi,
On 20.4.2020 0.00, Lauri Jakku wrote:
>
> On 19.4.2020 19.00, Heiner Kallweit wrote:
>> On 19.04.2020 18:49, Lauri Jakku wrote:
>>> Hi,
>>>
>>> On 19.4.2020 18.09, Lauri Jakku wrote:
>>>> Hi,
>>>>
>>>> On 18.4.2020 21.46, Lauri Jakku wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On 18.4.2020 14.06, Lauri Jakku wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 17.4.2020 10.30, Lauri Jakku wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 17.4.2020 9.23, Lauri Jakku wrote:
>>>>>>>> On 16.4.2020 23.50, Heiner Kallweit wrote:
>>>>>>>>> On 16.04.2020 22:38, Lauri Jakku wrote:
>>>>>>>>>> Hi
>>>>>>>>>>
>>>>>>>>>> On 16.4.2020 23.10, Lauri Jakku wrote:
>>>>>>>>>>> On 16.4.2020 23.02, Heiner Kallweit wrote:
>>>>>>>>>>>> On 16.04.2020 21:58, Lauri Jakku wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 16.4.2020 21.37, Lauri Jakku wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 16.4.2020 21.26, Heiner Kallweit wrote:
>>>>>>>>>>>>>>> On 16.04.2020 13:30, Lauri Jakku wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 5.6.3-2-MANJARO: stock manjaro kernel, without
>>>>>>>>>>>>>>>> modifications --> network does not work
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 5.6.3-2-MANJARO-lja: No attach check, modified kernel
>>>>>>>>>>>>>>>> (r8169 mods only) --> network does not work
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 5.6.3-2-MANJARO-with-the-r8169-patch: phy patched +
>>>>>>>>>>>>>>>> r8169 mods -> devices show up ok, network works
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> All different initcpio's have realtek.ko in them.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the logs. Based on the logs you're presumable
>>>>>>>>>>>>>>> affected by a known BIOS bug.
>>>>>>>>>>>>>>> Check bug tickets 202275 and 207203 at bugzilla.kernel.org.
>>>>>>>>>>>>>>> In the first referenced tickets it's about the same
>>>>>>>>>>>>>>> mainboard (with earlier BIOS version).
>>>>>>>>>>>>>>> BIOS on this mainboard seems to not initialize the
>>>>>>>>>>>>>>> network chip / PHY correctly, it reports
>>>>>>>>>>>>>>> a random number as PHY ID, resulting in no PHY driver
>>>>>>>>>>>>>>> being found.
>>>>>>>>>>>>>>> Enable "Onboard LAN Boot ROM" in the BIOS, and your
>>>>>>>>>>>>>>> problem should be gone.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> OK, I try that, thank you :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>> It seems that i DO have the ROM's enabled, i'm now testing
>>>>>>>>>>>>> some mutex guard for phy state and try to use it as indicator
>>>>>>>>>>>>>
>>>>>>>>>>>>> that attach has been done. One thing i've noticed is that
>>>>>>>>>>>>> driver needs to be reloaded to allow traffic (ie. ping
>>>>>>>>>>>>> works etc.)
>>>>>>>>>>>>>
>>>>>>>>>>>> All that shouldn't be needed. Just check with which PHY ID
>>>>>>>>>>>> the PHY comes up.
>>>>>>>>>>>> And what do you mean with "it seems"? Is the option enabled
>>>>>>>>>>>> or not?
>>>>>>>>>>>>
>>>>>>>>>>> I do have ROM's enabled, and it does not help with my issue.
>>>>>>>>> Your BIOS is a beta version, downgrading to F7 may help. Then
>>>>>>>>> you have the same
>>>>>>>>> mainboard + BIOS as the user who opened bug ticket 202275.
>>>>>>>>>
>>>>>>>> huhti 17 09:01:49 MinistryOfSillyWalk kernel: r8169
>>>>>>>> 0000:03:00.0: PHY version: 0xc2077002
>>>>>>>> huhti 17 09:01:49 MinistryOfSillyWalk kernel: r8169
>>>>>>>> 0000:03:00.0: MAC version: 23
>>>>>>>>
>>>>>>>> ....
>>>>>>>>
>>>>>>>> huhti 17 09:03:29 MinistryOfSillyWalk kernel: r8169
>>>>>>>> 0000:02:00.0: PHY version: 0x1cc912
>>>>>>>>
>>>>>>>> huhti 17 09:03:29 MinistryOfSillyWalk kernel: r8169
>>>>>>>> 0000:02:00.0: MAC version: 23
>>>>>>>>
>>>>>>>> .. after module unload & load cycle:
>>>>>>>>
>>>>>>>> huhti 17 09:17:35 MinistryOfSillyWalk kernel: r8169
>>>>>>>> 0000:02:00.0: PHY version: 0x1cc912
>>>>>>>> huhti 17 09:17:35 MinistryOfSillyWalk kernel: r8169
>>>>>>>> 0000:02:00.0: MAC version: 23
>>>>>>>>
>>>>>>>>
>>>>>>>> it seem to be the case that the phy_id chances onetime, then
>>>>>>>> stays the same. I'll do few shutdowns and see
>>>>>>>>
>>>>>>>> is there a pattern at all .. next i'm going to try how it
>>>>>>>> behaves, if i read mac/phy versions twice on MAC version 23.
>>>>>>>>
>>>>>>>>
>>>>>>>> The BIOS downgrade: I'd like to solve this without downgrading
>>>>>>>> BIOS. If I can't, then I'll do systemd-service that
>>>>>>>>
>>>>>>>> reloads r8169 driver at boot, cause then network is just fine.
>>>>>>>>
>>>>>>>>
>>>>>>> What i've gathered samples now, there is three values for PHY ID:
>>>>>>>
>>>>>>> [sillyme@...istryOfSillyWalk KernelStuff]$ sudo journalctl |grep
>>>>>>> "PHY ver"
>>>>>>> huhti 17 09:01:49 MinistryOfSillyWalk kernel: r8169
>>>>>>> 0000:02:00.0: PHY version: 0xc2077002
>>>>>>> huhti 17 09:01:49 MinistryOfSillyWalk kernel: r8169
>>>>>>> 0000:03:00.0: PHY version: 0xc2077002
>>>>>>> huhti 17 09:03:29 MinistryOfSillyWalk kernel: r8169
>>>>>>> 0000:02:00.0: PHY version: 0x1cc912
>>>>>>> huhti 17 09:03:29 MinistryOfSillyWalk kernel: r8169
>>>>>>> 0000:03:00.0: PHY version: 0x1cc912
>>>>>>> huhti 17 09:17:35 MinistryOfSillyWalk kernel: r8169
>>>>>>> 0000:02:00.0: PHY version: 0x1cc912
>>>>>>> huhti 17 09:17:35 MinistryOfSillyWalk kernel: r8169
>>>>>>> 0000:03:00.0: PHY version: 0x1cc912
>>>>>>> huhti 17 09:24:53 MinistryOfSillyWalk kernel: r8169
>>>>>>> 0000:02:00.0: PHY version: 0xc1071002
>>>>>>> huhti 17 09:24:53 MinistryOfSillyWalk kernel: r8169
>>>>>>> 0000:03:00.0: PHY version: 0xc1071002
>>>>>>> huhti 17 09:27:59 MinistryOfSillyWalk kernel: r8169
>>>>>>> 0000:02:00.0: PHY version: 0x1cc912
>>>>>>> huhti 17 09:27:59 MinistryOfSillyWalk kernel: r8169
>>>>>>> 0000:03:00.0: PHY version: 0x1cc912
>>>>>>> huhti 17 10:08:42 MinistryOfSillyWalk kernel: r8169
>>>>>>> 0000:02:00.0: PHY version: 0xc1071002
>>>>>>> huhti 17 10:08:42 MinistryOfSillyWalk kernel: r8169
>>>>>>> 0000:03:00.0: PHY version: 0xc1071002
>>>>>>> huhti 17 10:12:07 MinistryOfSillyWalk kernel: r8169
>>>>>>> 0000:02:00.0: PHY version: 0x1cc912
>>>>>>> huhti 17 10:12:07 MinistryOfSillyWalk kernel: r8169
>>>>>>> 0000:03:00.0: PHY version: 0x1cc912
>>>>>>> huhti 17 10:20:35 MinistryOfSillyWalk kernel: r8169
>>>>>>> 0000:02:00.0: PHY version: 0xc1071002
>>>>>>> huhti 17 10:20:35 MinistryOfSillyWalk kernel: r8169
>>>>>>> 0000:03:00.0: PHY version: 0xc1071002
>>>>>>> huhti 17 10:23:46 MinistryOfSillyWalk kernel: r8169
>>>>>>> 0000:02:00.0: PHY version: 0x1cc912
>>>>>>> huhti 17 10:23:46 MinistryOfSillyWalk kernel: r8169
>>>>>>> 0000:03:00.0: PHY version: 0x1cc912
>>>>>>>
>>>>>>> I dont know are those hard coded or what, and are they device
>>>>>>> specific how much.
>>>>>>>
>>>>>>> i haven't coldbooted things up, that may be that something to
>>>>>>> check do they vary how per coldboot.
>>>>>>>
>>>>>>>>>> I check the ID, and revert all other changes, and check how
>>>>>>>>>> it is working after adding the PHY id to list.
>>>>>>>>>>
>>>>>> What i've now learned: the patch + script + journald services ->
>>>>>> Results working network, but it is still a workaround.
>>>>>>
>>>>> Following patch trusts the MAC version, another thing witch could
>>>>> help is to derive method to do 2dn pass of the probeing:
>>>>>
>>>>> if specific MAC version is found.
>>>>>
>>>>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c
>>>>> b/drivers/net/ethernet/realtek/r8169_main.c
>>>>> index acd122a88d4a..62b37a1abc24 100644
>>>>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>>>>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>>>>> @@ -5172,13 +5172,18 @@ static int r8169_mdio_register(struct
>>>>> rtl8169_private *tp)
>>>>> if (!tp->phydev) {
>>>>> mdiobus_unregister(new_bus);
>>>>> return -ENODEV;
>>>>> - } else if (tp->mac_version == RTL_GIGA_MAC_NONE) {
>>>>> - /* Most chip versions fail with the genphy driver.
>>>>> - * Therefore ensure that the dedicated PHY driver
>>>>> is loaded.
>>>>> - */
>>>>> - dev_err(&pdev->dev, "Not known MAC version.\n");
>>>>> - mdiobus_unregister(new_bus);
>>>>> - return -EUNATCH;
>>>>> + } else {
>>>>> + dev_info(&pdev->dev, "PHY version: 0x%x\n",
>>>>> tp->phydev->phy_id);
>>>>> + dev_info(&pdev->dev, "MAC version: %d\n",
>>>>> tp->mac_version);
>>>>> +
>>>>> + if (tp->mac_version == RTL_GIGA_MAC_NONE) {
>>>>> + /* Most chip versions fail with the genphy
>>>>> driver.
>>>>> + * Therefore ensure that the dedicated PHY
>>>>> driver is loaded.
>>>>> + */
>>>>> + dev_err(&pdev->dev, "Not known MAC/PHY
>>>>> version.\n", tp->phydev->phy_id);
>>>>> + mdiobus_unregister(new_bus);
>>>>> + return -EUNATCH;
>>>>> + }
>>>>> }
>>>>>
>>>>> /* PHY will be woken up in rtl_open() */
>>>>>
>>>> I just got bleeding edge 5.7.0-1 kernel compiled + firmware's
>>>> updated.. and now up'n'running. There is one (WARN_ONCE) stack
>>>> trace coming from driver, i think i tinker with it next, with above
>>>> patch the network devices shows up and they can be configured.
>>>>
>>> I tought to ask first, before going to make new probe_type for
>>> errorneus hw (propetype + retry counter) to do re-probe if
>>> requested, N times. Or should the r8169_main.c return deferred probe
>>> on error on some MAC enums ? Which approach is design-wise sound ?
>>>
>>> I just tought that the DEFERRED probe may just do the trick i'm
>>> looking ways to implement the re-probeing... hmm. I try the deferred
>>> thing and check would that help.
>>>
>> Playing with options to work around the issue is of course a great
>> way to
>> learn about the kernel. However it's questionable whether a
>> workaround in
>> the driver is acceptable for dealing with the broken BIOS of exactly one
>
> Hmm, I think that it is better to have devices supported, even if the
> HW is old, and cause it has had support.
>
> There are even older harddisk & network drivers, so the argument is
> not good, thus i see no reason why not
>
> to take patch in.
>
>
>>> 10 yrs old board (for which even a userspace workaround exists).
> The age of driver has never been the thing .. there are very old
> drivers, HDD and Network etc.
>>>>>>>>>>>>>>>> The problem with old method seems to be, that device
>>>>>>>>>>>>>>>> does not have had time to attach before the
I think i have now good combination of driver, that does the old trick
on real non realtek NICs + userspace added configuration. See at Manjaro
forum https://forum.manjaro.org/t/r8168-kernel-5-6-3-driver-broken/135325/10
for them for meantime. I think the 'refresh' within kernel, is doable,
but there is quite much i don't still know how kernel works, so my
solution might be bit more hacky and slashy,
i try to do some elegant ish solution, but the kernel patch + those
systemd files are for those by-all-means-hack-fixers.
>>>>>>>>>>>>>>>> PHY driver check.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The patch:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c
>>>>>>>>>>>>>>>> b/drivers/net/ethernet/realtek/r8169_main.c
>>>>>>>>>>>>>>>> index bf5bf05970a2..acd122a88d4a 100644
>>>>>>>>>>>>>>>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>>>>>>>>>>>>>>>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>>>>>>>>>>>>>>>> @@ -5172,11 +5172,11 @@ static int
>>>>>>>>>>>>>>>> r8169_mdio_register(struct rtl8169_private *tp)
>>>>>>>>>>>>>>>> if (!tp->phydev) {
>>>>>>>>>>>>>>>> mdiobus_unregister(new_bus);
>>>>>>>>>>>>>>>> return -ENODEV;
>>>>>>>>>>>>>>>> - } else if (!tp->phydev->drv) {
>>>>>>>>>>>>>>>> + } else if (tp->mac_version == RTL_GIGA_MAC_NONE) {
>>>>>>>>>>>>>>>> /* Most chip versions fail with the
>>>>>>>>>>>>>>>> genphy driver.
>>>>>>>>>>>>>>>> * Therefore ensure that the
>>>>>>>>>>>>>>>> dedicated PHY driver is loaded.
>>>>>>>>>>>>>>>> */
>>>>>>>>>>>>>>>> - dev_err(&pdev->dev, "realtek.ko not loaded, maybe it
>>>>>>>>>>>>>>>> needs to be added to initramfs?\n");
>>>>>>>>>>>>>>>> + dev_err(&pdev->dev, "Not known MAC version.\n");
>>>>>>>>>>>>>>>> mdiobus_unregister(new_bus);
>>>>>>>>>>>>>>>> return -EUNATCH;
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>> diff --git a/drivers/net/phy/phy-core.c
>>>>>>>>>>>>>>>> b/drivers/net/phy/phy-core.c
>>>>>>>>>>>>>>>> index 66b8c61ca74c..aba2b304b821 100644
>>>>>>>>>>>>>>>> --- a/drivers/net/phy/phy-core.c
>>>>>>>>>>>>>>>> +++ b/drivers/net/phy/phy-core.c
>>>>>>>>>>>>>>>> @@ -704,6 +704,10 @@ EXPORT_SYMBOL_GPL(phy_modify_mmd);
>>>>>>>>>>>>>>>> static int __phy_read_page(struct phy_device
>>>>>>>>>>>>>>>> *phydev)
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>> + /* If not attached, do nothing (no warning) */
>>>>>>>>>>>>>>>> + if (!phydev->attached_dev)
>>>>>>>>>>>>>>>> + return -EOPNOTSUPP;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> if (WARN_ONCE(!phydev->drv->read_page,
>>>>>>>>>>>>>>>> "read_page callback not available, PHY driver not
>>>>>>>>>>>>>>>> loaded?\n"))
>>>>>>>>>>>>>>>> return -EOPNOTSUPP;
>>>>>>>>>>>>>>>> @@ -712,12 +716,17 @@ static int
>>>>>>>>>>>>>>>> __phy_read_page(struct phy_device *phydev)
>>>>>>>>>>>>>>>> static int __phy_write_page(struct phy_device
>>>>>>>>>>>>>>>> *phydev, int page)
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>> + /* If not attached, do nothing (no warning) */
>>>>>>>>>>>>>>>> + if (!phydev->attached_dev)
>>>>>>>>>>>>>>>> + return -EOPNOTSUPP;
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> if (WARN_ONCE(!phydev->drv->write_page,
>>>>>>>>>>>>>>>> "write_page callback not available, PHY driver not
>>>>>>>>>>>>>>>> loaded?\n"))
>>>>>>>>>>>>>>>> return -EOPNOTSUPP;
>>>>>>>>>>>>>>>> return phydev->drv->write_page(phydev,
>>>>>>>>>>>>>>>> page);
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>> /**
>>>>>>>>>>>>>>>> * phy_save_page() - take the bus lock and save
>>>>>>>>>>>>>>>> the current page
>>>>>>>>>>>>>>>> * @phydev: a pointer to a &struct phy_device
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 15. huhtik. 2020, 19.18, Heiner Kallweit
>>>>>>>>>>>>>>>> <hkallweit1@...il.com <mailto:hkallweit1@...il.com>>
>>>>>>>>>>>>>>>> kirjoitti:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 15.04.2020 16:39, Lauri Jakku wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi, There seems to he Something odd
>>>>>>>>>>>>>>>> problem, maybe timing related. Stripped version not
>>>>>>>>>>>>>>>> workingas expected. I get back to you, when i have it
>>>>>>>>>>>>>>>> working.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> There's no point in working on your patch. W/o
>>>>>>>>>>>>>>>> proper justification it
>>>>>>>>>>>>>>>> isn't acceptable anyway. And so far we still
>>>>>>>>>>>>>>>> don't know which problem
>>>>>>>>>>>>>>>> you actually have.
>>>>>>>>>>>>>>>> FIRST please provide the requested logs and
>>>>>>>>>>>>>>>> explain the actual problem
>>>>>>>>>>>>>>>> (incl. the commit that caused the regression).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 13. huhtik. 2020, 14.46, Lauri Jakku
>>>>>>>>>>>>>>>> <ljakku77@...il.com <mailto:ljakku77@...il.com>>
>>>>>>>>>>>>>>>> kirjoitti: Hi, Fair enough, i'll strip them. -lja On
>>>>>>>>>>>>>>>> 2020-04-13 14:34, Leon Romanovsky wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Apr 13, 2020 at 02:02:01PM +0300,
>>>>>>>>>>>>>>>> Lauri Jakku wrote: Hi, Comments inline. On 2020-04-13
>>>>>>>>>>>>>>>> 13:58, Leon Romanovsky wrote: On Mon, Apr 13, 2020 at
>>>>>>>>>>>>>>>> 01:30:13PM +0300, Lauri Jakku wrote: From
>>>>>>>>>>>>>>>> 2d41edd4e6455187094f3a13d58c46eeee35aa31 Mon Sep 17
>>>>>>>>>>>>>>>> 00:00:00 2001 From: Lauri Jakku <lja@....fi> Date: Mon,
>>>>>>>>>>>>>>>> 13 Apr 2020 13:18:35 +0300 Subject: [PATCH] NET:
>>>>>>>>>>>>>>>> r8168/r8169 identifying fix The driver installation
>>>>>>>>>>>>>>>> determination made properly by checking PHY vs DRIVER
>>>>>>>>>>>>>>>> id's. --- drivers/net/ethernet/realtek/r8169_main.c |
>>>>>>>>>>>>>>>> 70 ++++++++++++++++++++--- drivers/net/phy/mdio_bus.c |
>>>>>>>>>>>>>>>> 11 +++- 2 files changed, 72 insertions(+), 9
>>>>>>>>>>>>>>>> deletions(-) I would say that most of the code is debug
>>>>>>>>>>>>>>>> prints. I tought that they are helpful to keep, they
>>>>>>>>>>>>>>>> are using the debug calls, so they are not visible if
>>>>>>>>>>>>>>>> user does not like those. You are missing the point of
>>>>>>>>>>>>>>>> who are your users. Users want to have working device
>>>>>>>>>>>>>>>> and the code. They don't need or like to debug their
>>>>>>>>>>>>>>>> kernel. Thanks
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
Powered by blists - more mailing lists