lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 1 May 2020 22:12:07 +0300
From:   Lauri Jakku <lauri.jakku@...inet.fi>
To:     Heiner Kallweit <hkallweit1@...il.com>
Cc:     Leon Romanovsky <leon@...nel.org>, netdev@...r.kernel.org,
        nic_swsd@...ltek.com
Subject: Re: NET: r8168/r8169 identifying fix

Hi,


On 19.4.2020 19.00, Heiner Kallweit wrote:
> On 19.04.2020 18:49, Lauri Jakku wrote:
>> Hi,
>>
>> On 19.4.2020 18.09, Lauri Jakku wrote:
>>> Hi,
>>>
>>> On 18.4.2020 21.46, Lauri Jakku wrote:
>>>
>>>> Hi,
>>>>
>>>> On 18.4.2020 14.06, Lauri Jakku wrote:
>>>>> Hi,
>>>>>
>>>>> On 17.4.2020 10.30, Lauri Jakku wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 17.4.2020 9.23, Lauri Jakku wrote:
>>>>>>> On 16.4.2020 23.50, Heiner Kallweit wrote:
>>>>>>>> On 16.04.2020 22:38, Lauri Jakku wrote:
>>>>>>>>> Hi
>>>>>>>>>
>>>>>>>>> On 16.4.2020 23.10, Lauri Jakku wrote:
>>>>>>>>>> On 16.4.2020 23.02, Heiner Kallweit wrote:
>>>>>>>>>>> On 16.04.2020 21:58, Lauri Jakku wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> On 16.4.2020 21.37, Lauri Jakku wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 16.4.2020 21.26, Heiner Kallweit wrote:
>>>>>>>>>>>>>> On 16.04.2020 13:30, Lauri Jakku wrote:
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 5.6.3-2-MANJARO: stock manjaro kernel, without modifications --> network does not work
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 5.6.3-2-MANJARO-lja: No attach check, modified kernel (r8169 mods only) --> network does not work
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 5.6.3-2-MANJARO-with-the-r8169-patch: phy patched + r8169 mods -> devices show up ok, network works
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> All different initcpio's have realtek.ko in them.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for the logs. Based on the logs you're presumable affected by a known BIOS bug.
>>>>>>>>>>>>>> Check bug tickets 202275 and 207203 at bugzilla.kernel.org.
>>>>>>>>>>>>>> In the first referenced tickets it's about the same mainboard (with earlier BIOS version).
>>>>>>>>>>>>>> BIOS on this mainboard seems to not initialize the network chip / PHY correctly, it reports
>>>>>>>>>>>>>> a random number as PHY ID, resulting in no PHY driver being found.
>>>>>>>>>>>>>> Enable "Onboard LAN Boot ROM" in the BIOS, and your problem should be gone.
>>>>>>>>>>>>>>
>>>>>>>>>>>>> OK, I try that, thank you :)
>>>>>>>>>>>>>
>>>>>>>>>>>> It seems that i DO have the ROM's enabled, i'm now testing some mutex guard for phy state and try to use it as indicator
>>>>>>>>>>>>
>>>>>>>>>>>> that attach has been done. One thing i've noticed is that driver needs to be reloaded to allow traffic (ie. ping works etc.)
>>>>>>>>>>>>
>>>>>>>>>>> All that shouldn't be needed. Just check with which PHY ID the PHY comes up.
>>>>>>>>>>> And what do you mean with "it seems"? Is the option enabled or not?
>>>>>>>>>>>
>>>>>>>>>> I do have ROM's enabled, and it does not help with my issue.
>>>>>>>> Your BIOS is a beta version, downgrading to F7 may help. Then you have the same
>>>>>>>> mainboard + BIOS as the user who opened bug ticket 202275.
>>>>>>>>
>>>>>>> huhti 17 09:01:49 MinistryOfSillyWalk kernel: r8169 0000:03:00.0: PHY version: 0xc2077002
>>>>>>> huhti 17 09:01:49 MinistryOfSillyWalk kernel: r8169 0000:03:00.0: MAC version: 23
>>>>>>>
>>>>>>> ....
>>>>>>>
>>>>>>> huhti 17 09:03:29 MinistryOfSillyWalk kernel: r8169 0000:02:00.0: PHY version: 0x1cc912
>>>>>>>
>>>>>>> huhti 17 09:03:29 MinistryOfSillyWalk kernel: r8169 0000:02:00.0: MAC version: 23
>>>>>>>
>>>>>>> .. after module unload & load cycle:
>>>>>>>
>>>>>>> huhti 17 09:17:35 MinistryOfSillyWalk kernel: r8169 0000:02:00.0: PHY version: 0x1cc912
>>>>>>> huhti 17 09:17:35 MinistryOfSillyWalk kernel: r8169 0000:02:00.0: MAC version: 23
>>>>>>>
>>>>>>>
>>>>>>> it seem to be the case that the phy_id chances onetime, then stays the same. I'll do few shutdowns and see
>>>>>>>
>>>>>>> is there a pattern at all .. next i'm going to try how it behaves, if i read mac/phy versions twice on MAC version 23.
>>>>>>>
>>>>>>>
>>>>>>> The BIOS downgrade: I'd like to solve this without downgrading BIOS. If I can't, then I'll do systemd-service that
>>>>>>>
>>>>>>> reloads r8169 driver at boot, cause then network is just fine.
>>>>>>>
>>>>>>>
>>>>>> What i've gathered samples now, there is three values for PHY ID:
>>>>>>
>>>>>> [sillyme@...istryOfSillyWalk KernelStuff]$ sudo journalctl |grep "PHY ver"
>>>>>> huhti 17 09:01:49 MinistryOfSillyWalk kernel: r8169 0000:02:00.0: PHY version: 0xc2077002
>>>>>> huhti 17 09:01:49 MinistryOfSillyWalk kernel: r8169 0000:03:00.0: PHY version: 0xc2077002
>>>>>> huhti 17 09:03:29 MinistryOfSillyWalk kernel: r8169 0000:02:00.0: PHY version: 0x1cc912
>>>>>> huhti 17 09:03:29 MinistryOfSillyWalk kernel: r8169 0000:03:00.0: PHY version: 0x1cc912
>>>>>> huhti 17 09:17:35 MinistryOfSillyWalk kernel: r8169 0000:02:00.0: PHY version: 0x1cc912
>>>>>> huhti 17 09:17:35 MinistryOfSillyWalk kernel: r8169 0000:03:00.0: PHY version: 0x1cc912
>>>>>> huhti 17 09:24:53 MinistryOfSillyWalk kernel: r8169 0000:02:00.0: PHY version: 0xc1071002
>>>>>> huhti 17 09:24:53 MinistryOfSillyWalk kernel: r8169 0000:03:00.0: PHY version: 0xc1071002
>>>>>> huhti 17 09:27:59 MinistryOfSillyWalk kernel: r8169 0000:02:00.0: PHY version: 0x1cc912
>>>>>> huhti 17 09:27:59 MinistryOfSillyWalk kernel: r8169 0000:03:00.0: PHY version: 0x1cc912
>>>>>> huhti 17 10:08:42 MinistryOfSillyWalk kernel: r8169 0000:02:00.0: PHY version: 0xc1071002
>>>>>> huhti 17 10:08:42 MinistryOfSillyWalk kernel: r8169 0000:03:00.0: PHY version: 0xc1071002
>>>>>> huhti 17 10:12:07 MinistryOfSillyWalk kernel: r8169 0000:02:00.0: PHY version: 0x1cc912
>>>>>> huhti 17 10:12:07 MinistryOfSillyWalk kernel: r8169 0000:03:00.0: PHY version: 0x1cc912
>>>>>> huhti 17 10:20:35 MinistryOfSillyWalk kernel: r8169 0000:02:00.0: PHY version: 0xc1071002
>>>>>> huhti 17 10:20:35 MinistryOfSillyWalk kernel: r8169 0000:03:00.0: PHY version: 0xc1071002
>>>>>> huhti 17 10:23:46 MinistryOfSillyWalk kernel: r8169 0000:02:00.0: PHY version: 0x1cc912
>>>>>> huhti 17 10:23:46 MinistryOfSillyWalk kernel: r8169 0000:03:00.0: PHY version: 0x1cc912
>>>>>>
>>>>>> I dont know are those hard coded or what, and are they device specific how much.
>>>>>>
>>>>>> i haven't coldbooted things up, that may be that something to check do they vary how per coldboot.
>>>>>>
>>>>>>>>> I check the ID, and revert all other changes, and check how it is working after adding the PHY id to list.
>>>>>>>>>
>>>>> What i've now learned: the patch + script + journald services -> Results working network, but it is still a workaround.
>>>>>
>>>> Following patch trusts the MAC version, another thing witch could help is to derive method to do 2dn pass of the probeing:
>>>>
>>>> if specific MAC version is found.
>>>>
>>>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
>>>> index acd122a88d4a..62b37a1abc24 100644
>>>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>>>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>>>> @@ -5172,13 +5172,18 @@ static int r8169_mdio_register(struct rtl8169_private *tp)
>>>>          if (!tp->phydev) {
>>>>                  mdiobus_unregister(new_bus);
>>>>                  return -ENODEV;
>>>> -       } else if (tp->mac_version == RTL_GIGA_MAC_NONE) {
>>>> -               /* Most chip versions fail with the genphy driver.
>>>> -                * Therefore ensure that the dedicated PHY driver is loaded.
>>>> -                */
>>>> -               dev_err(&pdev->dev, "Not known MAC version.\n");
>>>> -               mdiobus_unregister(new_bus);
>>>> -               return -EUNATCH;
>>>> +       } else {
>>>> +               dev_info(&pdev->dev, "PHY version: 0x%x\n", tp->phydev->phy_id);
>>>> +               dev_info(&pdev->dev, "MAC version: %d\n", tp->mac_version);
>>>> +
>>>> +               if (tp->mac_version == RTL_GIGA_MAC_NONE) {
>>>> +                       /* Most chip versions fail with the genphy driver.
>>>> +                        * Therefore ensure that the dedicated PHY driver is loaded.
>>>> +                        */
>>>> +                       dev_err(&pdev->dev, "Not known MAC/PHY version.\n", tp->phydev->phy_id);
>>>> +                       mdiobus_unregister(new_bus);
>>>> +                       return -EUNATCH;
>>>> +               }
>>>>          }
>>>>
>>>>          /* PHY will be woken up in rtl_open() */
>>>>
>>> I just got bleeding edge 5.7.0-1 kernel compiled + firmware's updated.. and  now up'n'running. There is one (WARN_ONCE) stack trace coming from driver, i think i tinker with it next, with above patch the network devices shows up and they can be configured.
>>>
>> I tought to ask first, before going to make new probe_type for errorneus hw (propetype + retry counter) to do re-probe if requested, N times. Or should the r8169_main.c return deferred probe on error on some MAC enums ? Which approach is design-wise sound ?
>>
>> I just tought that the DEFERRED probe may just do the trick i'm looking ways to implement the re-probeing... hmm. I try the deferred thing and check would that help.
>>
> Playing with options to work around the issue is of course a great way to
> learn about the kernel. However it's questionable whether a workaround in
> the driver is acceptable for dealing with the broken BIOS of exactly one
>> 10 yrs old board (for which even a userspace workaround exists).

problem recognized: libphy-module get's unloaded for some reason before 
r8169 driver loads -> missing lowlevel functionality -> not working 
driver. This only occurs at 1st load of module.. seeking solution.


There is [last unloaded: libphy] entries in log BEFORE r8169 is probed 
first time.


Any clue what is responsible for unloading to occur ?


>>>>>>>>>>>>>>> The problem with old method seems to be, that device does not have had time to attach before the
>>>>>>>>>>>>>>> PHY driver check.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The patch:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
>>>>>>>>>>>>>>> index bf5bf05970a2..acd122a88d4a 100644
>>>>>>>>>>>>>>> --- a/drivers/net/ethernet/realtek/r8169_main.c
>>>>>>>>>>>>>>> +++ b/drivers/net/ethernet/realtek/r8169_main.c
>>>>>>>>>>>>>>> @@ -5172,11 +5172,11 @@ static int r8169_mdio_register(struct rtl8169_private *tp)
>>>>>>>>>>>>>>>             if (!tp->phydev) {
>>>>>>>>>>>>>>> mdiobus_unregister(new_bus);
>>>>>>>>>>>>>>>                     return -ENODEV;
>>>>>>>>>>>>>>> -       } else if (!tp->phydev->drv) {
>>>>>>>>>>>>>>> +       } else if (tp->mac_version == RTL_GIGA_MAC_NONE) {
>>>>>>>>>>>>>>>                     /* Most chip versions fail with the genphy driver.
>>>>>>>>>>>>>>>                      * Therefore ensure that the dedicated PHY driver is loaded.
>>>>>>>>>>>>>>>                      */
>>>>>>>>>>>>>>> -               dev_err(&pdev->dev, "realtek.ko not loaded, maybe it needs to be added to initramfs?\n");
>>>>>>>>>>>>>>> +               dev_err(&pdev->dev, "Not known MAC version.\n");
>>>>>>>>>>>>>>> mdiobus_unregister(new_bus);
>>>>>>>>>>>>>>>                     return -EUNATCH;
>>>>>>>>>>>>>>>             }
>>>>>>>>>>>>>>> diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c
>>>>>>>>>>>>>>> index 66b8c61ca74c..aba2b304b821 100644
>>>>>>>>>>>>>>> --- a/drivers/net/phy/phy-core.c
>>>>>>>>>>>>>>> +++ b/drivers/net/phy/phy-core.c
>>>>>>>>>>>>>>> @@ -704,6 +704,10 @@ EXPORT_SYMBOL_GPL(phy_modify_mmd);
>>>>>>>>>>>>>>>        static int __phy_read_page(struct phy_device *phydev)
>>>>>>>>>>>>>>>      {
>>>>>>>>>>>>>>> +       /* If not attached, do nothing (no warning) */
>>>>>>>>>>>>>>> +       if (!phydev->attached_dev)
>>>>>>>>>>>>>>> +               return -EOPNOTSUPP;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>             if (WARN_ONCE(!phydev->drv->read_page, "read_page callback not available, PHY driver not loaded?\n"))
>>>>>>>>>>>>>>>                     return -EOPNOTSUPP;
>>>>>>>>>>>>>>>      @@ -712,12 +716,17 @@ static int __phy_read_page(struct phy_device *phydev)
>>>>>>>>>>>>>>>        static int __phy_write_page(struct phy_device *phydev, int page)
>>>>>>>>>>>>>>>      {
>>>>>>>>>>>>>>> +       /* If not attached, do nothing (no warning) */
>>>>>>>>>>>>>>> +       if (!phydev->attached_dev)
>>>>>>>>>>>>>>> +               return -EOPNOTSUPP;
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>             if (WARN_ONCE(!phydev->drv->write_page, "write_page callback not available, PHY driver not loaded?\n"))
>>>>>>>>>>>>>>>                     return -EOPNOTSUPP;
>>>>>>>>>>>>>>>               return phydev->drv->write_page(phydev, page);
>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>      +
>>>>>>>>>>>>>>>      /**
>>>>>>>>>>>>>>>       * phy_save_page() - take the bus lock and save the current page
>>>>>>>>>>>>>>>       * @phydev: a pointer to a &struct phy_device
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 15. huhtik. 2020, 19.18, Heiner Kallweit <hkallweit1@...il.com <mailto:hkallweit1@...il.com>> kirjoitti:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         On 15.04.2020 16:39, Lauri Jakku wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>             Hi, There seems to he Something odd problem, maybe timing related. Stripped version not workingas expected. I get back to you, when i have it working.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>         There's no point in working on your patch. W/o proper justification it
>>>>>>>>>>>>>>>         isn't acceptable anyway. And so far we still don't know which problem
>>>>>>>>>>>>>>>         you actually have.
>>>>>>>>>>>>>>>         FIRST please provide the requested logs and explain the actual problem
>>>>>>>>>>>>>>>         (incl. the commit that caused the regression).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>             13. huhtik. 2020, 14.46, Lauri Jakku <ljakku77@...il.com <mailto:ljakku77@...il.com>> kirjoitti: Hi, Fair enough, i'll strip them. -lja On 2020-04-13 14:34, Leon Romanovsky wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>             On Mon, Apr 13, 2020 at 02:02:01PM +0300, Lauri Jakku wrote: Hi, Comments inline. On 2020-04-13 13:58, Leon Romanovsky wrote: On Mon, Apr 13, 2020 at 01:30:13PM +0300, Lauri Jakku wrote: From 2d41edd4e6455187094f3a13d58c46eeee35aa31 Mon Sep 17 00:00:00 2001 From: Lauri Jakku <lja@....fi> Date: Mon, 13 Apr 2020 13:18:35 +0300 Subject: [PATCH] NET: r8168/r8169 identifying fix The driver installation determination made properly by checking PHY vs DRIVER id's. --- drivers/net/ethernet/realtek/r8169_main.c | 70 ++++++++++++++++++++--- drivers/net/phy/mdio_bus.c | 11 +++- 2 files changed, 72 insertions(+), 9 deletions(-) I would say that most of the code is debug prints. I tought that they are helpful to keep, they are using the debug calls, so they are not visible if user does not like those. You are missing the point of who are your users. Users want to have working device and the code. They don't need or like to debug their kernel. Thanks
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>

Powered by blists - more mailing lists