lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 29 Dec 2018 10:45:52 +0100
From:   Heiner Kallweit <hkallweit1@...il.com>
To:     Florian Fainelli <f.fainelli@...il.com>,
        Andrew Lunn <andrew@...n.ch>,
        David Miller <davem@...emloft.net>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>
Cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Norbert Jurkeit <norbert.jurkeit@....de>,
        Frank Crawford <frank@...wford.emu.id.au>
Subject: Re: [PATCH net] net: phy: replace preliminary fix for PHY driver
 sometimes not binding to the device

On 29.12.2018 03:48, Florian Fainelli wrote:
> Le 12/28/18 à 6:42 PM, Florian Fainelli a écrit :
>> Le 12/24/18 à 3:21 AM, Heiner Kallweit a écrit :
>>> phy_device_create() uses request_module() to load the PHY driver module
>>> based on the PHY ID of the device. There is some timing issue which
>>> sometimes prevents the PHY driver to bind to the device. In such cases
>>> the genphy driver is used what can cause problems if genphy isn't
>>> compatible with the respective PHY.
>>> It turned out that the first fix can fix the issue in some but not all
>>> cases. Moving the call to device_initialize() before the call to
>>> request_module() was reported to fix the issue.
>>> I can't explain where the root cause of the issue is and why this fix
>>> works. AFAICS device_initialize() just initializes the device struct
>>> w/o doing anything that could interfere with e.g. bus_add_driver().
>>> This patch removes the first preliminary fix attempt.
>>
>> Humm but phy_device is comprised of a mdio_device on which the actual
>> matching is done, so you do have to call device_initialize() first in
>> order for the phy_device instance to have its companion mdio_device's
>> kobject to be properly initialized.
>>
>> Out of curiosity, do any of the people who tested that change have the
>> ability to run a kernel with list/kobject debugging enabled so we can
>> learn a bit more about the problematic code path?
> 
> Heiner; are you positive that the PHY is not in a power down mode
> (BMCR.PDOWN = 1) at the time the r8169 probe is done? Because if that
> was the case, there is no guarantee (per 802.3 clause 22 spec) that the
> PHY must correctly respond to MDIO operations other than read/write to
> BMCR register and this could explain that problem, as well as the
> general lack of connectivity for these users.
> 
Some users facing the issue reported that even when the dedicated PHY
driver isn't bound still module "realtek" shows up in lsmod.
This seems to prove that the correct PHYID was read from the chip.

It was also reported that building the PHY driver in instead of
making it a module avoids the issue. All that makes me think that
we have a nasty timing issue, not in phylib but in driver base core.

All that doesn't seem to be specific to the Realtek stuff, so I was
wondering why no other user reported such an issue before.
My assumption:
Different versions of Realtek network chips have been used on a lot of
(most?) consumer mainboards for more than 10 years. So there is a
huge base and wide variety of systems now running phylib, thus the
chance of triggering issues like the one we discuss now has massively
increased.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ