lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 25 May 2020 18:42:50 -0500
From:   Jeremy Linton <jeremy.linton@....com>
To:     Russell King - ARM Linux admin <linux@...linux.org.uk>
Cc:     netdev@...r.kernel.org, davem@...emloft.net, andrew@...n.ch,
        f.fainelli@...il.com, hkallweit1@...il.com,
        madalin.bucur@....nxp.com, calvin.johnson@....nxp.com,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC 04/11] net: phy: Handle c22 regs presence better

Hi,

On 5/25/20 6:33 PM, Russell King - ARM Linux admin wrote:
> On Mon, May 25, 2020 at 06:22:19PM -0500, Jeremy Linton wrote:
>> On 5/25/20 6:09 PM, Russell King - ARM Linux admin wrote:
>>> On Mon, May 25, 2020 at 05:22:07PM -0500, Jeremy Linton wrote:
>>>> On 5/25/20 5:01 PM, Russell King - ARM Linux admin wrote:
>>>>> On Mon, May 25, 2020 at 04:51:16PM -0500, Jeremy Linton wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 5/25/20 5:06 AM, Russell King - ARM Linux admin wrote:
>>>>>>> On Sun, May 24, 2020 at 10:34:13PM -0500, Jeremy Linton wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 5/23/20 1:37 PM, Russell King - ARM Linux admin wrote:
>>>>>>>>> On Fri, May 22, 2020 at 04:30:52PM -0500, Jeremy Linton wrote:
>>>>>>>>>> Until this point, we have been sanitizing the c22
>>>>>>>>>> regs presence bit out of all the MMD device lists.
>>>>>>>>>> This is incorrect as it causes the 0xFFFFFFFF checks
>>>>>>>>>> to incorrectly fail. Further, it turns out that we
>>>>>>>>>> want to utilize this flag to make a determination that
>>>>>>>>>> there is actually a phy at this location and we should
>>>>>>>>>> be accessing it using c22.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Jeremy Linton <jeremy.linton@....com>
>>>>>>>>>> ---
>>>>>>>>>>       drivers/net/phy/phy_device.c | 16 +++++++++++++---
>>>>>>>>>>       1 file changed, 13 insertions(+), 3 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>>>>>>>>>> index f0761fa5e40b..2d677490ecab 100644
>>>>>>>>>> --- a/drivers/net/phy/phy_device.c
>>>>>>>>>> +++ b/drivers/net/phy/phy_device.c
>>>>>>>>>> @@ -689,9 +689,6 @@ static int get_phy_c45_devs_in_pkg(struct mii_bus *bus, int addr, int dev_addr,
>>>>>>>>>>       		return -EIO;
>>>>>>>>>>       	*devices_in_package |= phy_reg;
>>>>>>>>>> -	/* Bit 0 doesn't represent a device, it indicates c22 regs presence */
>>>>>>>>>> -	*devices_in_package &= ~BIT(0);
>>>>>>>>>> -
>>>>>>>>>>       	return 0;
>>>>>>>>>>       }
>>>>>>>>>> @@ -742,6 +739,8 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>>>>>>>>>>       	int i;
>>>>>>>>>>       	const int num_ids = ARRAY_SIZE(c45_ids->device_ids);
>>>>>>>>>>       	u32 *devs = &c45_ids->devices_in_package;
>>>>>>>>>> +	bool c22_present = false;
>>>>>>>>>> +	bool valid_id = false;
>>>>>>>>>>       	/* Find first non-zero Devices In package. Device zero is reserved
>>>>>>>>>>       	 * for 802.3 c45 complied PHYs, so don't probe it at first.
>>>>>>>>>> @@ -770,6 +769,10 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>>>>>>>>>>       		return 0;
>>>>>>>>>>       	}
>>>>>>>>>> +	/* Bit 0 doesn't represent a device, it indicates c22 regs presence */
>>>>>>>>>> +	c22_present = *devs & BIT(0);
>>>>>>>>>> +	*devs &= ~BIT(0);
>>>>>>>>>> +
>>>>>>>>>>       	/* Now probe Device Identifiers for each device present. */
>>>>>>>>>>       	for (i = 1; i < num_ids; i++) {
>>>>>>>>>>       		if (!(c45_ids->devices_in_package & (1 << i)))
>>>>>>>>>> @@ -778,6 +781,13 @@ static int get_phy_c45_ids(struct mii_bus *bus, int addr, u32 *phy_id,
>>>>>>>>>>       		ret = _get_phy_id(bus, addr, i, &c45_ids->device_ids[i], true);
>>>>>>>>>>       		if (ret < 0)
>>>>>>>>>>       			return ret;
>>>>>>>>>> +		if (valid_phy_id(c45_ids->device_ids[i]))
>>>>>>>>>> +			valid_id = true;
>>>>>>>>>
>>>>>>>>> Here you are using your "devices in package" validator to validate the
>>>>>>>>> PHY ID value.  One of the things it does is mask this value with
>>>>>>>>> 0x1fffffff.  That means you lose some of the vendor OUI.  To me, this
>>>>>>>>> looks completely wrong.
>>>>>>>>
>>>>>>>> I think in this case I was just using it like the comment in
>>>>>>>> get_phy_device() "if the phy_id is mostly F's, there is no device here".
>>>>>>>>
>>>>>>>> My understanding is that the code is trying to avoid the 0xFFFFFFFF returns
>>>>>>>> that seem to indicate "bus ok, phy didn't respond".
>>>>>>>>
>>>>>>>> I just checked the OUI registration, and while there are a couple OUI's
>>>>>>>> registered that have a number of FFF's in them, none of those cases seems to
>>>>>>>> overlap sufficiently to cause this to throw them out. Plus a phy would also
>>>>>>>> have to have model+revision set to 'F's. So while might be possible, if
>>>>>>>> unlikely, at the moment I think the OUI registration keeps this from being a
>>>>>>>> problem. Particularly, if i'm reading the mapping correctly, the OUI mapping
>>>>>>>> guarantees that the field cannot be all '1's due to the OUI having X & M
>>>>>>>> bits cleared. It sort of looks like the mapping is trying to lose those
>>>>>>>> bits, by tossing bit 1 & 2, but the X & M are in the wrong octet (AFAIK, I
>>>>>>>> just read it three times cause it didn't make any sense).
>>>>>>>
>>>>>>> I should also note that we have at least one supported PHY where one
>>>>>>> of the MMDs returns 0xfffe for even numbered registers and 0x0000 for
>>>>>>> odd numbered registers in one of the vendor MMDs for addresses 0
>>>>>>> through 0xefff - which has a bit set in the devices-in-package.
>>>>>>>
>>>>>>> It also returns 0x0082 for almost every register in MMD 2, but MMD 2's
>>>>>>> devices-in-package bit is clear in most of the valid MMDs, so we
>>>>>>> shouldn't touch it.
>>>>>>>
>>>>>>> These reveal the problem of randomly probing MMDs - they can return
>>>>>>> unexpected values and not be as well behaved as we would like them to
>>>>>>> be.  Using register 8 to detect presence may be beneficial, but that
>>>>>>> may also introduce problems as we haven't used that before (and we
>>>>>>> don't know whether any PHY that wrong.)  I know at least the 88x3310
>>>>>>> gets it right for all except the vendor MMDs, where the low addresses
>>>>>>> appear non-confromant to the 802.3 specs.  Both vendor MMDs are
>>>>>>> definitely implemented, just not with anything conforming to 802.3.
>>>>>>
>>>>>> Yes, we know even for the NXP reference hardware, one of the phy's doesn't
>>>>>> probe out correctly because it doesn't respond to the ieee defined
>>>>>> registers. I think at this point, there really isn't anything we can do
>>>>>> about that unless we involve the (ACPI) firmware in currently nonstandard
>>>>>> behaviors.
>>>>>>
>>>>>> So, my goals here have been to first, not break anything, and then do a
>>>>>> slightly better job finding phy's that are (mostly?) responding correctly to
>>>>>> the 802.3 spec. So we can say "if you hardware is ACPI conformant, and you
>>>>>> have IEEE conformant phy's you should be ok". So, for your example phy, I
>>>>>> guess the immediate answer is "use DT" or "find a conformant phy", or even
>>>>>> "abstract it in the firmware and use a mailbox interface".
>>>>>
>>>>> You haven't understood.  The PHY does conform for most of the MMDs,
>>>>> but there are a number that do not conform.
>>>>
>>>> Probably...
>>>>
>>>> Except that i'm not sure how that is a problem at the moment, its still
>>>> going to trigger as a found phy, and walk the same mmd list as before
>>>> requesting drivers. Those drivers haven't changed their behavior so where is
>>>> the problem? If there is a problem its in 7/11 where things are getting
>>>> kicked due to seemingly invalid Ids.
>>>>
>>>> The 1/11 devices=0 case actually appears to be a bug i'm fixing because you
>>>> won't get an ID or a MMD list from that (before or after).
>>>
>>> I think I've just flattened that argument in my immediately preceding
>>> reply on the Cortina situation; I think you've grossly misread that
>>> through not fully researching the history and then finding the
>>> existing users.
>>>
>>> There is no bug that you are fixing from what I can see.
>>
>> One of us is missing something,
>>
>> The "cortina" solution is broken in the current kernel. That is because
>> lines 726-742 are dead code due to line 693.
>>
>> I believe I've understood the problem there, and corrected it in this set
>> along with a few others, but its distinctly possible that isn't true.
> 
> The code you refer to above is NOT used on the platforms that I have
> identified use the Cortina PHY.  If this code is not used, it has not
> caused any issue, and there is no breakage due to the change you are
> referring to.
> 
Right, which is what I sort of expected. Because its falling back to a 
device list of 0xFFFFFFFF, which means probe every single MMD.

Combined with the lack of filtering means that your getting a bunch of 
MMD IDs that potentially are invalid, along with any that happen to be 
valid. Its that behavior (and some others) which were what blew this set 
up from a couple lines of tweaks into this mess.

I don't really see a way to guess at all the "wrong" ids that are being 
pushed into the system. Which is why I started to think about a "strict" 
mode later in the set. Maybe at this point the only way around some of 
these bugs/side effects/etc is just a second c45 probe routine if we 
don't think its possible to implement a variable strictness scanner in 
this code path without losing phys that previously were detected.


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ