netdev - Re: [RFC net-next PATCH 16/16] net: sfp: Add quirk to ignore PHYs

Message-ID: <55f6cec4-2497-45a4-cb1a-3edafa7d80d3@seco.com>
Date:   Tue, 5 Oct 2021 16:38:23 -0400
From:   Sean Anderson <sean.anderson@...o.com>
To:     "Russell King (Oracle)" <linux@...linux.org.uk>
Cc:     netdev@...r.kernel.org, "David S . Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>, linux-kernel@...r.kernel.org,
        Andrew Lunn <andrew@...n.ch>,
        Heiner Kallweit <hkallweit1@...il.com>
Subject: Re: [RFC net-next PATCH 16/16] net: sfp: Add quirk to ignore PHYs



On 10/5/21 3:12 PM, Russell King (Oracle) wrote:
> On Tue, Oct 05, 2021 at 12:45:28PM -0400, Sean Anderson wrote:
>>
>>
>> On 10/5/21 6:33 AM, Russell King (Oracle) wrote:
>> > On Mon, Oct 04, 2021 at 03:15:27PM -0400, Sean Anderson wrote:
>> > > Some modules have something at SFP_PHY_ADDR which isn't a PHY. If we try to
>> > > probe it, we might attach genphy anyway if addresses 2 and 3 return
>> > > something other than all 1s. To avoid this, add a quirk for these modules
>> > > so that we do not probe their PHY.
>> > >
>> > > The particular module in this case is a Finisar SFP-GB-GE-T. This module is
>> > > also worked around in xgbe_phy_finisar_phy_quirks() by setting the support
>> > > manually. However, I do not believe that it has a PHY in the first place:
>> > >
>> > > $ i2cdump -y -r 0-31 $BUS 0x56 w
>> > >      0,8  1,9  2,a  3,b  4,c  5,d  6,e  7,f
>> > > 00: ff01 ff01 ff01 c20c 010c 01c0 0f00 0120
>> > > 08: fc48 000e ff78 0000 0000 0000 0000 00f0
>> > > 10: 7800 00bc 0000 401c 680c 0300 0000 0000
>> > > 18: ff41 0000 0a00 8890 0000 0000 0000 0000
>> >
>> > Actually, I think that is a PHY. It's byteswapped (which is normal using
>> > i2cdump in this way). The real contents of the registers are:
>> >
>> > 00: 01ff 01ff 01ff 0cc2 0c01 c001 000f 2001
>> > 08: 48fc 0e00 78ff 0000 0000 0000 0000 f000
>> > 10: 0078 bc00 0000 1c40 0c68 0003 0000 0000
>> > 18: 41ff 0000 000a 9088 0000 0000 0000 0000
>>
>> Ah, thanks for catching this.
>>
>> > It's advertising pause + asym pause, 1000BASE-T FD, link partner is also
>> > advertising 1000BASE-T FD but no pause abilities.
>> >
>> > When comparing this with a Marvell 88e1111:
>> >
>> > 00: 1140 7949 0141 0cc2 05e1 0000 0004 2001
>> > 08: 0000 0e00 4000 0000 0000 0000 0000 f000
>> > 10: 0078 8100 0000 0040 0568 0000 0000 0000
>> > 18: 4100 0000 0002 8084 0000 0000 0000 0000
>> >
>> > It looks remarkably similar. However, the first few reads seem to be
>> > corrupted with 0x01ff. It may be that the module is slow to allow the
>> > PHY to start responding - we've had similar with Champion One SFPs.
>>
>> Do you have an example of how to work around this? Even reading one
>> register at a time I still get the bogus 0x01ff. Reading bytewise, a
>> reasonable-looking upper byte is returned every other read, but the
>> lower byte is 0xff every time.
>
> I think the Champion One modules just don't respond to the I2C
> transactions, so we keep retrying for a while. We try every
> 50ms for 12 retries, which seems to be long enough for their
> modules.
>
>> > It looks like it's a Marvell 88e1111. The register at 0x11 is the
>> > Marvell status register, and 0xbc00 indicates 1000Mbit, FD, AN
>> > resolved, link up which agrees with what's in the various other
>> > registers.
>>
>> That matches some supplemental info on the manufacturer's website
>> (which was frustratingly not associated with the model number of
>> this particular module).
>
> The interesting thing is, many modules use 88e1111, which is about
> the only PHY that I'm aware that supports I2C access mode natively.
> So, it's really surprising that you're getting corrupted data,
> unless...
>
> There's been a history of using too strong pull-ups on the SFP I2C
> lines. The SFP MSA gives a minimum value of the resistors (4.7k).
> SFP+ lowers the minimum value and raises the maximum clock frequency.
> Some SFP modules are unable to drive the I2C bus low against the
> lower resistances resulting in corrupted data (or worse, it can
> corrupt the EEPROMs.)

There is a level shifter. Between the shifter and the SoC there were
1.8k (!) pull-ups, and between the shifter and the SFP there were 10k
pull-ups. I tried replacing the pull-ups between the SoC and the shifter
with 10k pull-ups, but noticed no difference. I have also noticed no
issues accessing the EEPROM, and I have not noticed any difference
accessing other registers (see below). Additionally, this same error is
"present" already in xgbe_phy_finisar_phy_quirks(), as noted in the
commit message.

> Other problems on some platforms have been with I2C level shifters
> locking up, but that doesn't look like what's happening here - they
> lockup at logic low not logic high. Even so-called "impossible to
> lockup" level shifters have locked up despite their manufacturer
> stating that it is impossible.
>
> Is it always the same addresses?

Yes.

> What if you read from a different offset?

Same thing.

> What if you re-read after it seems to have cleared?

Here are some various transfers which hopefully will clarify the
behavior:

First, reading two bytes at a time
	$ i2ctransfer -y 2 w1@...6 2 r2
	0x01 0xff
This behavior is repeatable
	$ i2ctransfer -y 2 w1@...6 2 r2
	0x01 0xff
Now, reading one byte at a time
	$ i2ctransfer -y 2 w1@...6 2 r1
	0x01
A second write/single read gets us the second byte.
	$ i2ctransfer -y 2 w1@...6 2 r1
	0x41
And doing it for a third time gets us the first byte again.
	$ i2ctransfer -y 2 w1@...6 2 r1
	0x01
If we start another one-byte read without writing the address, we get
the second byte
	$ i2ctransfer -y 2 r1@...6
	0x41
And continuing this pattern, we get the next byte.
	$ i2ctransfer -y 2 r1@...6
	0x0c
This can be repeated indefinitely
	$ i2ctransfer -y 2 r1@...6
	0xc2
	$ i2ctransfer -y 2 r1@...6
	0x0c
But stopping in the "middle" of a register fails
	$ i2ctransfer -y 2 w1@...6 2 r1
	Error: Sending messages failed: Input/output error
We don't have to immediately read a byte:
	$ i2ctransfer -y 2 w1@...6 2
	$ i2ctransfer -y 2 r1@...6
	0x01
	$ i2ctransfer -y 2 r1@...6
	0x41
We can read two bytes indefinitely after "priming the pump"
	$ i2ctransfer -y 2 w1@...6 2 r1
	0x01
	$ i2ctransfer -y 2 r1@...6
	0x41
	$ i2ctransfer -y 2 r2@...6
	0x0c 0xc2
	$ i2ctransfer -y 2 r2@...6
	0x0c 0x01
	$ i2ctransfer -y 2 r2@...6
	0x00 0x00
	$ i2ctransfer -y 2 r2@...6
	0x00 0x04
	$ i2ctransfer -y 2 r2@...6
	0x20 0x01
	$ i2ctransfer -y 2 r2@...6
	0x00 0x00
But more than that "runs out"
	$ i2ctransfer -y 2 w1@...6 2 r1
	0x01
	$ i2ctransfer -y 2 r1@...6
	0x41
	$ i2ctransfer -y 2 r4@...6
	0x0c 0xc2 0x0c 0x01
	$ i2ctransfer -y 2 r4@...6
	0x00 0x00 0x00 0x04
	$ i2ctransfer -y 2 r4@...6
	0x20 0x01 0xff 0xff
	$ i2ctransfer -y 2 r4@...6
	0x01 0xff 0xff 0xff
However, the above multi-byte reads only work when starting at register
2 or greater.
	$ i2ctransfer -y 2 w1@...6 0 r1
	0x01
	$ i2ctransfer -y 2 r1@...6
	0x40
	$ i2ctransfer -y 2 r2@...6
	0x01 0xff
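
To make the byte order in the session above easier to follow, here is a
small Python sketch (an illustration only, not part of the test session)
that pairs the bytes returned by the primed reads into big-endian 16-bit
register values. The helper name pair_registers is made up for this
example; the byte values are copied from the two-byte reads above, which
started at register 2.

```python
def pair_registers(data, first_reg):
    """Group a byte stream into big-endian 16-bit registers.

    Returns a dict mapping register number to value. Assumes the first
    byte in `data` is the high byte of register `first_reg`.
    """
    if len(data) % 2:
        raise ValueError("need an even number of bytes")
    regs = {}
    for i in range(0, len(data), 2):
        regs[first_reg + i // 2] = (data[i] << 8) | data[i + 1]
    return regs


# Bytes from the primed reads in the session above, starting at register 2.
session_bytes = [0x01, 0x41, 0x0c, 0xc2, 0x0c, 0x01, 0x00, 0x00,
                 0x00, 0x04, 0x20, 0x01, 0x00, 0x00]

for reg, value in pair_registers(session_bytes, 2).items():
    print(f"reg {reg:#04x}: {value:#06x}")
```

Registers 2 and 3 come out as 0x0141 and 0x0cc2, the same ID values as
in the 88e1111 dump quoted earlier.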

Based on the above session, I believe that it may be best to treat this
phy as having an autoincrementing register address which must be read
one byte at a time, in multiples of two bytes. I think that existing SFP
phys may be compatible with this, but unfortunately I do not have any on
hand to test with.
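
A rough sketch of that read strategy in Python, under stated
assumptions: FakePhy is a toy model of the behavior seen in the session
(it does not describe the real hardware or any existing kernel code),
and read_reg is a hypothetical helper. The model is deliberately
pessimistic: it corrupts every byte after the first in a read message,
as the unprimed "w1 ... r2" transfers did, and ignores the cases where
primed two-byte reads worked.

```python
class FakePhy:
    """Toy model: autoincrementing byte pointer, one good byte per read."""

    def __init__(self, regs):
        self.regs = regs      # register number -> 16-bit value
        self.pos = 0          # byte offset: 2 * register (+1 for low byte)

    def write_addr(self, reg):
        # Point at the high byte of the requested register.
        self.pos = 2 * reg

    def read(self, n):
        out = []
        for i in range(n):
            value = self.regs.get(self.pos // 2, 0xffff)
            byte = (value >> 8) if self.pos % 2 == 0 else (value & 0xff)
            # Assumption: only the first byte of a read message is valid.
            out.append(byte if i == 0 else 0xff)
            self.pos += 1
        return out


def read_reg(dev, reg):
    """Read one 16-bit register as two separate one-byte reads."""
    dev.write_addr(reg)
    high = dev.read(1)[0]
    low = dev.read(1)[0]   # no address write; relies on autoincrement
    return (high << 8) | low


phy = FakePhy({2: 0x0141, 3: 0x0cc2})
print(hex(read_reg(phy, 2)))          # one byte at a time

phy.write_addr(2)
print([hex(b) for b in phy.read(2)])  # single two-byte read
```

In this model the two one-byte reads recover 0x0141, while the single
two-byte read returns 0x01 0xff, as in the first transfer of the
session.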

--Sean