Message-ID: <mzlf5spdsfk5puw4ic7wuvstbgydeqz3p2f73bhroxqbsj5h7q@7jn6r2xnttsk>
Date: Sun, 4 Jun 2023 19:56:27 +0000
From: Alvin Šipraga <ALSI@...g-olufsen.dk>
To: Christian Lamparter <chunkeey@...il.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>, "luizluca@...il.com"
	<luizluca@...il.com>, "linus.walleij@...aro.org" <linus.walleij@...aro.org>,
	"andrew@...n.ch" <andrew@...n.ch>, "olteanv@...il.com" <olteanv@...il.com>,
	"f.fainelli@...il.com" <f.fainelli@...il.com>
Subject: Re: [PATCH v1] net: dsa: realtek: rtl8365mb: use mdio passthrough to
 access PHYs

On Sun, Jun 04, 2023 at 04:42:58PM +0200, Christian Lamparter wrote:
> On 6/4/23 14:03, Alvin Šipraga wrote:
> > On Fri, Jun 02, 2023 at 07:02:31PM +0200, Christian Lamparter wrote:
> > > when bringing up the PHYs on a Netgear WNDAP660, I observed that
> > > none of the PHYs are getting enumerated and the rtl8365mb fails
> > > to load.
> > > 
> > > | realtek-mdio [...] lan1 (unini...): validation of gmii with support \
> > > |   0...,0.,..6280 and advertisement 0...,0...,6280 failed: -EINVAL
> > > | realtek-mdio [...] lan1 (uninit...): failed to connect to PHY: -EINVAL
> > > | realtek-mdio [...] lan1 (uninit...): error -22 setting up PHY for
> > > |   tree 0, switch 0, port 0
> > > 
> > > with phytool, all registers just returned "0000".
> > > 
> > > Now, the same behavior was present with the swconfig version of
> > > rtl8367b.c and in the device's U-Boot the "mii" register access
> > > utility also reports bogus values.
> > 
> > Not really relevant...
> 
> Oh, maybe I should be blunt here. This is the first time that I
> got proper mii values for this RTL8363. This is relevant, because
> in u-boot the vendor (Netgear) usually takes care of the "mii" tool.
> 
> But this wasn't the case here, I'm not sure if this RTL8363SB is
> an odd-ball or not. This patch was meant for discussion, if the
> discussion is fruitful, I fully expect to do a v2..v3 with the
> information that was gathered during review.

No problem, but I think it's not that useful in the final commit message. Just
my 2c...

> 
> > > 
> > > The Netgear WNDAP660 might be somewhat special, since the RTL8363SB
> > > uses exclusive MDC/MDIO-access (instead of SMI). (And the RTL8363SB
> > > is not part of the supported list of this driver).
> > 
> > We had other MDIO switches with support added, so I don't think it's unique.
> > 
> > > 
> > > Since this was all hopeless, I dug up some datasheet when searching
> > > for solutions:
> > > "10/100M & 10/100/1000M Switch Controller Programming Guide".
> > > It had an interesting passage that pointed to a magical
> > > MDC_MDIO_OPERATION define which resulted in different slave PHY
> > > access for the MDIO than it was implemented for SMI.
> > 
> > Got a reference? I do not see MDC_MDIO_OPERATION in your patch.
> 
> Oh, I overwrote the current rtl8365mb_dsa_phy_write and
> rtl8365mb_mdio_phy_read to match what I found in ASUS WRT's codebase.
> From what I gathered, this mdc-mdio was tested/developed with an RT-AC88U.
> So I think you all are no stranger to the ASUSWRT Merlin Project @
> https://www.asuswrt-merlin.net/. Thankfully they have a link to their
> github and provide the realtek source code which includes the phy
> access routines in:
> 
> https://github.com/RMerl/asuswrt-merlin.ng/blob/master/release/src-rt-6.x.4708/linux/linux-2.6.36/drivers/char/rtl8365mb/rtl8367c_asicdrv_phy.c
> 
> as rtl8367c_setAsicPHYReg and rtl8367c_getAsicPHYReg. (look in line 21 for
> the #if defined(MDC_MDIO_OPERATION) until the #else 188). These two are
> what I implemented in rtl8365mb_mdio_phy_write and rtl8365mb_mdio_phy_read.

Hmm, so the sources I think you should follow are here (also GPL'ed):

https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/mediatek/files/drivers/net/phy/rtk/rtl8367c/rtl8367c_asicdrv_phy.c;h=fb4db113a9870a3aaa785b8765fa5a341a407c3b;hb=HEAD

... and I guess you want to just implement an alternative
rtl8365mb_phy_ocp_{read,write} based on rtl8367c_{get,set}AsicPHYOCPReg when
MDC_MDIO_OPERATION is defined. As I see it, there are a few differences. Could
you try with this approach and let me know if it works? This will be more
consistent as the driver will remain based on the same set of sources.

> 
> > > 
> > > With this implemented, the RTL8363SB PHYs came to life:
> > > 
> > > | [...]: found an RTL8363SB-CG switch
> > > | [...]: missing child interrupt-controller node
> > > | [...]: no interrupt support
> > > | [...]: configuring for fixed/rgmii link mode
> > > | [...] lan1 (uninit...): PHY [dsa-0.0:01] driver [Generic PHY] (irq=POLL)
> > > | [...] lan2 (uninit...): PHY [dsa-0.0:02] driver [Generic PHY] (irq=POLL)
> > > | device eth0 entered promiscuous mode
> > > | DSA: tree 0 setup
> > > | realtek-mdio 4ef600c00.ethernet:00: Link is Up - 1Gbps/Full - [...]
> > > 
> > > | # phytool lan1/2
> > > | ieee-phy: id:0x001cc980 <--- this is correct!!
> > > |
> > > |  ieee-phy: reg:BMCR(0x00) val:0x1140
> > > |     flags:          -reset -loopback +aneg-enable -power-down
> > > |		      -isolate -aneg-restart -collision-test
> > > |     speed:          1000-full
> > > |
> > > |  ieee-phy: reg:BMSR(0x01) val:0x7969
> > > |     capabilities:   -100-b4 +100-f +100-h +10-f +10-h -100-t2-f
> > > |		      -100-t2-h
> > > |      flags:         +ext-status +aneg-complete -remote-fault
> > > |		      +aneg-capable -link -jabber +ext-register
> > > 
> > > the port statistics are working too and the exported LED triggers.
> > > But so far I can't get any traffic to pass.
> > 
> > This info is also not entirely relevant in a commit message, but thanks for
> > clarifying.
> 
> True :) and it was a pain to format. Still I'm hoping to get confirmation
> about 0x001cc980-ish PHYID. This is something all switches should produce and
> this should be reproducible by others (with slightly different IDs).
> (The source for phytool: https://github.com/wkz/phytool)

0x001cc942 is what my (SMI-connected) RTL8365MB-VC-CG reports:

| # phytool swp0/0
| ieee-phy: id:0x001cc942
| 
|    ieee-phy: reg:BMCR(0x00) val:0x1140
|       flags:          -reset -loopback +aneg-enable -power-down -isolate -aneg-restart -collision-test
|       speed:          1000-full
| 
|    ieee-phy: reg:BMSR(0x01) val:0x79c9
|       capabilities:   -100-b4 +100-f +100-h +10-f +10-h -100-t2-f -100-t2-h
|       flags:          +ext-status -aneg-complete -remote-fault +aneg-capable -link -jabber +ext-register

Since your switch reports something else, you will want to update
drivers/net/phy/realtek.c if you want to have the chip deliver
interrupts. Otherwise you end up with the Generic PHY driver polling for PHY
status.
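
To illustrate why the differing ID matters, here is a minimal sketch. It
assumes the existing PHY driver entry matches the full 32-bit ID exactly; the
helper names are hypothetical, and only the two ID values come from this
thread. The ID itself is the usual concatenation of PHY registers 2 and 3.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build the 32-bit PHY ID from registers 2 (PHYSID1) and 3 (PHYSID2). */
static inline uint32_t phy_id_from_regs(uint16_t physid1, uint16_t physid2)
{
	return ((uint32_t)physid1 << 16) | physid2;
}

/* Hypothetical matcher: compare only the bits selected by mask. */
static inline bool phy_id_matches(uint32_t id, uint32_t want, uint32_t mask)
{
	return (id & mask) == (want & mask);
}
```

With an exact match (mask 0xffffffff), 0x001cc980 does not match an entry for
0x001cc942, so the PHY would fall back to the Generic PHY driver.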

> 
> > > Signed-off-by: Christian Lamparter <chunkeey@...il.com>
> > > ---
> > > Any good hints or comments? Is the RTL8363SB an odd one here and
> > > everybody else can just use SMI?
> > 
> > Luiz implemented MDIO support presumably because he could not use SMI. I think
> > at the very least, he or somebody else should test your patch on all existing
> > MDIO-wired chips supported by the driver, since this could cause regressions. If
> > that is not possible but you would still like to support your new switch, maybe
> > we need both implementations available.
> 
> Yes, I'm fully expecting these comments.

Cool, let's wait for him to reply/test as well.

> 
> > 
> > Regarding your patch, I do not fully understand it. I think you could perhaps
> > explain it a little more clearly. Basically I see that you are provisioning the
> > GPHY_OCP_MSB_0_REG a little bit differently, and then executing a register read
> > at some particular offset. Contrast this with the current indirect access
> > command register followed by polling. I think yours is a better approach, since
> > it is more direct, but only if it works and is well documented.
> 
> Sadly, the only useful comment in the vendor driver is "Default OCP Address"
> with the associated 0x29 "Magic value". This 0x29 becomes 0xA400 if you put it through
> the FIELD_PREP... And funnily enough, that exactly matches the existing
> "RTL8365MB_PHY_OCP_ADDR_PHYREG_BASE" which is used by the current accessors (but they
> do much much more).

Please compare with the link I sent above and tell me what you think.
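
For reference, the 0x29 / 0xA400 relationship quoted above can be checked with
a small sketch. It assumes the OCP address prefix is the top six bits
(bits 15:10) of a 16-bit OCP address; the macro and function names are made up
for illustration and are not the driver's.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: the prefix occupies bits 15:10. */
#define OCP_ADDR_PREFIX_MASK  0xFC00u
#define OCP_ADDR_PREFIX_SHIFT 10

/* Turn a 6-bit prefix such as 0x29 into the OCP base address. */
static inline uint16_t ocp_prefix_to_base(uint16_t prefix)
{
	assert(prefix <= 0x3F); /* must fit in six bits */
	return (uint16_t)(prefix << OCP_ADDR_PREFIX_SHIFT);
}

/* Recover the prefix from a full OCP address. */
static inline uint16_t ocp_addr_to_prefix(uint16_t ocp_addr)
{
	return (uint16_t)((ocp_addr & OCP_ADDR_PREFIX_MASK) >>
			  OCP_ADDR_PREFIX_SHIFT);
}
```

Under that assumption, 0x29 shifted left by 10 gives 0xA400, and extracting
the prefix from 0xA400 gives back 0x29.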

> 
> It looks like the person that wrote the rtl8365mb_phy_ocp_read and
> rtl8365mb_phy_ocp_write() could answer in detail what's behind this.
> I'm sure this would be on a datasheet, but sadly I don't have one for the switch.

That would be me, but I just followed what was done in the vendor sources... The
registers are a bit of a mess and I don't know what OCP stands for, so I am not
much better informed than anybody else. I would just like to keep things more or
less consistent, i.e. make the code look something like:

  phy_read() {
    /* prep the OCP_MSB_0_REG */
  
    if (mdio)
      /* just read from the suitable register */
    else
      /* do the INDIRECT_ACCESS_*_REG write/poll/read dance */
  
    return val;
  }

... somewhat matching the two different access methods in the vendor sources I
linked.
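
A slightly more concrete version of that shape, as a self-contained sketch
only: struct fake_switch and every helper here are hypothetical stand-ins that
model the hardware in memory, not the driver's real types or register
accessors. The 0xA400 base is the value mentioned earlier in the thread, and
the bit layout of the MSB register is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the switch; not the driver's private data. */
struct fake_switch {
	bool mdio;              /* true: MDC/MDIO-wired, false: SMI-wired */
	uint16_t ocp_msb;       /* models the value put in OCP_MSB_0_REG */
	uint16_t phy_regs[32];  /* models the PHY's registers */
	bool busy;              /* models the indirect-access busy flag */
	unsigned int polls;     /* counts polls done by the indirect path */
};

/* Common step: program the upper OCP address bits (layout assumed). */
static void prep_ocp_msb(struct fake_switch *sw, uint16_t ocp_addr)
{
	sw->ocp_msb = ocp_addr >> 10;
}

/* MDIO case: read straight from the suitable register. */
static uint16_t read_direct(struct fake_switch *sw, unsigned int regnum)
{
	return sw->phy_regs[regnum];
}

/* SMI case: issue the command, poll until idle, then fetch the data. */
static int read_indirect(struct fake_switch *sw, unsigned int regnum,
			 uint16_t *val)
{
	int tries = 100;

	sw->busy = true;                /* "write" the read command */
	while (sw->busy && tries--) {
		sw->polls++;
		sw->busy = false;       /* the fake hardware finishes at once */
	}
	if (sw->busy)
		return -1;              /* timed out */

	*val = sw->phy_regs[regnum];
	return 0;
}

int phy_read(struct fake_switch *sw, unsigned int regnum, uint16_t *val)
{
	if (regnum >= 32)
		return -1;

	prep_ocp_msb(sw, 0xA400);       /* base mentioned in the thread */

	if (sw->mdio) {
		*val = read_direct(sw, regnum);
		return 0;
	}
	return read_indirect(sw, regnum, val);
}
```

The point is only that the OCP MSB setup is shared and the bus type selects
the tail end of the access.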

> 
> > Perhaps you can give a pointer to which logic in the vendor driver you followed
> > in order to achieve this more direct register access without polling. This will
> > help me review it :)
> > 
> > But since you still haven't got data through your switch, I am a bit reluctant
> > to approve this kind of change. I would prefer to see a full series adding the
> > support, so that this kind of change/quirk is justified. Otherwise it is just
> > introducing potential regressions with no real benefit. I hope you understand.
> 
> Oh, this is no longer the case. I have it sort of working now. The "no traffic"
> issue was "fixed" by the second patch
> "net: dsa: realtek: rtl8365mb: add missing case for digital interface 0".
> 
> Though, it's not 100% finished. "Normal data" works, but it seems the switch
> doesn't like that I have multiple other VLANs on the same network (including
> non-VLAN traffic). I see a constant stream of "non-realtek ethertype ..."
> messages from tag_rtl8_4.c. These include 802.1Q (0x8100), IPv4 (0x0800)
> and IPv6 (0x86DD) EtherTypes. Though, I'm optimistic that this can be solved.

Could be your MAC doing something weird, but I'm not sure. You could try using
the trailing tag instead of the leading tag. Luiz or Arınç may also be able to
help you.

Kind regards,
Alvin
