lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sun, 23 Oct 2022 17:05:30 +0200
From:   Frank Wunderlich <frank-w@...lic-files.de>
To:     "Russell King (Oracle)" <linux@...linux.org.uk>
Cc:     Frank Wunderlich <linux@...web.de>,
        linux-mediatek@...ts.infradead.org,
        Alexander Couzens <lynxis@...0.eu>,
        Felix Fietkau <nbd@....name>, John Crispin <john@...ozen.org>,
        Sean Wang <sean.wang@...iatek.com>,
        Mark Lee <Mark-MC.Lee@...iatek.com>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Matthias Brugger <matthias.bgg@...il.com>,
        netdev@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        linux-kernel@...r.kernel.org
Subject: Aw: Re: Re: Re: Re: Re: Re: [PATCH v2] net: mtk_sgmii: implement
 mtk_pcs_ops

> Gesendet: Sonntag, 23. Oktober 2022 um 11:43 Uhr
> Von: "Russell King (Oracle)" <linux@...linux.org.uk>

> On Sun, Oct 23, 2022 at 09:26:39AM +0200, Frank Wunderlich wrote:
> > > Gesendet: Samstag, 22. Oktober 2022 um 21:18 Uhr
> > > Von: "Russell King (Oracle)" <linux@...linux.org.uk>
> > > Hi,
> > >
> > > On Sat, Oct 22, 2022 at 07:53:16PM +0200, Frank Wunderlich wrote:
> > > > > Gesendet: Samstag, 22. Oktober 2022 um 19:05 Uhr
> > > > > Von: "Russell King (Oracle)" <linux@...linux.org.uk>
> > > > > On Sat, Oct 22, 2022 at 12:52:00PM +0200, Frank Wunderlich wrote:
> > > > > > > Gesendet: Samstag, 22. Oktober 2022 um 11:11 Uhr
> > > > > > > Von: "Russell King (Oracle)" <linux@...linux.org.uk>
> > > >
> > > > > > this patch breaks connectivity at least on the sfp-port (eth1).
> > > >
> > > > > > pcs_get_state
> > > > > > [   65.522936] offset:0 0x2c1140
> > > > > > [   65.522950] offset:4 0x4d544950
> > > > > > [   65.525914] offset:8 0x40e041a0
> > > > > > [  177.346183] offset:0 0x2c1140
> > > > > > [  177.346202] offset:4 0x4d544950
> > > > > > [  177.349168] offset:8 0x40e041a0
> > > > > > [  177.352477] offset:0 0x2c1140
> > > > > > [  177.356952] offset:4 0x4d544950
> > > > >
> > > > > Hi,
> > > > >
> > > > > Thanks. Well, the results suggest that the register at offset 8 is
> > > > > indeed the advertisement and link-partner advertisement register. So
> > > > > we have a bit of progress and a little more understanding of this
> > > > > hardware.
> > > > >
> > > > > Do you know if your link partner also thinks the link is up?
> > > >
> > > > yes link is up on my switch, cannot enable autoneg for fibre-port, so port is fixed to 1000M/full flowcontrol enabled.
> > > >
> > > > > What I notice is:
> > > > >
> > > > > mtk_soc_eth 15100000.ethernet eth1: Link is Up - 1Gbps/Unknown - flow control off
> > > > >
> > > > > The duplex is "unknown" which means you're not filling in the
> > > > > state->duplex field in your pcs_get_state() function. Given the
> > > > > link parter adverisement is 0x00e0, this means the link partner
> > > > > supports PAUSE, 1000base-X/Half and 1000base-X/Full. The resolution
> > > > > is therefore full duplex, so can we hack that in to your
> > > > > pcs_get_state() so we're getting that right for this testing please?
> > > >
> > > > 0xe0 is bits 5-7 are set (in lower byte from upper word)..which one is for duplex?
> > > >
> > > > so i should set state->duplex/pause based on this value (maybe compare with own caps)?
> > > >
> > > > found a documentation where 5=full,6=half, and bits 7+8 are for pause (symetric/asymetric)
> > > >
> > > > regmap_read(mpcs->regmap, SGMSYS_PCS_CONTROL_1+8, &val);
> > > > partner_advertising = (val & 0x00ff0000) >> 16;
> > >
> > > Not quite :) When we have the link partner's advertisement and the BMSR,
> > > we have a helper function in phylink to do all the gritty work:
> > >
> > > 	regmap_read(mpcs->regmap, SGMSYS_PCS_CONTROL_1, &bm);
> > > 	regmap_read(mpcs->regmap, SGMSYS_PCS_CONTROL_1 + 8, &adv);
> > >
> > > 	phylink_mii_c22_pcs_decode_state(state, bm >> 16, adv >> 16);
> > >
> > > will do all the work for you without having to care about whether
> > > you're operating at 2500base-X, 1000base-X or SGMII mode.
> > >
> > > > > Now, I'm wondering what SGMII_IF_MODE_BIT0 and SGMII_IF_MODE_BIT5 do
> > > > > in the SGMSYS_SGMII_MODE register. Does one of these bits set the
> > > > > format for the 16-bit control word that's used to convey the
> > > > > advertisements. I think the next step would be to play around with
> > > > > these and see what effect setting or clearing these bits has -
> > > > > please can you give that a go?
> > > >
> > > > these is not clear to me...should i blindly set these and how to
> > > > verify what they do?
> > >
> > > Yes please - I don't think anyone knows what they do.
> >
> > i guess BIT0 is the SGMII_EN flag like in other sgmii implementations.
> > Bit5 is "reserved" in all docs i've found....maybe it is related to HSGMII
> > or for 1G vs. 2G5.
>
> "other sgmii implementations" ?

yes i googled for sgmii and found register definition for different vendor...
i don't know if sgmii is a standard for each vendor, afair trgmii was not.

> If this is the SGMII_EN flag, maybe SGMII_IF_MODE_BIT0 should be
> renamed to SGMII_IF_SGMII_EN ? Maybe it needs to be set for SGMII
> and clear for base-X ?
>
> > but how to check what has changed...i guess only the register itself changed
> > and i have to readout another to check whats changed.
> >
> > do we really need these 2 bits? reading/setting duplex/pause from/to the register
> > makes sense, but digging into undocumented bits is much work and we still only guess.
>
> I don't know - I've no idea about this hardware, or what the PCS is,
> and other people over the years I've talked to have said "we're not
> using it, we can't help". The mediatek driver has been somewhat of a
> pain for phylink as a result.
>
> > so i would first want to get sgmii working again and then strip the pause/duplex from it.
>
> I think I'd need more information on your setup - is this dev 0? Are
> you using in-band mode or fixed-link mode?

i only test with dev1 which is the sfp-port/eth1/gmac1...dev0 is the fixed-link to switch-chip.

> I don't think you've updated me with register values for this since
> the patch. With the link timer adjusted back to 1.6ms, that should
> result in it working again, but if not, I think there's some
> possibilities.

sorry for that, have debugged timing and it was wrong because if-condition had not included 1000baseX and 2500baseX. only sgmii

> The addition of SGMII_AN_ENABLE for SGMSYS_PCS_CONTROL_1 could have
> broken your setup if there is no actual in-band signalling, which
> basically means that your firmware description is wrong - and you've
> possibly been led astray by the poor driver implementation.

disabled it, but makes no change.

but i've noticed that timing is wrong

old value: 0x186a0
new value: 0x98968

so it takes the 10000000 and not the 1600000. so it looks like interface-mode is not (yet) SGMII.

debugged it and got 1000baseX (21),in dts i have
phy-mode = "2500base-x";
but SFP only supports 1G so mode 1000baseX is right

set the old value with your calculation, but still not working, also with disabled AN_ENABLE-flag ;(

root@...-r3:~# ip link set eth1 up
[   44.287442] mtk_soc_eth 15100000.ethernet eth1: configuring for inband/1000be
[   44.295902] interface-mode 21 (sgmii:4)
root@...-r3:~# [   44.295907] timer 0x186a0
[   44.352872] offset:0 0x2c1140
[   44.355507] offset:4 0x4d544950
[   44.358462] offset:8 0x40e041a0
[   44.361609] mtk_soc_eth 15100000.ethernet eth1: Link is Up - 1Gbps/Full - flf
[   44.373042] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready

root@...-r3:~# ip a a 192.168.0.19/24 dev eth1
root@...-r3:~# ping 192.168.0.10
PING 192.168.0.10 (192.168.0.10) 56(84) bytes of data.
^C

> Can you confirm that the device on the other end for dev 0 does in
> actual fact use in-band signalling please?
>
> > > If it's interpreting a link partner advertisement of 0x00e0 using
> > > SGMII rules, then it will be looking at bits 11 and 10 for the
> > > speed, both of which are zero, which means 10Mbps - and 1000base-X
> > > doesn't operate at 10Mbps!
> >
> > so maybe this breaks sgmii? have you changed this behaviour with your Patch?
>
> Nope, but not setting the duplex properly is yet another buggy and poor
> quality of implementation that afficts this driver. I've said about
> setting the duplex value when reviewing your patch to add .pcs_get_state
> and I'm disappointed that you seemingly haven't yet corrected it in the
> code you're testing despite those review comments.

sorry, i thought we want to read out the values from registers to set it based on them.

currently i test only with the dev 1 (in-band-managed with 1GBit/s SFP)

[    1.088310] dev: 0 offset:0 0x40140
[    1.088331] dev: 0 offset:4 0x4d544950
[    1.091827] dev: 0 offset:8 0x1
[    1.095607] dev: 1 offset:0 0x81140
[    1.098739] dev: 1 offset:4 0x4d544950
[    1.102214] dev: 1 offset:8 0x1

after bring device up (disabled AN and set duplex to full):

[   34.615926] timer 0x98968
[   34.672888] offset:0 0x2c1140
[   34.675518] offset:4 0x4d544950
[   34.678473] offset:8 0x40e041a0

codebase:

https://github.com/frank-w/BPI-R2-4.14/commits/6.1-r3-sgmii

> If duplex remains as "unknown", then the MAC will be programmed to
> operate in _half_ _duplex_ mode (read mtk_mac_link_up()) which is not
> what you likely want. Many MACs don't support half duplex at 1G speed,
> so it's likely that without setting state->duplex, the result is that
> the MAC hardware is programmed incorrectly.

wonder why it was working with only my patch which had duplex also not set.

> > > So my hunch is that one of those two IF_MODE_BIT{0,5} _might_ change
> > > the way the PCS interprets the control word, but as we don't have
> > > any documentation to go on, only experimentation will answer this
> > > question.

the bits were in offset 0/4/8? are they now different than before?
if yes maybe these break it.

as offset 4 is the phy-id and 8 is the advertisement from local and far
interface i guesss IF_MODE_* is in offset 0.

> > > If these registers are MMIO, you could ensure that you have /dev/mem
> > > access enabled, and use devmem2 to poke at this register which would
> > > probably be quicker than doing a build-boot-test cycle with the
> > > kernel - this is how I do a lot of this kind of discovery when
> > > documentation is lacking.
> > >
> > > > But the timer-change can also break sgmii...

currently this is no more the case as i set the timer to old value
so something else is breaking it.

> > > SGMII mode should be writing the same value to the link timer, but
> > > looking at it now, I see I ended up with one too many zeros on the
> > > 16000000! It should be 1.6ms in nanoseconds, so 1600000. Please
> > > correct for future testing.
> >
> > tried removing 1  zero from the 16000000, but same result.
> > tried also setting duplex with ethtool, but  after read it is still unknown.
>
> Honestly this doesn't surprise me given the poor state of the mtk_sgmii
> code. There's lots that this implementation gets wrong, but I can't fix
> it without either people willing to research and test stuff, or without
> the actual hardware in front of me.

i can test, but i do not fully understand the code nor have the exoerience
to work with devmem2. have used it long time ago but with assistance to read
which register and with expected values.

> This mtk_eth_soc driver has been a right pain for me ever since the
> half-hearted switch-over to use phylink.

i know it is huge as it covers many different SoC. but i'm not able to rewrite it :p

regards Frank

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ