netdev - Re: net: dsa: realtek: silent indirect reg read failures on SMP

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJq09z71Fi8rLkQUPR=Ov6e_99jDujjKBfvBSZW0M+gTWK-ToA@mail.gmail.com>
Date:   Wed, 9 Feb 2022 20:28:08 -0300
From:   Luiz Angelo Daros de Luca <luizluca@...il.com>
To:     Alvin Šipraga <ALSI@...g-olufsen.dk>
Cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Linus Walleij <linus.walleij@...aro.org>,
        Arınç ÜNAL <arinc.unal@...nc9.com>
Subject: Re: net: dsa: realtek: silent indirect reg read failures on SMP

'

> Hi Luiz,

Hi Alvin,

> Thanks very much for the bug report!

Thanks Alvin!

> > Arinç reported an issue with interfaces going down and up in a fixed
> > interval (a little less than 40s). It happened with both SMI or MDIO
> > interfaces for two different rtl8365mb supported models.
>
> Is this using the Generic PHY driver or the Realtek RTL8365MB-VC PHY
> driver? If it's the latter then I would be interested to know whether
> the interrupt counters are going up too. Maybe the switch really has
> something to say?

Both are using 'Realtek RTL8365MB-VC PHY' but without interruptions.
In the mdio-connected device, the interrupt pin is not connected yet
(I suggested them to use it).
The other one is unknown if the pin is connected to a gpio port. I
suggested Arunç to export all available GPIO ports and check their
values with the interruption configured as both active high and low.
That's how I found the interrupt gpio in my device.
So, all affected SMP devices are using polling.

> I think the Generic PHY driver is based on polling of PHY registers,
> which seems more likely to trigger the issue you describe.

I didn't even know that 'Generic PHY' would work until yesterday when
I noticed I was missing the patch that added the phy_id.
Anyway, as I said, both drivers will work with polling while only
'RTL8365MB-VC PHY' can handle interruptions. It is not a good
indicator
whether it is using interruption or not. And MDIO also didn't support
interruption without a new patch I just sent.

https://patchwork.kernel.org/project/netdevbpf/patch/20220209224538.9028-1-luizluca@gmail.com/

> In any case, I think I can reproduce your issue. Using an SMI setup with
> just one cable attached to port 2, I did the following:
>
> 1. Enable ftrace event logging for regmap and mdio
> 2. Poll BSMR PHY register for my connected port; it should always read
>    the same (0x79ed).
> 3. Wait for 2 to read out something else.
>
> So I used this command:
>
>     while true; do phytool read swp2/2/0x01; sleep 0.1; done

I'll test it and report.

>
> After a couple of seconds I already read out 0x0000. And after checking
> the ftrace log I can see that this is because of some interleaved
> register access:
>
>      kworker/3:4-70      [003] .......  1927.139849: regmap_reg_write: ethernet-switch reg=1004 val=bd
>          phytool-16816   [002] .......  1927.139979: regmap_reg_read: ethernet-switch reg=1f01 val=0
>      kworker/3:4-70      [003] .......  1927.140381: regmap_reg_read: ethernet-switch reg=1005 val=0
>          phytool-16816   [002] .......  1927.140468: regmap_reg_read: ethernet-switch reg=1d15 val=a69
>      kworker/3:4-70      [003] .......  1927.140864: regmap_reg_read: ethernet-switch reg=1003 val=0
>          phytool-16816   [002] .......  1927.140955: regmap_reg_write: ethernet-switch reg=1f02 val=2041
>      kworker/3:4-70      [003] .......  1927.141390: regmap_reg_read: ethernet-switch reg=1002 val=0
>          phytool-16816   [002] .......  1927.141479: regmap_reg_write: ethernet-switch reg=1f00 val=1
>      kworker/3:4-70      [003] .......  1927.142311: regmap_reg_write: ethernet-switch reg=1004 val=be
>          phytool-16816   [002] .......  1927.142410: regmap_reg_read: ethernet-switch reg=1f01 val=0
>      kworker/3:4-70      [003] .......  1927.142534: regmap_reg_read: ethernet-switch reg=1005 val=0
>          phytool-16816   [002] .......  1927.142618: regmap_reg_read: ethernet-switch reg=1f04 val=0
>          phytool-16816   [002] .......  1927.142641: mdio_access: SMI-0 read  phy:0x02 reg:0x01 val:0x0000
>      kworker/3:4-70      [003] .......  1927.143037: regmap_reg_read: ethernet-switch reg=1001 val=0
>      kworker/3:4-70      [003] .......  1927.143133: regmap_reg_read: ethernet-switch reg=1000 val=2d89
>      kworker/3:4-70      [003] .......  1927.143213: regmap_reg_write: ethernet-switch reg=1004 val=be
>      kworker/3:4-70      [003] .......  1927.143291: regmap_reg_read: ethernet-switch reg=1005 val=0
>      kworker/3:4-70      [003] .......  1927.143368: regmap_reg_read: ethernet-switch reg=1003 val=0
>      kworker/3:4-70      [003] .......  1927.143443: regmap_reg_read: ethernet-switch reg=1002 val=6
>
> The kworker here is polling MIB counters for stats, as evidenced by the
> register 0x1004 that we are writing to (RTL8365MB_MIB_ADDRESS_REG).

At first I thought it was just two threads reading different indirect
registers. And there are really more than one kernel threads reading
port status in a SMP device.
If you disable the extra cores, it was working as expected. So, I
added a mutex to serialize that sequence. But it didn't fix the issue.
It looks like something deep down is broken.

> Sometimes I can even read out completely junk values if I tighten my
> phytool loop:
>
>     # while true; do phytool read swp2/2/0x01; done | uniq
>     0x79ed
>     0000
>     0x79ed
>     0000
>     0x79ed
>     0000
>     0x79ed
>     0x0014 <-- ?!

Yeah, just like what happens if you read that unused register. You
generally get 0x0000 instead of the good data but sometimes random
data.
I also added multiple udelay over and inside the regmap_read with no
success (while reading the unused register). I might test again with
phytool.

> This is just a preliminary investigation - I only checked your mail
> today - but I will follow up on this next week and try and make a
> fix. In the mean time could you and Arinç perhaps do some similar
> tracing and confirm that in your case it is a similar issue? You needn't
> use phytool if the issue reproduces itself every 40 seconds - in that
> case just enable ftrace events and inspect the register access sequence
> when a PHY read gives an unexpected value.

I'll try to enable ftrace but when I added a couple of printk, the
issue was gone.
Let's see what my device can handle.

The 40s is not a global cycle when everything falls apart. I added a
simple test that printed an error when the expected value wasn't
right. And each time a different port was failing in a fixed but
unsynchronized rhythm. Even with disconnected ports.
It looks like the switch "internal job queue" overflows and we get the
null or random data. And that gives the switch a little time to
process the queue. After that, the queue fills up again and we have
the same behavior. Accessing the unused register might simply consume
a lot of "extra loops". It looks like an exaggerated variant of the
same issue. Or not.

> Let's please restrict the investigation to registers which we expect to
> give well-defined values though - unused or otherwise inactive registers
> may very well give us junk, but it is harder to argue that this is a
> bug. Anyway, this seems pretty easy to reproduce, so I don't think you
> need to go down that path :-)

I also believe that we need to stick to registers that should return
good values. But while the indirect registers are failing, you can
also dump
some other switch registers and it should also return some junk data.
I didn't test it but I believe that's what will happen.

> Overall we probably need some stricter locking but I want to spend some
> time on this before just sending a big-fat-lock solution to the
> list. I didn't try your patch yet due to other items in my inbox today.

If nothing serializes the access to the indirect registers, the lock
is needed. Your traces show that two reads will collide.
The regmap also has its own lock by default and I also lock the mdio
during a regmap_read/write. I don't believe an extra lock will do.
I guess we'll need to slow the regmap access, although it wasn't enough for me.

Maybe the switch is yelling an interruption we are ignoring. This is
another interesting thing to check.

---
Luiz