[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJq09z6XBQUTBZoQ81Vy3nUc_5QGTF0GH8V-S3bXOw=JpYODvA@mail.gmail.com>
Date: Thu, 17 Feb 2022 00:01:45 -0300
From: Luiz Angelo Daros de Luca <luizluca@...il.com>
To: Vladimir Oltean <olteanv@...il.com>
Cc: Alvin Šipraga <alvin@...s.dk>,
Linus Walleij <linus.walleij@...aro.org>,
Andrew Lunn <andrew@...n.ch>,
Vivien Didelot <vivien.didelot@...il.com>,
Florian Fainelli <f.fainelli@...il.com>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Alvin Šipraga <alsi@...g-olufsen.dk>,
Arınç ÜNAL <arinc.unal@...nc9.com>,
Michael Rasmussen <mir@...g-olufsen.dk>,
"open list:NETWORKING DRIVERS" <netdev@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net-next 2/2] net: dsa: realtek: rtl8365mb: serialize
indirect PHY register access
Hi Vladimir,
> This implementation where the indirect PHY access blocks out every other
> register read and write is only justified if you can prove that you can
> stuff just about any unrelated register read or write before
> RTL8365MB_INDIRECT_ACCESS_READ_DATA_REG, and this, in and of itself,
> will poison what gets read back from RTL8365MB_INDIRECT_ACCESS_READ_DATA_REG.
I was the first one trying to fix this issue reported by Arinç with
SMP devices. At first I thought it was caused by two parallel indirect
access reads polling the interface (it was not using interrupts). With
no lock, they will eventually collide and one reads the result of the
other one. However, a simple lock over the indirect access didn't
solve the issue. Alvin tested it much further to isolate that indirect
register access is messed up by any other register read. The fails
while polling the interface status or the other test Alvin created
only manifests in a device with multiple cores and mine is single
core. I do get something similar in a single core device by reading an
unused register address but it is hard to blame Realtek when we are
doing something we were not supposed to do. Anyway, that indicates
that "reading a register" is not an atomic operation inside the switch
asic.
> rtl8365mb_mib_counter_read() doesn't seem like a particularly good
> example to prove this, since it appears to be an indirect access
> procedure as well. Single register reads or writes would be ideal, like
> RTL8365MB_CPU_CTRL_REG, artificially inserted into strategic places.
> Ideally you wouldn't even have a DSA or MDIO or PHY driver running.
> Just a simple kernel module with access to the regmap, and try to read
> something known, like the PHY ID of one of the internal PHYs, via an
> open-coded function. Then add extra regmap accesses and see what
> corrupts the indirect PHY access procedure.
The MIB might be just another example where the issue happens. It was
first noticed with a SMP device without interruptions configured. I
believe it will always fail with that configuration.
> Are Realtek aware of this and do they confirm the issue? Sounds like
> erratum material to me, and a pretty severe one, at that. Alternatively,
> we may simply not be understanding the hardware architecture, like for
> example the fact that MIB indirect access and PHY indirect access may
> share some common bus and must be sequential w.r.t. each other.
The realtek "API/driver" does exactly how the driver was doing. They
do have a lock/unlock placeholder, but only in the equivalent
regmap_{read,write} functions. Indirect access does not use locks at
all (in fact, there is no other mention of "lock" elsewhere), even
being obvious that it is not thread-safe. It was just with a DSA
driver that we started to exercise register access for real, specially
without interruptions. And even in that case, we could only notice
this issue in multicore devices. I believe that, if they know about
this issue, they might not be worried because it has never affected a
real device. It would be very interesting to hear from Realtek but I
do not have the contacts.
Regards,
Luiz
Powered by blists - more mailing lists