[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87v8xdrjpw.fsf@bang-olufsen.dk>
Date: Thu, 17 Feb 2022 08:16:44 +0000
From: Alvin Šipraga <ALSI@...g-olufsen.dk>
To: Luiz Angelo Daros de Luca <luizluca@...il.com>
CC: Vladimir Oltean <olteanv@...il.com>,
Alvin Šipraga <alvin@...s.dk>,
Linus Walleij <linus.walleij@...aro.org>,
Andrew Lunn <andrew@...n.ch>,
Vivien Didelot <vivien.didelot@...il.com>,
Florian Fainelli <f.fainelli@...il.com>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Arınç ÜNAL <arinc.unal@...nc9.com>,
Michael Rasmussen <MIR@...g-olufsen.dk>,
"open list:NETWORKING DRIVERS" <netdev@...r.kernel.org>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net-next 2/2] net: dsa: realtek: rtl8365mb: serialize
indirect PHY register access
Luiz Angelo Daros de Luca <luizluca@...il.com> writes:
> Hi Vladimir,
>
>> This implementation where the indirect PHY access blocks out every other
>> register read and write is only justified if you can prove that you can
>> stuff just about any unrelated register read or write before
>> RTL8365MB_INDIRECT_ACCESS_READ_DATA_REG, and this, in and of itself,
>> will poison what gets read back from RTL8365MB_INDIRECT_ACCESS_READ_DATA_REG.
>
> I was the first one trying to fix this issue reported by Arinç with
> SMP devices. At first I thought it was caused by two parallel indirect
> access reads polling the interface (it was not using interrupts). With
> no lock, they will eventually collide and one reads the result of the
> other one. However, a simple lock over the indirect access didn't
> solve the issue. Alvin tested it much further to isolate that indirect
> register access is messed up by any other register read. The fails
> while polling the interface status or the other test Alvin created
> only manifests in a device with multiple cores and mine is single
> core. I do get something similar in a single core device by reading an
> unused register address but it is hard to blame Realtek when we are
> doing something we were not supposed to do. Anyway, that indicates
> that "reading a register" is not an atomic operation inside the switch
> asic.
I never observed any issue which suggests that switch register reads are
not atomic... I mean, they are (and always have been) protected by the
default regmap lock. So what makes you say this?
I have only seen issues related to PHY register access, please enlighten
us if there are other issues.
>
>> rtl8365mb_mib_counter_read() doesn't seem like a particularly good
>> example to prove this, since it appears to be an indirect access
>> procedure as well. Single register reads or writes would be ideal, like
>> RTL8365MB_CPU_CTRL_REG, artificially inserted into strategic places.
>> Ideally you wouldn't even have a DSA or MDIO or PHY driver running.
>> Just a simple kernel module with access to the regmap, and try to read
>> something known, like the PHY ID of one of the internal PHYs, via an
>> open-coded function. Then add extra regmap accesses and see what
>> corrupts the indirect PHY access procedure.
>
> The MIB might be just another example where the issue happens. It was
> first noticed with a SMP device without interruptions configured. I
> believe it will always fail with that configuration.
As I stated in the last thread, I tested MIB access and the problem did
not manifest itself there.
>
>> Are Realtek aware of this and do they confirm the issue? Sounds like
>> erratum material to me, and a pretty severe one, at that. Alternatively,
>> we may simply not be understanding the hardware architecture, like for
>> example the fact that MIB indirect access and PHY indirect access may
>> share some common bus and must be sequential w.r.t. each other.
>
> The realtek "API/driver" does exactly how the driver was doing. They
> do have a lock/unlock placeholder, but only in the equivalent
> regmap_{read,write} functions. Indirect access does not use locks at
> all (in fact, there is no other mention of "lock" elsewhere), even
> being obvious that it is not thread-safe. It was just with a DSA
> driver that we started to exercise register access for real, specially
> without interruptions. And even in that case, we could only notice
> this issue in multicore devices. I believe that, if they know about
> this issue, they might not be worried because it has never affected a
> real device. It would be very interesting to hear from Realtek but I
> do not have the contacts.
This is not true, at least with the sources I am reading. As I said in
my reply to Vladimir, the Realtek code takes a lock around each
top-level API call. Example:
rtk_api_ret_t rtk_port_phyStatus_get(...)
{
rtk_api_ret_t retVal;
if (NULL == RT_MAPPER->port_phyStatus_get)
return RT_ERR_DRIVER_NOT_FOUND;
RTK_API_LOCK();
retVal = RT_MAPPER->port_phyStatus_get(port, pLinkStatus, pSpeed, pDuplex);
RTK_API_UNLOCK();
return retVal;
}
Deep down in this port_phyStatus_get() callback, the indirect PHY
register access takes place.
Kind regards,
Alvin
Powered by blists - more mailing lists