[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <871qzwjmtv.fsf@bang-olufsen.dk>
Date: Mon, 21 Feb 2022 14:50:24 +0000
From: Alvin Šipraga <ALSI@...g-olufsen.dk>
To: Vladimir Oltean <olteanv@...il.com>
CC: Alvin Šipraga <alvin@...s.dk>,
Linus Walleij <linus.walleij@...aro.org>,
Andrew Lunn <andrew@...n.ch>,
Vivien Didelot <vivien.didelot@...il.com>,
Florian Fainelli <f.fainelli@...il.com>,
"David S. Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Luiz Angelo Daros de Luca <luizluca@...il.com>,
Arınç ÜNAL <arinc.unal@...nc9.com>,
Michael Rasmussen <MIR@...g-olufsen.dk>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH net-next 2/2] net: dsa: realtek: rtl8365mb: serialize
indirect PHY register access
Hi again Vladimir,
Alvin Šipraga <ALSI@...g-olufsen.dk> writes:
> Vladimir Oltean <olteanv@...il.com> writes:
>
>> If the problem you've identified is correct, then this simple test
>> module would yield the exact same result, yet would eliminate beyond any
>> doubt the timing and other circumstantial factors, and you could also
>> do better testing of the PHY write sequence, and MIB counter reads.
>> And if simply inserting a stray register access in the middle of the PHY
>> read procedure doesn't produce the same result, this would be new
>> information. It shouldn't even be too hard to do.
>>
> <snip>
>>
>> I have little to no problem with the workaround you've implemented, it's
>> just that extraordinary claims require extraordinary proof. Having a
>> standalone kernel module that can deterministically and not statistically
>> reproduce the bug would go a long way.
>
> Thanks Vladimir, I very much appreciate your scrutiny here. I'll make
> the test module to verify the claims I have made. In the mean time I
> asked Realtek if they have any comment.
So I made a test module which, in summary, checks the following:
1. for PHY reads, at what point does inserting a stray register access
(either read or write) cause the PHY read to fail?
2. for PHY writes, can stray register access also cause failure?
2. for MIB reads, can stray register access also cause failure?
For (1) I instrumented the PHY indirect access functions in the 6
possible places where spurious register access could occur. Of those 6
locations for spurious register access, 4 have no effect: you can put a
read or write to an unrelated register there and the PHY read will
always succeed. I tested this with spurious access to nearly every
available register on the switch.
However, for two locations of spurious register access, the PHY read
_always_ fails. The locations are marked /* XXX */ below:
/* Simplified for brevity */
static int rtl8365mb_phy_ocp_read(struct realtek_priv *priv, int phy,
u32 ocp_addr, u16 *data)
{
rtl8365mb_phy_poll_busy(priv);
rtl8365mb_phy_ocp_prepare(priv, phy, ocp_addr);
/* Execute read operation */
regmap_write(priv->map, RTL8365MB_INDIRECT_ACCESS_CTRL_REG, val);
/* XXX */
rtl8365mb_phy_poll_busy(priv);
/* XXX */
/* Get PHY register data */
regmap_read(priv->map, RTL8365MB_INDIRECT_ACCESS_READ_DATA_REG,
&val);
*data = val & 0xFFFF;
return 0;
}
In the case of a spurious read, the result of that read then poisons the
ongoing PHY read, as suggested before. Again I verified that this is
always the case, for each available register on the switch. Spurious
writes also cause failure, and in the same locations too. I did not
investigate whether the value written is then read back as part of the
PHY read.
For (2) I did something similar to (1), but the difference here is that
I could never get PHY writes to fail. Admittedly not all bits of the PHY
registers tend to be writable, but for those bits that were writable, I
would always then read back what I had written.
For (3) I did something similar to (1), and as claimed previously, this
never resulted in a read failure. Here I had to use the MIB counters of
a disconnected port so that I could assume the values were always 0.
I have attached the test module (and header file generated from an
enormous header file from the Realtek driver sources, so that I could
iterate over every possible register). It is pretty gruesome reading but
gives me confidence in my earlier claims. The only refinements to those
claims are:
- where _exactly_ a spurious register access will cause failure: see the
/* XXX */ in the code snippet upstairs;
- PHY writes seem not to be affected at all.
Finally, I reached out to Realtek, and they confirmed pretty much the
same as above. However, they claim it is not a hardware bug, but merely
a property of the hardware design. Here I paraphrase what was said:
1. Yes, spurious register access during PHY indirect access will cause
the indirect access to fail. This is a result of the hardware design. In
general, _if a read fails, the value read back will be the result of the
last successful read_. This confirms the "register poisoning" described
earlier.
2. MIB access is a different story - this is table lookup, not indirect
access. Table lookup is not affected by spurious register access.
3. Other possible accesses - not currently present in this driver, but
for which I have some WIP changes - include ACL (Access Control List),
L2 (FDB), and MC (MDB) access. But all of these are table access similar
to MIB access, and hence not troubled by spurious register access.
4. HOWEVER, only one table can be accessed at a time. So a lock is
needed here. Currently the only table lookup is MIB access, which is
protected by mib_lock, so we are OK for now.
5. It should be sufficient to lock during indirect PHY register access
as prescribed in my patch.
I hope that clears things up. I will be sending a v2 with a revised
description, including the statements from Realtek and the results of
the tests I ran.
Kind regards,
Alvin
View attachment "regs.h" of type "text/plain" (284915 bytes)
View attachment "rtltest.c" of type "text/plain" (49654 bytes)
Powered by blists - more mailing lists