netdev - Re: thunderx sgmii interface hang

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <7cb389a0-c035-81b5-17d5-c814e873c670@caviumnetworks.com>
Date:   Fri, 22 Dec 2017 15:00:13 -0800
From:   David Daney <ddaney@...iumnetworks.com>
To:     Tim Harvey <tharvey@...eworks.com>, Andrew Lunn <andrew@...n.ch>
Cc:     Sunil Goutham <sgoutham@...ium.com>,
        netdev <netdev@...r.kernel.org>
Subject: Re: thunderx sgmii interface hang

On 12/22/2017 02:19 PM, Tim Harvey wrote:
> On Tue, Dec 19, 2017 at 12:52 PM, Andrew Lunn <andrew@...n.ch> wrote:
>> On Mon, Dec 18, 2017 at 01:53:47PM -0800, Tim Harvey wrote:
>>> On Wed, Dec 13, 2017 at 11:43 AM, Andrew Lunn <andrew@...n.ch> wrote:
>>>>> The nic appears to work fine (pings, TCP etc) up until a performance
>>>>> test is attempted.
>>>>> When an iperf bandwidth test is attempted the nic ends up in a state
>>>>> where truncated-ip packets are being sent out (per a tcpdump from
>>>>> another board):
>>>>
>>>> Hi Tim
>>>>
>>>> Are pause frames supported? Have you tried turning them off?
>>>>
>>>> Can you reproduce the issue with UDP? Or is it TCP only?
>>>>
>>>
>>> Andrew,
>>>
>>> Pause frames don't appear to be supported yet and the issue occurs
>>> when using UDP as well as TCP. I'm not clear what the best way to
>>> troubleshoot this is.
>>
>> Hi Tim
>>
>> Is pause being negotiated? In theory, it should not be. The PHY should
>> not offer it, if the MAC has not enabled it. But some PHY drivers are
>> probably broken and offer pause when they should not.
>>
>> Also, can you trigger the issue using UDP at say 75% the maximum
>> bandwidth. That should be low enough that the peer never even tries to
>> use pause.
>>
>> All this pause stuff is just a stab in the dark. Something else to try
>> is to turn off various forms off acceleration, ethtook -K, and see if
>> that makes a difference.
>>
> 
> Andrew,
> 
> Currently I'm not using the DP83867_PHY driver (after verifying the
> issue occurs with or without that driver).
> 
> It does not occur if I limit UDP (ie 950mbps). I disabled all offloads
> and the issue still occurs.
> 
> I have found that once the issue occurs I can recover to a working
> state by clearing/setting BGX_CMRX_CFG[BGX_EN] and once I encounter
> the issue and recover with that, I can never trigger the issue again.
> If toggle that register bit upon power-up before the issue occurs it
> will still occur.
> 
> The CN80XX reference manual describes BGX_CMRX_CFG[BGX_EN] as:
> - when cleared all dedicated BGX context state for LMAC (state
> machine, FIFOs, counters etc) are reset and LMAC access to shared BGX
> resources (data path, serdes lanes) is disabled
> - when set LMAC operation is enabled (link bring-up, sync, and tx/rx
> of idles and fault sequences)

You could try looking at
BGXX_GMP_PCS_INTX
BGXX_GMP_GMI_RXX_INT
BGXX_GMP_GMI_TXX_INT

Those are all W1C registers that should contain all zeros.  If they 
don't, just write back to them to clear before running a test.

If there are bits asserting in these when the thing gets wedged up, it 
might point to a possible cause.

You could also look at these RO registers:
BGXX_GMP_PCS_TXX_STATES
BGXX_GMP_PCS_RXX_STATES




> 
> I'm told that the particular Cavium reference board with an SGMII phy
> doesn't show this issue (I don't have that specific board to do my own
> testing or comparisons against our board) so I'm inclined to think it
> has something to do with an interaction with the DP83867 PHY. I would
> like to start poking at PHY registers to see if I can find anything
> unusual. The best way to do that from userspace is via
> SIOCGMIIREG/SIOCSMIIREG right? The thunderx nic doesn't currently
> support ioctl's so I guess I'll have to add that support unless
> there's a way to get at phy registers from userspace through a phy
> driver?
> 
> Regards,
> 
> Tim
>