lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 22 Dec 2017 14:19:22 -0800
From:   Tim Harvey <tharvey@...eworks.com>
To:     Andrew Lunn <andrew@...n.ch>
Cc:     Sunil Goutham <sgoutham@...ium.com>,
        netdev <netdev@...r.kernel.org>
Subject: Re: thunderx sgmii interface hang

On Tue, Dec 19, 2017 at 12:52 PM, Andrew Lunn <andrew@...n.ch> wrote:
> On Mon, Dec 18, 2017 at 01:53:47PM -0800, Tim Harvey wrote:
>> On Wed, Dec 13, 2017 at 11:43 AM, Andrew Lunn <andrew@...n.ch> wrote:
>> >> The nic appears to work fine (pings, TCP etc) up until a performance
>> >> test is attempted.
>> >> When an iperf bandwidth test is attempted the nic ends up in a state
>> >> where truncated-ip packets are being sent out (per a tcpdump from
>> >> another board):
>> >
>> > Hi Tim
>> >
>> > Are pause frames supported? Have you tried turning them off?
>> >
>> > Can you reproduce the issue with UDP? Or is it TCP only?
>> >
>>
>> Andrew,
>>
>> Pause frames don't appear to be supported yet and the issue occurs
>> when using UDP as well as TCP. I'm not clear what the best way to
>> troubleshoot this is.
>
> Hi Tim
>
> Is pause being negotiated? In theory, it should not be. The PHY should
> not offer it, if the MAC has not enabled it. But some PHY drivers are
> probably broken and offer pause when they should not.
>
> Also, can you trigger the issue using UDP at say 75% the maximum
> bandwidth. That should be low enough that the peer never even tries to
> use pause.
>
> All this pause stuff is just a stab in the dark. Something else to try
> is to turn off various forms off acceleration, ethtook -K, and see if
> that makes a difference.
>

Andrew,

Currently I'm not using the DP83867_PHY driver (after verifying the
issue occurs with or without that driver).

It does not occur if I limit UDP (ie 950mbps). I disabled all offloads
and the issue still occurs.

I have found that once the issue occurs I can recover to a working
state by clearing/setting BGX_CMRX_CFG[BGX_EN] and once I encounter
the issue and recover with that, I can never trigger the issue again.
If toggle that register bit upon power-up before the issue occurs it
will still occur.

The CN80XX reference manual describes BGX_CMRX_CFG[BGX_EN] as:
- when cleared all dedicated BGX context state for LMAC (state
machine, FIFOs, counters etc) are reset and LMAC access to shared BGX
resources (data path, serdes lanes) is disabled
- when set LMAC operation is enabled (link bring-up, sync, and tx/rx
of idles and fault sequences)

I'm told that the particular Cavium reference board with an SGMII phy
doesn't show this issue (I don't have that specific board to do my own
testing or comparisons against our board) so I'm inclined to think it
has something to do with an interaction with the DP83867 PHY. I would
like to start poking at PHY registers to see if I can find anything
unusual. The best way to do that from userspace is via
SIOCGMIIREG/SIOCSMIIREG right? The thunderx nic doesn't currently
support ioctl's so I guess I'll have to add that support unless
there's a way to get at phy registers from userspace through a phy
driver?

Regards,

Tim

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ