lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 13 Feb 2017 13:15:32 +0000
From:   Claudiu Manoil <claudiu.manoil@....com>
To:     Zefir Kurtisi <zefir.kurtisi@...atec.com>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "David S. Miller" <davem@...emloft.net>,
        Florian Fainelli <f.fainelli@...il.com>
Subject: RE: [PATCH net] at803x: insure minimum delay for SGMII link AN
 completion ckeck

>-----Original Message-----
>From: Zefir Kurtisi [mailto:zefir.kurtisi@...atec.com]
>Sent: Monday, February 13, 2017 12:16 PM
>To: Claudiu Manoil <claudiu.manoil@....com>
>Cc: netdev@...r.kernel.org; David S. Miller <davem@...emloft.net>; Florian
>Fainelli <f.fainelli@...il.com>
>Subject: Re: [PATCH net] at803x: insure minimum delay for SGMII link AN
>completion ckeck
>
>On 02/10/2017 05:42 PM, Claudiu Manoil wrote:
>> Commit: f62265b "at803x: double check SGMII side autoneg"
>> introduced a regression for the p1010rdb board which has
>> two of the ethernet controllers (eTSEC) connected through
>> SGMII links to external Atheros SGMII AR8033 PHYs.
>> The issue consists in a dead link for these ports, and is
>> 100% reproducible on kernel 4.9 (and later):
>>

[...]

>>
>
>Could you confirm that you are using PHY_HAS_INTERRUPT? In polling mode the
>effect would be very unlikely to happen, since you'd need to run the state machine
>exactly between the two AN stages.
>

Hi Zefir.  Thanks for having a look at this issue.
Yes, the phy is operating in interrupt mode.
The phy nodes from the board's device tree have their interrupt properties set:
http://lxr.free-electrons.com/source/arch/powerpc/boot/dts/fsl/p1010rdb-pb.dts

I can confirm that link status changes are signaled via interrupts ("phy_interrupt") 
in this case.  And this is always desirable, right? Why would one want to waste CPU 
cycles by polling for link status changes periodically, if it can work with interrupts.

>As for the 100us delay proposed, is this something Atheros suggested or is it
>based on empirical considerations? Since ending up in a situation where the
>double-check fails might left you with a permanent link loss, having a reliable
>minimum required delay between first and second stage AN is essential - to me
>100us look quite short.
>

The value was chosen based on experiments, so yes, it's empirical.  I don't think 
that detailed documentation for this phy is publicly available.  I was expecting a 
small delay and I was looking for the smallest power of 10 (10 us doesn't work).
But I'm also expecting that the SGMII specification is imposing a minimum delay 
between AN completion on the copper side and AN completion on the SGMII side.

>Same goes for the readout polling proposed: given that reading an MDIO register
>takes ~16us (at assumed 2.5MHz MDC), delaying for 1us between reads is kind of
>useless and ends up in storm-reading the register (and also extends the wait-time
>by the same factor). Imo, it won't hurt to sleep for milliseconds between reads
>here (see phy_poll_reset() for reference).
>

I'm not fond of using udelay either, I'm also aware that the timeout loop is not 
exactly 100 us, given the delay involved by the phy register reads, but I wanted 
to emphasize that there needs to be a minimum delay before establishing that
the SGMII AN is done.
Would you mind sending a v2 patch using msleep() instead of udelay()?
You have an idea now of the minimum delay needed in my case.

Thanks,
Claudiu

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ