[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1fd492b4-3c32-7939-791c-56cf6d8c0078@gmail.com>
Date: Mon, 14 Nov 2016 13:00:13 -0800
From: Florian Fainelli <f.fainelli@...il.com>
To: Mason <slash.tmp@...e.fr>
Cc: Sebastian Frias <sf84@...oste.net>, Andrew Lunn <andrew@...n.ch>,
netdev <netdev@...r.kernel.org>, Mans Rullgard <mans@...sr.com>,
Sergei Shtylyov <sergei.shtylyov@...entembedded.com>,
Tom Lendacky <thomas.lendacky@....com>,
Zach Brown <zach.brown@...com>,
Shaohui Xie <shaohui.xie@....com>,
Tim Beale <tim.beale@...iedtelesis.co.nz>,
Brian Hill <brian@...ston-radar.com>,
Vince Bridgers <vbridgers2013@...il.com>,
Balakumaran Kannan <kumaran.4353@...il.com>,
"David S. Miller" <davem@...emloft.net>,
Kirill Kapranov <kapranoff@...ox.ru>
Subject: Re: Debugging Ethernet issues
On 11/14/2016 12:27 PM, Mason wrote:
> On 14/11/2016 19:20, Florian Fainelli wrote:
>
>> On 11/14/2016 09:59 AM, Sebastian Frias wrote:
>>
>>> Could you confirm that Mason's patch is correct and/or that it does not
>>> has negative side-effects?
>>
>> The patch is not correct nor incorrect per-se, it changes the default
>> policy of having pause frames advertised by default to not having them
>> advertised by default. This influences both your Ethernet MAC and the
>> link partner in that the result is either flow control is enabled
>> (before) or it is not (with the patch). There must be something amiss if
>> you see packet loss or some kind of problem like that with an early
>> exchange such as DHCP. Flow control tend to kick in under higher packet
>> rates (at least, that's what you expect).
>
> Did you note that, without the change under discussion (i.e. with
> the eth driver as it is upstream), when the board is connected to
> a 100 Mbps switch, then *nothing* works *systematically (no ping,
> no DHCP; are there other relevant low-level network tools?).
No I missed that, way too many emails, really. So how about you compare
the register settings that could be (that is, all that could be modified
by the PHYLIB adjust_link function) and try to spot where things could
go wrong? Any other register that can be influenced by the link speed?
It seems like a possible (yet after re-reading, very unlikely) scenario,
considering that priv->speed, priv->duplex and priv->link are initially
zero-initialized (because nb8800_priv is zero initialized) may not force
a correct link transition and a full MAC reconfiguration in
nb8800_link_reconfigure() where some of the cached values are used.
NB: you will see most drivers initialize the previous link, speed,
duplex values to -1, because those are outside of the range of values
that PHYLIB would assign to phydev->{link,duplex,speed}, and therefore,
this is guaranteed to make the adjust_link callback that tries to
minimize these settings to force a transition.
>
> Also, maybe this comment was lost in my own noise:
>
> If I manually set the link up, then down, then run udhcpc
> => then nothing works, as if something is wedged somewhere
> (a kernel thread gets borked by a race condition?)
Well then start seriously debugging the problem: firs thing you need to
check is is the RUNNING flag set on the interface (which indicates a
carrier on?) without that, the networking stack won't even send packets.
If it is not set, why is not it set? Did nb8800_mac_config() get called
in the first place to configure the MAC wrt. the link settings?
When you transmit, do transmit counters increase? That would indicate
the TX DMA does its job. When transmission occurs, it is successful or
is it reporting errors? If the PHY supports it, can you access PHY
counters and look for success/error counters changing? Finally, try to
put another golden (working) host and if your switch supports it,
configure port mirroring to look at packets. If the switch does not
support it, then try different link partners.
>
> Could not advertising pause frames result in making such a
> race condition impossible? (I don't really believe in a race,
> due to the 100% nature of the problem.)
>
>>> Right now we know that Mason's patch makes this work, but we do not understand
>>> why nor its implications.
>>
>> You need to understand why, right now, the way this problem is
>> presented, you came up with a workaround, not with the root cause or the
>> solution. What does your link partner (switch?) reports, that is, what
>> is the ethtool output when you have a link up from your nb8800 adapter?
>
> Isn't that what ethtool -a eth0 prints?
No, ethtool -a prints the local pause settings.
> How do I get the link partner information?
ethtool eth0:
# ethtool eth0
Settings for eth0:
Supported ports: [ TP MII ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Link partner advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
^======================
Link partner advertised pause frame use: Symmetric
Link partner advertised auto-negotiation: Yes
^========================
Speed: 100Mb/s
Duplex: Full
Port: MII
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Supports Wake-on: gs
Wake-on: d
SecureOn password: 00:00:00:00:00:00
Current message level: 0x00000007 (7)
drv probe link
Link detected: yes
#
> Just ethtool eth0?
Yes, just that.
--
Florian
Powered by blists - more mailing lists