Date:   Mon, 14 Nov 2016 13:00:13 -0800
From:   Florian Fainelli <f.fainelli@...il.com>
To:     Mason <slash.tmp@...e.fr>
Cc:     Sebastian Frias <sf84@...oste.net>, Andrew Lunn <andrew@...n.ch>,
        netdev <netdev@...r.kernel.org>, Mans Rullgard <mans@...sr.com>,
        Sergei Shtylyov <sergei.shtylyov@...entembedded.com>,
        Tom Lendacky <thomas.lendacky@....com>,
        Zach Brown <zach.brown@...com>,
        Shaohui Xie <shaohui.xie@....com>,
        Tim Beale <tim.beale@...iedtelesis.co.nz>,
        Brian Hill <brian@...ston-radar.com>,
        Vince Bridgers <vbridgers2013@...il.com>,
        Balakumaran Kannan <kumaran.4353@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        Kirill Kapranov <kapranoff@...ox.ru>
Subject: Re: Debugging Ethernet issues

On 11/14/2016 12:27 PM, Mason wrote:
> On 14/11/2016 19:20, Florian Fainelli wrote:
> 
>> On 11/14/2016 09:59 AM, Sebastian Frias wrote:
>>
>>> Could you confirm that Mason's patch is correct and/or that it does not
>>> have negative side-effects?
>>
>> The patch is neither correct nor incorrect per se; it changes the default
>> policy from advertising pause frames by default to not advertising them
>> by default. This influences both your Ethernet MAC and the
>> link partner in that the result is either flow control is enabled
>> (before) or it is not (with the patch). There must be something amiss if
>> you see packet loss or some kind of problem like that with an early
>> exchange such as DHCP. Flow control tends to kick in under higher packet
>> rates (at least, that's what you expect).
> 
> Did you note that, without the change under discussion (i.e. with
> the eth driver as it is upstream), when the board is connected to
> a 100 Mbps switch, then *nothing* works *systematically* (no ping,
> no DHCP; are there other relevant low-level network tools?).

No, I missed that; there are way too many emails, really. So how about you
compare the register settings that could be affected (that is, everything
that the PHYLIB adjust_link function can modify) and try to spot where
things go wrong? Is there any other register that is influenced by the
link speed?

One possible (yet, after re-reading, very unlikely) scenario: since
priv->speed, priv->duplex and priv->link all start out as zero (because
nb8800_priv is zero-initialized), the first adjust_link call may not force
a correct link transition and a full MAC reconfiguration in
nb8800_link_reconfigure(), where some of the cached values are used.

NB: you will see most drivers initialize the previous link, speed and
duplex values to -1, because that is outside the range of values that
PHYLIB would assign to phydev->{link,duplex,speed}. This guarantees that
an adjust_link callback that tries to minimize reconfiguration still sees
a change on its first invocation and forces a transition.
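
To illustrate why -1 is the safer initial value, here is a minimal
userspace model in Python. It is not driver code: CachedLink and
changed_fields are hypothetical names, and the only values taken from the
kernel headers are DUPLEX_HALF = 0 and DUPLEX_FULL = 1. It shows that a
zeroed cache can miss a field whose first real value happens to be 0.

```python
from dataclasses import dataclass

DUPLEX_HALF, DUPLEX_FULL = 0, 1  # values from linux/ethtool.h


@dataclass
class CachedLink:
    """Hypothetical stand-in for the link state a driver caches."""
    link: int
    speed: int
    duplex: int


def changed_fields(cache, link, speed, duplex):
    """Return the names of the fields that differ, then update the cache."""
    new = {"link": link, "speed": speed, "duplex": duplex}
    changed = [name for name, value in new.items()
               if getattr(cache, name) != value]
    for name in changed:
        setattr(cache, name, new[name])
    return changed


# First report from the PHY: link up, 100 Mb/s, half duplex.
first_report = (1, 100, DUPLEX_HALF)

zeroed = CachedLink(0, 0, 0)        # like a zero-initialized private struct
sentinel = CachedLink(-1, -1, -1)   # the convention most drivers use

print(changed_fields(zeroed, *first_report))    # ['link', 'speed']
print(changed_fields(sentinel, *first_report))  # ['link', 'speed', 'duplex']
```

With the zeroed cache, the duplex setting is never seen as "changed", so
code that only reprograms what changed would skip it. Whether that matters
for nb8800 depends on how nb8800_link_reconfigure() uses the cached values.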

> 
> Also, maybe this comment was lost in my own noise:
> 
> If I manually set the link up, then down, then run udhcpc
> => then nothing works, as if something is wedged somewhere
> (a kernel thread gets borked by a race condition?)

Well, then start seriously debugging the problem. The first thing you need
to check is whether the RUNNING flag is set on the interface (it indicates
that the carrier is on); without it, the networking stack won't even send
packets. If it is not set, why not? Did nb8800_mac_config() get called in
the first place to configure the MAC according to the link settings?
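
The RUNNING check can also be scripted. The sketch below is one way to do
it from Python on Linux, assuming the standard SIOCGIFFLAGS ioctl and the
IFF_UP / IFF_RUNNING values from the kernel headers; get_if_flags and
describe_flags are hypothetical helper names, and the 40-byte buffer
assumes the 64-bit struct ifreq size.

```python
import fcntl
import socket
import struct

SIOCGIFFLAGS = 0x8913  # from linux/sockios.h
IFF_UP = 0x1           # from linux/if.h
IFF_RUNNING = 0x40     # from linux/if.h


def get_if_flags(ifname):
    """Fetch the interface flags with the SIOCGIFFLAGS ioctl (Linux only)."""
    # 16-byte name, 16-bit flags, padding up to 40 bytes.
    ifreq = struct.pack("16sH22x", ifname.encode(), 0)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        result = fcntl.ioctl(sock.fileno(), SIOCGIFFLAGS, ifreq)
    return struct.unpack("16sH22x", result)[1]


def describe_flags(flags):
    """Turn a flags word into the answer to 'can the stack transmit?'."""
    if not flags & IFF_UP:
        return "administratively down"
    if not flags & IFF_RUNNING:
        return "UP but not RUNNING: no carrier reported to the stack"
    return "UP and RUNNING"


# Example with a literal flags word (UP | BROADCAST | MULTICAST, no RUNNING):
print(describe_flags(0x1003))
```

On the board itself, describe_flags(get_if_flags("eth0")) would report the
live state; "ip link show eth0" or "ifconfig eth0" shows the same flag.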

When you transmit, do the transmit counters increase? That would indicate
that the TX DMA does its job. When transmission occurs, is it successful or
does it report errors? If the PHY supports it, can you read the PHY
counters and see whether the success/error counters change? Finally, try
connecting another golden (known-working) host and, if your switch supports
it, configure port mirroring to look at the packets. If the switch does not
support it, then try different link partners.
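
The counter check above can be sketched as a before/after comparison. This
is only an illustration: it assumes the generic per-interface counters that
Linux exposes under /sys/class/net/<ifname>/statistics/, and the function
names and the wording of the verdicts are made up for this example. PHY
counters, if any, are driver-specific and not covered here.

```python
from pathlib import Path

COUNTERS = ("tx_packets", "tx_errors", "tx_dropped",
            "rx_packets", "rx_errors")


def read_counters(ifname, root="/sys/class/net"):
    """Snapshot the generic interface counters from sysfs."""
    stats = Path(root) / ifname / "statistics"
    return {name: int((stats / name).read_text()) for name in COUNTERS}


def counter_deltas(before, after):
    """Difference between two snapshots taken around a test (e.g. a ping)."""
    return {name: after[name] - before[name] for name in before}


def diagnose(delta):
    """Rough reading of the deltas, following the questions asked above."""
    if delta["tx_packets"] == 0:
        return "no TX progress: check carrier, MAC config and TX DMA"
    if delta["tx_errors"] or delta["tx_dropped"]:
        return "TX is running but reports errors or drops"
    if delta["rx_packets"] == 0:
        return "TX looks fine but nothing is received"
    return "TX and RX counters both increase"


# Example with made-up snapshots: packets go out, nothing comes back.
before = dict.fromkeys(COUNTERS, 0)
after = dict(before, tx_packets=5)
print(diagnose(counter_deltas(before, after)))
```

On a live system, take read_counters("eth0") before and after running ping
or udhcpc, and feed both snapshots to counter_deltas().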

> 
> Could not advertising pause frames result in making such a
> race condition impossible? (I don't really believe in a race,
> due to the 100% nature of the problem.)
> 
>>> Right now we know that Mason's patch makes this work, but we do not understand
>>> why nor its implications.
>>
>> You need to understand why. Right now, the way this problem is
>> presented, you have come up with a workaround, not with the root cause or
>> the solution. What does your link partner (switch?) report, that is, what
>> is the ethtool output when you have a link up from your nb8800 adapter?
> 
> Isn't that what ethtool -a eth0 prints?

No, ethtool -a prints the local pause settings.

> How do I get the link partner information?

ethtool eth0:

# ethtool eth0
Settings for eth0:
        Supported ports: [ TP MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                             100baseT/Half 100baseT/Full

	^======================

        Link partner advertised pause frame use: Symmetric
        Link partner advertised auto-negotiation: Yes

	^========================

        Speed: 100Mb/s
        Duplex: Full
        Port: MII
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: gs
        Wake-on: d
        SecureOn password: 00:00:00:00:00:00
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: yes
#
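
To see what the two marked sections mean for flow control, the following
Python sketch parses the pause lines from output like the above and applies
the usual pause resolution rules for full duplex links. The mapping from
ethtool's wording to the Pause / Asym_Pause bits reflects my reading of
ethtool's output and should be treated as an assumption; parse_pause and
resolve_pause are hypothetical helper names.

```python
import re


def parse_pause(ethtool_text, prefix):
    """Return (pause, asym_pause) for e.g. prefix='Advertised'."""
    pattern = rf"^\s*{re.escape(prefix)} pause frame use: (.+)$"
    match = re.search(pattern, ethtool_text, re.MULTILINE)
    if match is None:
        raise ValueError(f"no '{prefix} pause frame use' line found")
    words = match.group(1).split()
    # Assumed mapping: "Symmetric" -> Pause, "Symmetric Receive-only" ->
    # Pause + Asym_Pause, "Transmit-only" -> Asym_Pause, "No" -> neither.
    pause = "Symmetric" in words
    asym = "Receive-only" in words or "Transmit-only" in words
    return pause, asym


def resolve_pause(local, partner):
    """Resolve TX/RX pause from both sides' (pause, asym_pause) bits."""
    l_pause, l_asym = local
    p_pause, p_asym = partner
    if l_pause and p_pause:
        return {"tx": True, "rx": True}
    if l_asym and p_asym:
        if l_pause:
            return {"tx": False, "rx": True}
        if p_pause:
            return {"tx": True, "rx": False}
    return {"tx": False, "rx": False}


SAMPLE = """\
        Advertised pause frame use: No
        Link partner advertised pause frame use: Symmetric
"""

local = parse_pause(SAMPLE, "Advertised")
partner = parse_pause(SAMPLE, "Link partner advertised")
print(resolve_pause(local, partner))  # {'tx': False, 'rx': False}
```

With "Advertised pause frame use: No", flow control resolves to off in both
directions regardless of what the partner advertises; if the local side
advertised Symmetric as well, both directions would be enabled.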


> Just ethtool eth0?

Yes, just that.
-- 
Florian
