netdev - Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback

Message-ID: <d5cd081c-9882-fce8-697c-2be062756962@mellanox.com>
Date:   Mon, 26 Jun 2017 14:52:39 +0300
From:   Gal Pressman <galp@...lanox.com>
To:     Andrew Lunn <andrew@...n.ch>
Cc:     netdev@...r.kernel.org, "David S. Miller" <davem@...emloft.net>,
        "John W. Linville" <linville@...driver.com>,
        Saeed Mahameed <saeedm@...lanox.com>,
        Vidya Sagar Ravipati <vidya@...ulusnetworks.com>,
        Jiri Pirko <jiri@...lanox.com>,
        David Decotigny <decot@...glers.com>, kernel-team@...com
Subject: Re: [RFC PATCH net-next 1/3] ethtool: Add link down reason callback


> On Sun, Jun 25, 2017 at 02:59:24PM +0300, Gal Pressman wrote:
>>> On Thu, Jun 22, 2017 at 11:09:04AM +0300, Gal Pressman wrote:
>>>>>> +enum {
>>>>>> +	ETHTOOL_LINK_VENDOR_SPECIFIC = -1, /* Vendor specific issue provided in vendor_reason */
>>>>>> +	ETHTOOL_LINK_NO_ISSUE, /* No issue observed with link */
>>>>>> +	ETHTOOL_LINK_REASON_UNKNOWN, /* Unknown reason */
>>>>> I think OTHER would be better that UNKNOWN. 
>>>> Fine with me.
>>>>>> +	ETHTOOL_LINK_NETDEV_CARRIER_DOWN, /* Netdev carrier is down */
>>>>>> +	ETHTOOL_LINK_ADMIN_DOWN, /* Admin down */
>>>>> These two are interesting. We have that information already. Why do we
>>>>> want it again?
>>>> My goal is to gather all link issue reasons in one place.
>>> I'm actually wondering if this is a user space problem. Nearly
>>> everything you list is already available. Some you get from ip link,
>>> others from ethtool or ethtool --module-info, including I2C bus
>>> error, since you would expect EIO or ETIMEOUT.
>>>
>>> If you were to write a user space tool using the information what is
>>> currently available, what would be missing?
>>>
>>> 	  Andrew
>> I think most of the reasons in this list would be missing.
>> Auto negotiation failure,
> You can probably get that from the PHY layer. You get both the local
> and remote AN capabilities.
>
>> unplugged, over temperature, power budget exceeded..
> Don't you get over temperature from the SFF data? Also power budget?
You are right, but reading that data depends on other resources that can themselves fail, e.g. a bus failure or an invalid EEPROM.

> And what does cable unplugged actually mean? Do you have a micro
> switch inside the socket? So that is maybe a gpio-key?
No, some hardware devices can sense this state.
We would like to expose this information when it's available.

> Another thing to remember is that your device is the exception to the
> rule. You have some firmware doing a lot of the work bringing this all
> together. But nearly every other Ethernet interface has a discrete MAC
> and PHY, I2C bus driver, EEPROM driver, generic SFF decoder, HWMON
> temperature sensor, etc. How does your call work in this normal
> situation? How do you make calls into all these subsystems to get the
> information? You want a generic solution which can be made to work for
> everybody.
The driver has intimate knowledge of its device's implementation, so an analysis done by the device vendor is preferable.
The vendor can perform the analysis inside the device (firmware) or in the driver, whichever it prefers.
We believe that as devices become smarter, more of this analysis will be done by the device itself, which has more
information and faster access to it.
Smart NICs/SoCs are very popular these days, and this API takes the different architectures into account.

Since this callback is optional, a user space analysis tool can be added in the future to provide a more generic analysis for
unsupported devices.
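
A minimal sketch of that "optional callback with a generic fallback" idea, as a standalone user space mock rather than kernel code. The enum values are the ones quoted from the patch above (with UNKNOWN renamed to OTHER as agreed); everything prefixed demo_ (the structs, the ops table, the fallback logic and the vendor code 42) is hypothetical and only illustrates the dispatch, not the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Reason codes quoted from the RFC patch; only the entries visible in
 * this thread are listed. UNKNOWN is renamed to OTHER as agreed above. */
enum {
	ETHTOOL_LINK_VENDOR_SPECIFIC = -1, /* see vendor_reason */
	ETHTOOL_LINK_NO_ISSUE,
	ETHTOOL_LINK_REASON_OTHER,
	ETHTOOL_LINK_NETDEV_CARRIER_DOWN,
	ETHTOOL_LINK_ADMIN_DOWN,
};

/* Hypothetical stand-ins for a netdev and its ops; not kernel types. */
struct demo_dev;

struct demo_ops {
	/* Optional: drivers without device-side analysis leave this NULL. */
	int (*get_link_down_reason)(struct demo_dev *dev, int *vendor_reason);
};

struct demo_dev {
	const struct demo_ops *ops;
	int admin_up;   /* assumption: state already known generically */
	int carrier_ok; /* assumption: state already known generically */
};

/* Generic analysis built only from information available for any device. */
static int demo_generic_reason(const struct demo_dev *dev)
{
	if (!dev->admin_up)
		return ETHTOOL_LINK_ADMIN_DOWN;
	if (!dev->carrier_ok)
		return ETHTOOL_LINK_NETDEV_CARRIER_DOWN;
	return ETHTOOL_LINK_NO_ISSUE;
}

/* Prefer the driver's own analysis when provided, else fall back. */
static int demo_link_down_reason(struct demo_dev *dev, int *vendor_reason)
{
	assert(dev != NULL && vendor_reason != NULL);

	*vendor_reason = 0;
	if (dev->ops && dev->ops->get_link_down_reason)
		return dev->ops->get_link_down_reason(dev, vendor_reason);
	return demo_generic_reason(dev);
}

/* Example "smart" device: firmware supplies a vendor-specific code. */
static int demo_smart_nic_reason(struct demo_dev *dev, int *vendor_reason)
{
	(void)dev;
	*vendor_reason = 42; /* made-up vendor code for illustration */
	return ETHTOOL_LINK_VENDOR_SPECIFIC;
}
```

A device whose ops table sets get_link_down_reason reports whatever its firmware or driver concluded; a device with no callback still gets an answer from the generic path, which is where a future user space tool could plug in.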

> 	Andrew
>
>> Keep in mind that this is just an initial list, not to mention the vendor reasons which are not part of any existing API.
>> I don't see how a user space tool that expects ETIMEOUT/EIO is better than this suggestion.