Message-Id: <20241004121824.1716303-1-o.rempel@pengutronix.de>
Date: Fri,  4 Oct 2024 14:18:24 +0200
From: Oleksij Rempel <o.rempel@...gutronix.de>
To: Andrew Lunn <andrew@...n.ch>,
	Heiner Kallweit <hkallweit1@...il.com>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>,
	Paolo Abeni <pabeni@...hat.com>,
	Rob Herring <robh@...nel.org>,
	Krzysztof Kozlowski <krzk+dt@...nel.org>,
	Florian Fainelli <f.fainelli@...il.com>,
	Maxime Chevallier <maxime.chevallier@...tlin.com>,
	Kory Maincent <kory.maincent@...tlin.com>,
	Lukasz Majewski <lukma@...x.de>,
	Jonathan Corbet <corbet@....net>
Cc: Oleksij Rempel <o.rempel@...gutronix.de>,
	kernel@...gutronix.de,
	linux-kernel@...r.kernel.org,
	netdev@...r.kernel.org,
	Russell King <linux@...linux.org.uk>,
	Divya.Koppera@...rochip.com
Subject: [PATCH net-next v3 1/1] Documentation: networking: add Twisted Pair Ethernet diagnostics at OSI Layer 1

This patch introduces a diagnostic guide for troubleshooting Twisted
Pair Ethernet variants at OSI Layer 1. It provides detailed steps for
detecting and resolving common link issues, such as incorrect wiring,
cable damage, and power delivery problems. The guide also includes
interface verification steps and PHY-specific diagnostics.

Signed-off-by: Oleksij Rempel <o.rempel@...gutronix.de>
---
changes v3:
- remove all additional technical information.
changes v2:
- add link to the networking/index.rst
---
 Documentation/networking/diagnostic/index.rst |  17 +
 .../twisted_pair_layer1_diagnostics.rst       | 767 ++++++++++++++++++
 Documentation/networking/index.rst            |   1 +
 3 files changed, 785 insertions(+)
 create mode 100644 Documentation/networking/diagnostic/index.rst
 create mode 100644 Documentation/networking/diagnostic/twisted_pair_layer1_diagnostics.rst

diff --git a/Documentation/networking/diagnostic/index.rst b/Documentation/networking/diagnostic/index.rst
new file mode 100644
index 0000000000000..86488aa46b484
--- /dev/null
+++ b/Documentation/networking/diagnostic/index.rst
@@ -0,0 +1,17 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+Networking Diagnostics
+======================
+
+.. toctree::
+   :maxdepth: 2
+
+   twisted_pair_layer1_diagnostics.rst
+
+.. only::  subproject and html
+
+   Indices
+   =======
+
+   * :ref:`genindex`
diff --git a/Documentation/networking/diagnostic/twisted_pair_layer1_diagnostics.rst b/Documentation/networking/diagnostic/twisted_pair_layer1_diagnostics.rst
new file mode 100644
index 0000000000000..c9be5cc7e1133
--- /dev/null
+++ b/Documentation/networking/diagnostic/twisted_pair_layer1_diagnostics.rst
@@ -0,0 +1,767 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+Diagnostic Concept for Investigating Twisted Pair Ethernet Variants at OSI Layer 1
+==================================================================================
+
+Introduction
+------------
+
+This documentation is designed for two primary audiences:
+
+1. **Users and System Administrators**: For those dealing with real-world
+   Ethernet issues, this guide provides a practical, step-by-step
+   troubleshooting flow to help identify and resolve common problems in Twisted
+   Pair Ethernet at OSI Layer 1. If you're facing unstable links, speed drops,
+   or mysterious network issues, jump right into the step-by-step guide and
+   follow it through to find your solution.
+
+2. **Kernel Developers**: For developers working with network drivers and PHY
+   support, this documentation outlines the diagnostic process and highlights
+   areas where the Linux kernel’s diagnostic interfaces could be extended or
+   improved. By understanding the diagnostic flow, developers can better
+   prioritize future enhancements.
+
+Step-by-Step Diagnostic Guide from Linux (General Ethernet)
+-----------------------------------------------------------
+
+This diagnostic guide covers common Ethernet troubleshooting scenarios,
+focusing on **link stability and detection** across different Ethernet
+environments, including **Single-Pair Ethernet (SPE)** and **Multi-Pair
+Ethernet (MPE)**, as well as power delivery technologies like **PoDL** (Power
+over Data Line) and **PoE** (Clause 33 PSE).
+
+The guide is designed to help users diagnose physical layer (Layer 1) issues on
+systems running **Linux kernel version 6.11 or newer**, utilizing **ethtool
+version 6.10 or later** and **iproute2 version 6.4.0 or later**.
+
+In this guide, we assume that users may have **limited or no access to the link
+partner** and will focus on diagnosing issues locally.
+
+Diagnostic Scenarios
+~~~~~~~~~~~~~~~~~~~~
+
+- **Link is up and stable, but no data transfer**: If the link is stable but
+  there are issues with data transmission, refer to the **OSI Layer 2
+  Troubleshooting Guide**.
+
+- **Link is unstable**: Link resets, speed drops, or other fluctuations
+  indicate potential issues at the hardware or physical layer.
+
+- **No link detected**: The interface is up, but no link is established.
+
+Verify Interface Status
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Begin by verifying the status of the Ethernet interface to check if it is
+administratively up. While `ethtool` provides information on the link and PHY
+status, it does not show the **administrative state** of the interface. To
+check this, use the `ip` command, which lists the interface state flags
+within the angle brackets `"<>"` in its output.
+
+For example, in the output `<NO-CARRIER,BROADCAST,MULTICAST,UP>`, the important
+keywords are:
+
+- **UP**: The interface is in the administrative "UP" state.
+- **NO-CARRIER**: The interface is administratively up, but no physical link is
+  detected.
+
+If the output shows `<BROADCAST,MULTICAST>`, this indicates the interface is in
+the administrative "DOWN" state.
+
+- **Command:** `ip link show dev <interface>`
+
+- **Expected Output:**
+
+  .. code-block:: bash
+
+     4: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 ...
+        link/ether 88:14:2b:00:96:f2 brd ff:ff:ff:ff:ff:ff
+
+- **Interpreting the Output:**
+
+  - **Administrative UP State**:
+
+    - If the output contains **"UP"**, the interface is administratively up,
+      and the system is trying to establish a physical link.
+
+    - If you also see **"NO-CARRIER"**, it means the physical link has not been
+      detected, indicating potential Layer 1 issues like a cable fault,
+      misconfiguration, or no connection at the link partner. In this case,
+      proceed to the **Inspect Link Status and PHY Configuration** section.
+
+  - **Administrative DOWN State**:
+
+    - If the output lacks **"UP"** and shows only states like
+      **"<BROADCAST,MULTICAST>"**, it means the interface is administratively
+      down. In this case, bring the interface up using the following command:
+
+      .. code-block:: bash
+
+         ip link set dev <interface> up
+
+- **Next Steps**:
+
+  - If the interface is **administratively up** but shows **NO-CARRIER**,
+    proceed to the **Inspect Link Status and PHY Configuration** section to
+    troubleshoot potential physical layer issues.
+
+  - If the interface was **administratively down** and you have brought it up,
+    **repeat this verification step** to confirm the new state of the
+    interface before proceeding.
+
+  - **If the interface is up and the link is detected**:
+
+    - If the output shows **"UP"** and there is **no `NO-CARRIER`**, the
+      interface is administratively up, and the physical link has been
+      successfully established. If everything is working as expected, the Layer
+      1 diagnostics are complete, and no further action is needed.
+
+    - If the interface is up and the link is detected but **no data is being
+      transferred**, the issue is likely beyond Layer 1, and you should proceed
+      with diagnosing the higher layers of the OSI model. This may involve
+      checking Layer 2 configurations (such as VLANs or MAC address issues),
+      Layer 3 settings (like IP addresses, routing, or ARP), or Layer 4 and
+      above (firewalls, services, etc.).
+
+    - If the **link is unstable** or **frequently resetting or dropping**, this
+      may indicate a physical layer issue such as a faulty cable, interference,
+      or power delivery problems. In this case, proceed with the next step in
+      this guide.
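+
+The flag interpretation above can be sketched as a small shell helper. This is
+a minimal sketch, not part of any existing tool: the function name
+`classify_ip_flags` and its three result strings are assumptions made for
+illustration.
+
+.. code-block:: bash
+
+   # Hypothetical helper: classify the comma-separated flag list that
+   # "ip link show" prints between "<" and ">".
+   classify_ip_flags() {
+       local flags=",$1,"
+       case "$flags" in
+           *,UP,*) ;;                          # administratively up
+           *) echo "admin-down"; return ;;
+       esac
+       case "$flags" in
+           *,NO-CARRIER,*) echo "no-carrier" ;;
+           *) echo "link-detected" ;;
+       esac
+   }
+
+   # Example (requires iproute2 and a real interface):
+   #   flags=$(ip -o link show dev eth0 | sed -n 's/.*<\(.*\)>.*/\1/p')
+   #   classify_ip_flags "$flags"
+
+Wrapping the list in commas keeps `LOWER_UP` from being mistaken for `UP`.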
+
+Inspect Link Status and PHY Configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Use `ethtool -I` to check the link status, PHY configuration, supported link
+modes, and additional statistics such as the **Link Down Events** counter. This
+step is essential for diagnosing Layer 1 problems such as speed mismatches,
+duplex issues, and link instability.
+
+For both **Single-Pair Ethernet (SPE)** and **Multi-Pair Ethernet (MPE)**
+devices, you will use this step to gather key details about the link. **SPE**
+links generally support a single speed and mode without autonegotiation (with
+the exception of **10BaseT1L**), while **MPE** devices typically support
+multiple link modes and autonegotiation.
+
+- **Command:** `ethtool -I <interface>`
+
+- **Example Output for SPE Interface (Non-autonegotiation)**:
+
+  .. code-block:: bash
+
+     Settings for spe4:
+         Supported ports: [ TP ]
+         Supported link modes:   100baseT1/Full
+         Supported pause frame use: No
+         Supports auto-negotiation: No
+         Supported FEC modes: Not reported
+         Advertised link modes: Not applicable
+         Advertised pause frame use: No
+         Advertised auto-negotiation: No
+         Advertised FEC modes: Not reported
+         Speed: 100Mb/s
+         Duplex: Full
+         Auto-negotiation: off
+         master-slave cfg: forced slave
+         master-slave status: slave
+         Port: Twisted Pair
+         PHYAD: 6
+         Transceiver: external
+         MDI-X: Unknown
+         Supports Wake-on: d
+         Wake-on: d
+         Link detected: yes
+         SQI: 7/7
+         Link Down Events: 2
+
+- **Example Output for MPE Interface (Autonegotiation)**:
+
+  .. code-block:: bash
+
+     Settings for eth1:
+         Supported ports: [ TP    MII ]
+         Supported link modes:   10baseT/Half 10baseT/Full
+                                 100baseT/Half 100baseT/Full
+         Supported pause frame use: Symmetric Receive-only
+         Supports auto-negotiation: Yes
+         Supported FEC modes: Not reported
+         Advertised link modes:  10baseT/Half 10baseT/Full
+                                 100baseT/Half 100baseT/Full
+         Advertised pause frame use: Symmetric Receive-only
+         Advertised auto-negotiation: Yes
+         Advertised FEC modes: Not reported
+         Link partner advertised link modes:  10baseT/Half 10baseT/Full
+                                              100baseT/Half 100baseT/Full
+         Link partner advertised pause frame use: Symmetric Receive-only
+         Link partner advertised auto-negotiation: Yes
+         Link partner advertised FEC modes: Not reported
+         Speed: 100Mb/s
+         Duplex: Full
+         Auto-negotiation: on
+         Port: Twisted Pair
+         PHYAD: 10
+         Transceiver: internal
+         MDI-X: Unknown
+         Supports Wake-on: pg
+         Wake-on: p
+         Link detected: yes
+         Link Down Events: 1
+
+- **Next Steps**:
+
+  - Record the output provided by `ethtool`, particularly noting the
+    **master-slave status**, **speed**, **duplex**, and other relevant fields.
+    This information will be useful for further analysis or troubleshooting.
+    Once the **ethtool** output has been collected and stored, move on to the
+    next diagnostic step.
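+
+To pull a single field out of the stored output, a small filter is enough. The
+following is a minimal sketch; the function name `ethtool_field` and the file
+name used in the example are assumptions, and the filter only handles
+single-line fields (not the multi-line link mode lists).
+
+.. code-block:: bash
+
+   # Hypothetical helper: print the value of one "Name: value" field from
+   # ethtool output supplied on stdin. $1 is the exact field name.
+   ethtool_field() {
+       sed -n "s/^[[:space:]]*$1:[[:space:]]*//p"
+   }
+
+   # Example (requires ethtool and a real interface):
+   #   ethtool -I eth0 > ethtool-eth0.txt
+   #   ethtool_field "Link Down Events" < ethtool-eth0.txt
+   #   ethtool_field "master-slave status" < ethtool-eth0.txt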
+
+Check Power Delivery (PoDL or PoE)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If it is known that **PoDL** or **PoE** is **not implemented** on the system,
+or the **PSE** (Power Sourcing Equipment) is managed by proprietary user-space
+software or external tools, you can skip this step. In such cases, verify power
+delivery through alternative methods, such as checking hardware indicators
+(LEDs), using multimeters, or consulting vendor-specific software for
+monitoring power status.
+
+If **PoDL** or **PoE** is implemented and managed directly by Linux, follow
+these steps to ensure power is being delivered correctly:
+
+- **Command:** `ethtool --show-pse <interface>`
+
+- **Expected Output Examples**:
+
+  1. **PSE Not Supported**:
+
+     If no PSE is attached or the interface does not support PSE, the following
+     output is expected:
+
+     .. code-block:: bash
+
+        netlink error: No PSE is attached
+        netlink error: Operation not supported
+
+  2. **PoDL (Single-Pair Ethernet)**:
+
+     When PoDL is implemented, you might see the following attributes:
+
+     .. code-block:: bash
+
+        PSE attributes for eth1:
+        PoDL PSE Admin State: enabled
+        PoDL PSE Power Detection Status: delivering power
+
+  3. **PoE (Clause 33 PSE)**:
+
+     For standard PoE, the output may look like this:
+
+     .. code-block:: bash
+
+        PSE attributes for eth1:
+        Clause 33 PSE Admin State: enabled
+        Clause 33 PSE Power Detection Status: delivering power
+        Clause 33 PSE Available Power Limit: 18000
+
+- **Adjust Power Limit (if needed)**:
+
+  - Sometimes, the available power limit may not be sufficient for the link
+    partner. You can increase the power limit as needed.
+
+  - **Command:** `ethtool --set-pse <interface> c33-pse-avail-pw-limit <limit>`
+
+    Example:
+
+    .. code-block:: bash
+
+      ethtool --set-pse eth1 c33-pse-avail-pw-limit 18000
+      ethtool --show-pse eth1
+
+    **Expected Output** after adjusting the power limit:
+
+    .. code-block:: bash
+
+      Clause 33 PSE Available Power Limit: 18000
+
+
+- **Next Steps**:
+
+  - **PoE or PoDL Not Used**: If **PoE** or **PoDL** is not implemented or used
+    on the system, proceed to the next diagnostic step, as power delivery is
+    not relevant for this setup.
+
+  - **PoE or PoDL Controlled Externally**: If **PoE** or **PoDL** is used but
+    is not managed by the Linux kernel's **PSE-PD** framework (i.e., it is
+    controlled by proprietary user-space software or external tools), this part
+    is out of scope for this documentation. Please consult vendor-specific
+    documentation or external tools for monitoring and managing power delivery.
+
+  - **PSE Admin State Disabled**:
+
+    - If the `PSE Admin State:` is **disabled**, enable it by running one of
+      the following commands:
+
+      .. code-block:: bash
+
+         ethtool --set-pse <interface> podl-pse-admin-control enable
+
+      or, for Clause 33 PSE (PoE):
+
+      .. code-block:: bash
+
+         ethtool --set-pse <interface> c33-pse-admin-control enable
+
+    - After enabling the PSE Admin State, return to the start of the **Check
+      Power Delivery (PoDL or PoE)** step to recheck the power delivery status.
+
+  - **Power Not Delivered**: If the `Power Detection Status` shows something
+    other than "delivering power" (e.g., `over current`), troubleshoot the
+    **PSE**. Check for potential issues such as a short circuit in the cable,
+    insufficient power delivery, or a fault in the PSE itself.
+
+  - **Power Delivered but No Link**: If power is being delivered but no link is
+    established, proceed with further diagnostics by performing **Cable
+    Diagnostics** or reviewing the **Inspect Link Status and PHY
+    Configuration** steps to identify any underlying issues with the physical
+    link or settings.
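+
+The decision flow above can be condensed into a small filter over the
+`ethtool --show-pse` output. This is a minimal sketch based only on the example
+outputs shown in this section; the function name `pse_verdict` and its result
+strings are assumptions.
+
+.. code-block:: bash
+
+   # Hypothetical helper: read "ethtool --show-pse" output on stdin and
+   # print a one-line verdict.
+   pse_verdict() {
+       local out admin status
+       out=$(cat)
+       admin=$(printf '%s\n' "$out" |
+               sed -n 's/.*PSE Admin State:[[:space:]]*//p' | head -n 1)
+       status=$(printf '%s\n' "$out" |
+                sed -n 's/.*PSE Power Detection Status:[[:space:]]*//p' |
+                head -n 1)
+       if [ -z "$admin" ]; then
+           echo "no-pse-info"            # PSE not supported or not attached
+       elif [ "$admin" != "enabled" ]; then
+           echo "admin-disabled"
+       elif [ "$status" = "delivering power" ]; then
+           echo "delivering"
+       else
+           echo "not-delivering: $status"
+       fi
+   }
+
+   # Example (requires ethtool and a PSE-capable interface):
+   #   ethtool --show-pse eth1 2>&1 | pse_verdict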
+
+Cable Diagnostics
+~~~~~~~~~~~~~~~~~
+
+Use `ethtool` to test for physical layer issues such as cable faults. The test
+results can vary depending on the cable's condition, the technology in use, and
+the state of the link partner. The results from the cable test will help in
+diagnosing issues like open circuits, shorts, impedance mismatches, and
+noise-related problems.
+
+- **Command:** `ethtool --cable-test <interface>`
+
+The following are the typical outputs for **Single-Pair Ethernet (SPE)** and
+**Multi-Pair Ethernet (MPE)**:
+
+- **For Single-Pair Ethernet (SPE)**:
+
+  - **Expected Output (SPE)**:
+
+  .. code-block:: bash
+
+    Cable test completed for device eth1.
+    Pair A, fault length: 25.00m
+    Pair A code Open Circuit
+
+  This indicates an open circuit or cable fault at the reported distance, but
+  results can be influenced by the link partner's state. Refer to the
+  **"Troubleshooting Based on Cable Test Results"** section for further
+  interpretation of these results.
+
+- **For Multi-Pair Ethernet (MPE)**:
+
+  - **Expected Output (MPE)**:
+
+  .. code-block:: bash
+
+    Cable test completed for device eth0.
+    Pair A code OK
+    Pair B code OK
+    Pair C code Open Circuit
+
+  Here, Pair C is reported as having an open circuit, while Pairs A and B are
+  functioning correctly. However, if autonegotiation is in use on Pairs A and
+  B, the cable test may be disrupted. Refer to the **"Troubleshooting Based on
+  Cable Test Results"** section for a detailed explanation of these issues and
+  how to resolve them.
+
+For detailed descriptions of the different possible cable test results, please
+refer to the **"Troubleshooting Based on Cable Test Results"** section.
+
+Troubleshooting Based on Cable Test Results
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+After running the cable test, the results can help identify specific issues in
+the physical connection. However, it is important to note that **cable testing
+results heavily depend on the capabilities and characteristics of both the
+local hardware and the link partner**. The accuracy and reliability of the
+results can vary significantly between different hardware implementations.
+
+In some cases, this can introduce **blind spots** in the current cable testing
+implementation, where certain results may not accurately reflect the actual
+physical state of the cable. For example:
+
+- An **Open Circuit** result might not only indicate a damaged or disconnected
+  cable but also occur if the cable is properly attached to a powered-down link
+  partner.
+
+- Some PHYs may report a **Short within Pair** if the link partner is in
+  **forced slave mode**, even though there is no actual short in the cable.
+
+To help users interpret the results more effectively, it could be beneficial to
+extend the **kernel UAPI** (User API) to provide additional context or
+**possible variants** of issues based on the hardware’s characteristics. Since
+these quirks are often hardware-specific, the **kernel driver** would be an
+ideal source of such information. By providing flags or hints related to
+potential false positives for each test result, users would have a better
+understanding of what to verify and where to investigate further.
+
+Until such improvements are made, users should be aware of these limitations
+and manually verify cable issues as needed. Physical inspections may help
+resolve uncertainties related to false positive results.
+
+The results can be one of the following:
+
+- **OK**:
+
+  - The cable is functioning correctly, and no issues were detected.
+
+  - **Next Steps**: If you are still experiencing issues, they might be caused
+    by configuration problems, such as duplex mismatches or speed negotiation,
+    which a cable test cannot detect.
+
+  - **Special Case for `BaseT1` (1000/100/10BaseT1)**: In `BaseT1` systems, an
+    "OK" result typically also means that the link is up and likely in **slave
+    mode**, since cable tests usually only pass in this mode. For some
+    **10BaseT1L** PHYs, an "OK" result may occur even if the cable is too long
+    for the PHY's configured range (for example, when the range is configured
+    for short-distance mode).
+
+- **Open Circuit**:
+
+  - An **Open Circuit** result typically indicates that the cable is damaged or
+    disconnected at the reported fault length. Consider these possibilities:
+
+    - If the link partner is in **admin down** state or powered off, you might
+      still get an "Open Circuit" result even if the cable is functional.
+
+    - **Next Steps**: Inspect the cable at the fault length for visible damage
+      or loose connections. Verify the link partner is powered on and in the
+      correct mode.
+
+- **Short within Pair**:
+
+  - A **Short within Pair** indicates an unintended connection within the same
+    pair of wires, typically caused by physical damage to the cable.
+
+    - **Next Steps**: Replace or repair the cable and check for any physical
+      damage or improperly crimped connectors.
+
+- **Short to Another Pair**:
+
+  - A **Short to Another Pair** means the wires from different pairs are
+    shorted, which could occur due to physical damage or incorrect wiring.
+
+    - **Next Steps**: Replace or repair the damaged cable. Inspect the cable for
+      incorrect terminations or pinched wiring.
+
+- **Impedance Mismatch**:
+
+  - **Impedance Mismatch** indicates a reflection caused by an impedance
+    discontinuity in the cable. This can happen when a part of the cable has
+    abnormal impedance (e.g., when different cable types are spliced together
+    or when there is a defect in the cable).
+
+    - **Next Steps**: Check the cable quality and ensure consistent impedance
+      throughout its length. Replace any sections of the cable that do not meet
+      specifications.
+
+- **Noise**:
+
+  - **Noise** means that the Time Domain Reflectometry (TDR) test could not
+    complete due to excessive noise on the cable, which can be caused by
+    interference from electromagnetic sources.
+
+    - **Next Steps**: Identify and eliminate sources of electromagnetic
+      interference (EMI) near the cable. Consider using shielded cables or
+      rerouting the cable away from noise sources.
+
+- **Resolution Not Possible**:
+
+  - **Resolution Not Possible** means that the TDR test could not detect the
+    issue due to the resolution limitations of the test or because the fault is
+    beyond the distance that the test can measure.
+
+    - **Next Steps**: Inspect the cable manually if possible, or use alternative
+      diagnostic tools that can handle greater distances or higher resolution.
+
+- **Unknown**:
+
+  - An **Unknown** result may occur when the test cannot classify the fault or
+    when a specific issue is outside the scope of the tool's detection
+    capabilities.
+
+    - **Next Steps**: Re-run the test, verify the link partner's state, and inspect
+      the cable manually if necessary.
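+
+When a cable has several pairs, it can help to list only the pairs that did
+not report "OK". The following is a minimal sketch that relies on the output
+format shown in the examples above; the function name `faulty_pairs` is an
+assumption.
+
+.. code-block:: bash
+
+   # Hypothetical helper: read "ethtool --cable-test" output on stdin and
+   # print every pair whose result code is not "OK".
+   faulty_pairs() {
+       sed -n 's/^[[:space:]]*Pair \([A-D]\) code \(.*\)$/\1 \2/p' |
+       while read -r pair code; do
+           [ "$code" = "OK" ] || echo "Pair $pair: $code"
+       done
+   }
+
+   # Example (requires ethtool and a PHY with cable test support):
+   #   ethtool --cable-test eth0 | faulty_pairs
+
+Keep the blind spots described above in mind: a reported fault may reflect the
+state of the link partner rather than the cable.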
+
+Verify Link Partner PHY Configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If the cable test passes but the link is still not functioning correctly, it’s
+essential to verify the configuration of the link partner’s PHY. Mismatches in
+speed, duplex settings, or master-slave roles can cause connection issues.
+
+Autonegotiation Mismatch
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+- If both link partners support autonegotiation, ensure that autonegotiation is
+  enabled on both sides and that all supported link modes are advertised. A
+  mismatch can lead to connectivity problems or suboptimal performance.
+
+- **Quick Fix:** Reset autonegotiation to the default settings, which will
+  advertise all default link modes:
+
+  .. code-block:: bash
+
+     ethtool -s <interface> autoneg on
+
+- **Command to check configuration:** `ethtool <interface>`
+
+- **Expected Output:** Ensure that both sides advertise compatible link modes.
+  If autonegotiation is off, verify that both link partners are configured for
+  the same speed and duplex.
+
+  The following example shows a case where the local PHY advertises fewer link
+  modes than it supports. This will reduce the number of overlapping link modes
+  with the link partner. In the worst case, there will be no common link modes,
+  and the link will not be created:
+
+  .. code-block:: bash
+
+     Settings for eth0:
+        Supported link modes:  1000baseT/Full, 100baseT/Full
+        Advertised link modes: 1000baseT/Full
+        Speed: 1000Mb/s
+        Duplex: Full
+        Auto-negotiation: on
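+
+To spot a restricted advertisement like the one above, the supported and
+advertised lists can be compared. This is a minimal sketch; the function name
+`unadvertised_modes` is an assumption, and both lists are expected as
+space-separated strings (without commas) copied from the `ethtool` output.
+
+.. code-block:: bash
+
+   # Hypothetical helper: print each mode from $1 (supported) that is
+   # missing from $2 (advertised).
+   unadvertised_modes() {
+       local mode
+       for mode in $1; do              # unquoted on purpose: split on spaces
+           case " $2 " in
+               *" $mode "*) ;;
+               *) echo "$mode" ;;
+           esac
+       done
+   }
+
+   # Example:
+   #   unadvertised_modes "1000baseT/Full 100baseT/Full" "1000baseT/Full"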
+
+Combined Mode Mismatch (Autonegotiation on One Side, Forced on the Other)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- One possible issue occurs when one side is using **autonegotiation** (as in
+  most modern systems), and the other side is set to a **forced link mode**
+  (e.g., older hardware with single-speed hubs). In such cases, modern PHYs
+  will attempt to detect the forced mode on the other side. If the link is
+  established, you may notice:
+
+  - **No or empty "Link partner advertised link modes"**.
+
+  - **"Link partner advertised auto-negotiation:"** will be **"no"** or not
+    present.
+
+- This type of detection does not always work reliably:
+
+  - Typically, the modern PHY will default to **Half Duplex**, even if the link
+    partner is actually configured for **Full Duplex**.
+
+  - Some PHYs may not work reliably if the link partner switches from one
+    forced mode to another. In this case, only a down/up cycle may help.
+
+- **Next Steps**: Set both sides to the same fixed speed and duplex mode to
+  avoid potential detection issues.
+
+  .. code-block:: bash
+
+     ethtool -s <interface> speed 1000 duplex full autoneg off
+
+Master/Slave Role Mismatch (BaseT1 and 1000BaseT PHYs)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- In **BaseT1** systems (e.g., 1000BaseT1, 100BaseT1), link establishment
+  requires that one device is configured as **master** and the other as
+  **slave**. A mismatch in this master-slave configuration can prevent the link
+  from being established. However, **1000BaseT** also supports configurable
+  master/slave roles and can face similar issues.
+
+- **Role Preference in 1000BaseT**: The **1000BaseT** specification allows link
+  partners to negotiate master-slave roles or role preferences during
+  autonegotiation. Some PHYs have hardware limitations or bugs that prevent
+  them from functioning properly in certain roles. In such cases, drivers may
+  force these PHYs into a specific role (e.g., **forced master** or **forced
+  slave**) or try a weaker option by setting preferences. If both link partners
+  have the same issue and are forced into the same mode (e.g., both forced into
+  master mode), they will not be able to establish a link.
+
+- **Next Steps**: Ensure that one side is configured as **master** and the
+  other as **slave** to avoid this issue, particularly when hardware
+  limitations are involved, or try the weaker **preferred** option instead of
+  **forced**. Check for any driver-related restrictions or forced modes.
+
+- **Command to force master/slave mode**:
+
+  .. code-block:: bash
+
+     ethtool -s <interface> master-slave forced-master
+
+  or:
+
+  .. code-block:: bash
+
+     ethtool -s <interface> master-slave forced-master speed 1000 duplex full autoneg off
+
+
+- **Check the current master/slave status**:
+
+  .. code-block:: bash
+
+     ethtool <interface>
+
+  Example Output:
+
+  .. code-block:: bash
+
+     master-slave cfg: forced master
+     master-slave status: master
+
+- **Hardware Bugs and Driver Forcing**: If a known hardware issue forces the
+  PHY into a specific mode, it’s essential to check the driver source code or
+  hardware documentation for details. Ensure that the roles are compatible
+  across both link partners, and if both PHYs are forced into the same mode,
+  adjust one side accordingly to resolve the mismatch.
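+
+If the configured roles of both link partners are known, the forced-role
+conflict described above can be checked mechanically. This is a minimal
+sketch; the function name `role_conflict` is an assumption, and the arguments
+use the spelling accepted by `ethtool -s <interface> master-slave`. A result
+of "ok" only means that no forced same-role conflict was found.
+
+.. code-block:: bash
+
+   # Hypothetical helper: $1 is the local and $2 the remote master-slave
+   # configuration. Two sides forced into the same role cannot link.
+   role_conflict() {
+       case "$1/$2" in
+           forced-master/forced-master|forced-slave/forced-slave)
+               echo "conflict" ;;
+           *)
+               echo "ok" ;;
+       esac
+   }
+
+   # Example:
+   #   role_conflict forced-master forced-master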
+
+Monitor Link Resets and Speed Drops
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If the link is unstable, showing frequent resets or speed drops, this may
+indicate issues with the cable, PHY configuration, or environmental factors.
+While there is still no completely unified way in Linux to directly monitor
+downshift events or link speed changes via user space tools, both the Linux
+kernel logs and `ethtool` can provide valuable insights, especially if the
+driver supports reporting such events.
+
+- **Monitor Kernel Logs for Link Resets and Speed Drops**:
+
+  - The Linux kernel will print link status changes, including downshift
+    events, in the system logs. These messages typically include speed changes,
+    duplex mode, and downshifted link speed (if the driver supports it).
+
+  - **Command to monitor kernel logs in real-time:**
+
+    .. code-block:: bash
+
+      dmesg -w | grep "Link is Up\|Link is Down"
+
+  - Example Output (if a downshift occurs):
+
+    .. code-block:: bash
+
+      eth0: Link is Up - 100Mbps/Full (downshifted) - flow control rx/tx
+      eth0: Link is Down
+
+    This indicates that the link has been established but has downshifted from
+    a higher speed.
+
+  - **Note**: Not all drivers or PHYs support downshift reporting, so you may
+    not see this information for all devices.
+
+- **Monitor Link Down Events Using `ethtool`**:
+
+  - With the kernel and `ethtool` versions assumed by this guide, you can track
+    **Link Down Events** using the `ethtool -I` command. This will provide
+    counters for link drops, helping to diagnose link instability issues if
+    supported by the driver.
+
+  - **Command to monitor link down events:**
+
+    .. code-block:: bash
+
+      ethtool -I <interface>
+
+  - Example Output (if supported):
+
+    .. code-block:: bash
+
+      Settings for eth1:
+      Link Down Events: 5
+
+    This indicates that the link has dropped 5 times. Frequent link down events
+    may indicate cable or environmental issues that require further
+    investigation.
+
+- **Check Link Status and Speed**:
+
+  - Even though downshift counts or events are not easily tracked, you can
+    still use `ethtool` to manually check the current link speed and status.
+
+  - **Command:** `ethtool <interface>`
+
+  - **Expected Output:**
+
+    .. code-block:: bash
+
+      Speed: 1000Mb/s
+      Duplex: Full
+      Auto-negotiation: on
+      Link detected: yes
+
+    Any deviation from the expected speed or duplex setting could indicate
+    an issue.
+
+- **Disable Energy-Efficient Ethernet (EEE) for Diagnostics**:
+
+  - **EEE** (Energy-Efficient Ethernet) can be a source of link instability due
+    to transitions in and out of low-power states. For diagnostic purposes, it
+    may be useful to **temporarily** disable EEE to determine if it is
+    contributing to link instability. This is **not a generic recommendation**
+    for disabling power management.
+
+  - **Next Steps**: Disable EEE and monitor if the link becomes stable. If
+    disabling EEE resolves the issue, report the bug so that the driver can be
+    fixed.
+
+  - **Command:**
+
+    .. code-block:: bash
+
+      ethtool --set-eee <interface> eee off
+
+  - **Important**: If disabling EEE resolves the instability, the issue should
+    be reported to the maintainers as a bug, and the driver should be corrected
+    to handle EEE properly without causing instability. Disabling EEE
+    permanently should not be seen as a solution.
+
+- **Monitor Error Counters**:
+
+  - While some NIC drivers and PHYs provide error counters, there is no unified
+    set of PHY-specific counters across all hardware. Additionally, not all
+    PHYs provide useful information related to errors like CRC errors, frame
+    drops, or link flaps. Therefore, this step is dependent on the specific
+    hardware and driver support.
+
+  - **Next Steps**: Use `ethtool -S <interface>` to check if your driver
+    provides useful error counters. In some cases, counters may provide
+    information about errors like link flaps or physical layer problems (e.g.,
+    excessive CRC errors), but results can vary significantly depending on the
+    PHY.
+
+  - **Command:** `ethtool -S <interface>`
+
+  - **Example Output (if supported)**:
+
+    .. code-block:: bash
+
+      rx_crc_errors: 123
+      tx_errors: 45
+      rx_frame_errors: 78
+
+  - **Note**: If no meaningful error counters are available or if counters are
+    not supported, you may need to rely on physical inspections (e.g., cable
+    condition) or kernel log messages (e.g., link up/down events) to further
+    diagnose the issue.
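+
+To quantify link instability from a saved kernel log, the "Link is Down"
+messages can be counted per interface. This is a minimal sketch; the function
+name `count_link_downs` is an assumption, and it relies on the log message
+format shown in the example above.
+
+.. code-block:: bash
+
+   # Hypothetical helper: count "Link is Down" messages for interface $1
+   # in kernel log text supplied on stdin.
+   count_link_downs() {
+       # -F: fixed string, -w: whole words (so "eth0" does not match
+       # "veth0"), -c: print the count of matching lines.
+       grep -cwF -- "$1: Link is Down" || true
+   }
+
+   # Example (reading the kernel log may require root):
+   #   dmesg | count_link_downs eth0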
+
+When All Else Fails...
+~~~~~~~~~~~~~~~~~~~~~~
+
+So you've checked the cables, monitored the logs, disabled EEE, and still...
+nothing? Don’t worry, you’re not alone. Sometimes, Ethernet gremlins just don’t
+want to cooperate.
+
+But before you throw in the towel (or the Ethernet cable), take a deep breath.
+It’s always possible that:
+
+1. Your PHY has a unique, undocumented personality.
+
+2. The problem is lying dormant, waiting for just the right moment to magically
+   resolve itself (hey, it happens!).
+
+3. Or, it could be that the ultimate solution simply hasn’t been invented yet.
+
+If none of the above bring you comfort, there’s one final step: contribute! If
+you've uncovered new or unusual issues, or have creative diagnostic methods,
+feel free to share your findings and extend this documentation. Together, we
+can hunt down every elusive network issue - one twisted pair at a time.
+
+Remember: sometimes the solution is just a reboot away, but if not, it’s time to
+dig deeper - or report that bug!
+
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 803dfc1efb751..46c178e564b34 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -14,6 +14,7 @@ Contents:
    can
    can_ucan_protocol
    device_drivers/index
+   diagnostic/index
    dsa/index
    devlink/index
    caif/index
-- 
2.39.5

