Message-ID: <20241003095321.5a3c4e26@fedora.home>
Date: Thu, 3 Oct 2024 09:53:21 +0200
From: Maxime Chevallier <maxime.chevallier@...tlin.com>
To: Oleksij Rempel <o.rempel@...gutronix.de>
Cc: Andrew Lunn <andrew@...n.ch>, Heiner Kallweit <hkallweit1@...il.com>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet
 <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni
 <pabeni@...hat.com>, Rob Herring <robh@...nel.org>, Krzysztof Kozlowski
 <krzk+dt@...nel.org>, Florian Fainelli <f.fainelli@...il.com>, Kory
 Maincent <kory.maincent@...tlin.com>, Lukasz Majewski <lukma@...x.de>,
 Jonathan Corbet <corbet@....net>, kernel@...gutronix.de,
 linux-kernel@...r.kernel.org, netdev@...r.kernel.org, Russell King
 <linux@...linux.org.uk>
Subject: Re: [PATCH net-next v2 1/1] Documentation: networking: add Twisted
 Pair Ethernet diagnostics at OSI Layer 1

Hi Oleksij,

On Thu,  3 Oct 2024 08:06:02 +0200
Oleksij Rempel <o.rempel@...gutronix.de> wrote:

> This patch introduces a diagnostic guide for troubleshooting Twisted
> Pair Ethernet variants at OSI Layer 1. It provides detailed steps for
> detecting and resolving common link issues, such as incorrect wiring,
> cable damage, and power delivery problems. The guide also includes
> interface verification steps and PHY-specific diagnostics.

This looks nice! If I may add some suggestions on the layout (the
content looks very good to me):

[ ...]

> +- **Interpreting the ethtool output**:
> +
> +  - **Supported ports**: Specifies the physical connection type, such as
> +    **Twisted Pair (TP)**.
> +
> +  - **Supported link modes**:
> +
> +    - For **SPE**: This typically indicates one supported mode.
> +    - For **MPE**: Multiple link modes are supported, such as **10baseT/Half,
> +      10baseT/Full, 100baseT/Half, 100baseT/Full**.
> +
> +  - **Supported pause frame use**: Not used for layer 1 diagnostic
> +
> +  - **Supports auto-negotiation**:
> +
> +    - For most **SPE** links (e.g., **100baseT1**), autonegotiation is **not
> +      supported**.
> +
> +    - For **10BaseT1L** and **MPE** links, autonegotiation is typically
> +      **Yes**, allowing dynamic negotiation of speed and duplex settings.
> +
> +  - **Supported FEC modes**: Forward Error Correction (FEC). Currently not
> +    used in this guide.
> +
> +  - **Advertised link modes**:
> +
> +    - For **SPE** (except **10BaseT1L**), this field will be **Not
> +      applicable**, as no link modes can be advertised without autonegotiation.
> +
> +    - For **MPE** and **10BaseT1L** links, this will list the link modes that
> +      the interface is currently advertising to the link partner.
> +
> +  - **Advertised pause frame use**: Not used for layer 1 diagnostic
> +
> +  - **Advertised auto-negotiation**:
> +
> +    - For **SPE** links (except **10BaseT1L**), this will be **No**.
> +
> +    - For **MPE** and **10BaseT1L** links, this will be **Yes** if
> +      autonegotiation is enabled.
> +
> +  - **Link partner advertised link modes**: Relevant for **any device that
> +    supports autonegotiation**, such as **MPE** and **10BaseT1L**. This field
> +    displays the subset of link modes supported by the link partner and
> +    recognized by the local PHY. If autonegotiation is disabled, this field is
> +    not applicable. Some drivers (or possibly the hardware) do not provide this
> +    information even with autonegotiation enabled on both sides; this is
> +    considered a bug and should be fixed.
> +
> +  - **Link partner advertised pause frame use**: Indicates whether the link
> +    partner is advertising pause frame support. This field is only relevant
> +    when autonegotiation is enabled.
> +
> +  - **Link partner advertised auto-negotiation**: Displays whether the link
> +    partner is advertising autonegotiation. If the link partner supports
> +    autonegotiation, this field will show **Yes**. If **No**, this field
> +    will probably not be shown.
> +
> +  - **Speed**: Displays the current operational speed of the interface. This
> +    field is especially important when **multiple link modes** are supported.
> +    If **autonegotiation** is enabled, the speed is typically automatically
> +    selected as the **highest common speed** advertised by both link partners.
> +
> +    In cases where the link is in **forced mode** and both sides support
> +    multiple speeds, it is crucial to verify that **both sides are forced to
> +    the same speed**. A mismatch in forced speeds between the link partners will
> +    result in link failure.
> +
> +  - **Duplex**: Displays the current duplex setting of the interface, which can
> +    be either **Half** or **Full**. In **Full Duplex**, data can be transmitted
> +    and received simultaneously, while in **Half Duplex**, transmission and
> +    reception occur sequentially. When **autonegotiation** is enabled, the
> +    duplex mode is typically negotiated along with the speed.
> +
> +    In **forced mode**, it is important to verify that both link partners are
> +    configured with the same duplex setting. A **duplex mismatch** (e.g., one
> +    side using Full Duplex and the other Half Duplex) usually does not affect
> +    the link stability, but it often results in **lower performance**, with
> +    symptoms such as reduced throughput and possibly packet collisions.
> +
> +  - **Auto-negotiation**: Indicates whether auto-negotiation is enabled on the
> +    **local interface**. This shows that the interface is set to negotiate
> +    speed and duplex settings with the link partner. However, even if
> +    **auto-negotiation** is enabled locally and the link is established, the
> +    link partner might not be using auto-negotiation. In such cases, many PHYs
> +    are capable of detecting a **forced mode** on the link partner and
> +    adjusting to the correct speed and duplex.
> +
> +    If the link partner is in **forced mode**, the **"Link partner
> +    advertised"** fields will not be present in the `ethtool` output, as the
> +    partner isn't advertising any link modes or capabilities. Additionally, the
> +    **"Link partner advertised"** fields may also be missing if the **PHY
> +    driver** does not support reporting this information, or if the **MAC
> +    driver** is not utilizing the Linux **PHYlib** framework to retrieve and
> +    report the PHY status.
> +
> +  - **Master-slave configuration**: Indicates the current configuration of the
> +    **master-slave role** for the interface. This is relevant for certain
> +    Ethernet standards, such as **Single-Pair Ethernet (SPE)** and high-speed
> +    Ethernet configurations like **1000Base-T** and above, where one device
> +    must act as the **master** and the other as the **slave** for proper link
> +    establishment.
> +
> +    In **auto-negotiation** mode, the master-slave role is typically negotiated
> +    automatically. However, there are options to specify **preferred-master**
> +    or **preferred-slave** roles. For example, switches often prefer the master
> +    role to reduce the time domain crossing delays.
> +
> +    In **forced mode**, it is essential to manually configure the master-slave
> +    roles correctly on both link partners. If both sides are forced to the same
> +    role (e.g., both forced to master), the link will fail to establish.
> +
> +    A combination of **auto-negotiation** with **forced roles** can lead to
> +    unexpected behavior. If one side forces a role while the other side uses
> +    auto-negotiation, it can result in mismatches, especially if both sides
> +    force overlapping roles (preferring overlapping roles is usually not a
> +    problem). This configuration should be avoided to ensure reliable link
> +    establishment.
> +
> +  - **Master-slave status**: Displays the current **master-slave role** of the
> +    interface, indicating whether the interface is operating as the **master**
> +    or the **slave**. This field is particularly relevant in **auto-negotiation
> +    mode**, where the master-slave role is determined dynamically during the
> +    negotiation process.
> +
> +    In **auto-negotiation**, the role is chosen based on the configuration
> +    preferences of both link partners (e.g., **preferred-master** or
> +    **preferred-slave**). The **master-slave status** field shows the outcome
> +    of this negotiation.
> +
> +    In **forced mode**, the master-slave configuration is manually set, so the
> +    **status** and **configuration** will always be the same, making this field
> +    less relevant in that case.
> +
> +  - **Link detected**: Displays whether the physical link is up and running.
> +
> +  - **Link Down Events**: Tracks how many times the link has gone down. A high
> +    number of **Link Down Events** can indicate a physical issue such as cable
> +    problems or instability.
> +
> +  - **Signal Quality Indicator (SQI)**: Provides a score for signal strength
> +    (e.g., **7/7**). A low score indicates potential physical layer
> +    issues like interference.
> +
> +  - **MDI-X**: Indicates the MDI/MDI-X status, typically relevant for **MPE**
> +    links.
> +
> +  - **Supports Wake-on**: Shows whether Wake-on-LAN is supported.
> +    Not used for layer 1 diagnostic.
> +
> +  - **Wake-on**: Displays whether Wake-on-LAN is enabled (e.g., **Wake-on: d**
> +    for disabled). Not used for layer 1 diagnostic.

(sorry for the long scroll down there) This whole section is more a
documentation of what ethtool reports than a troubleshooting guide.
I'm all in for getting proper doc for this, but maybe we could move
it to a dedicated page that we would cross-link from the guide?
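
To illustrate the kind of checks the quoted field descriptions lead to, here
is a minimal Python sketch. It is not part of the patch. The SAMPLE text is a
hypothetical, shortened ethtool output, and parse_ethtool() and
layer1_hints() are made-up helper names. The checks only cover what the
quoted text says about forced mode and duplex.

```python
# Hypothetical, shortened `ethtool <iface>` output used only for illustration.
SAMPLE = """\
Settings for eth0:
    Supported ports: [ TP ]
    Speed: 100Mb/s
    Duplex: Half
    Auto-negotiation: off
    Link detected: yes
"""


def parse_ethtool(text):
    """Collect single-line 'Key: value' entries into a dict.

    Continuation lines (e.g. wrapped link mode lists) and lines with an
    empty value (e.g. the 'Settings for ...' header) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def layer1_hints(fields):
    """Return human-readable hints based on the quoted field descriptions."""
    hints = []
    if fields.get("Link detected", "").lower() != "yes":
        hints.append("no link detected: check cabling and link partner")
    if fields.get("Auto-negotiation", "").lower() == "off":
        hints.append("forced mode: both sides must use the same speed")
        if fields.get("Duplex", "").lower() == "half":
            hints.append("forced half duplex: check partner for a duplex mismatch")
    return hints


if __name__ == "__main__":
    for hint in layer1_hints(parse_ethtool(SAMPLE)):
        print(hint)
```

A real tool would more likely use the ethtool netlink interface than parse
text, so treat this only as a reading aid for the field list.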

[ ... ]

> +List of Twisted Pair Ethernet Link Modes
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Twisted pair Ethernet variants utilize copper cabling with pairs of wires
> +twisted together to reduce electromagnetic interference (EMI). These link modes
> +are widely used in local area networks (LANs) due to their balance of
> +cost-effectiveness and performance.
> +
> +Below is a list of Ethernet link modes that operate over twisted pair copper
> +cabling. Half and Full duplex variants are combined where applicable.

This section below looks to be in the same ballpark. We already have
documentation on *some* of the MII flavours (SGMII, 1000BaseX, RGMII, etc.),
maybe we could merge the various linkmodes from the MII side and the
MDI side in the same document?

Developers themselves sometimes misunderstand the various linkmodes,
so I think this would warrant its own section.

> +- **10baseT Half/Full**:
> +
> +  - The original Ethernet standard over twisted pair cabling.
> +  - Supports both half-duplex and full-duplex modes.
> +
> +- **10baseT1L Full**:
> +
> +  - Long-reach variant of Ethernet over a single twisted pair.
> +  - Supports **autonegotiation** and offers two signal amplitude options:
> +
> +    - **2.4 Vpp** for distances up to **1000 meters**.
> +    - **1 Vpp** for distances up to **200 meters** (used in hazardous
> +      environments).
> +
> +  - Primarily used in industrial and building automation environments.
> +
> +- **10baseT1S Half/Full**:
> +
> +  - Short-reach variant of Ethernet over a single twisted pair.
> +  - Does not support autonegotiation, targeting **fast link establishment within
> +    ~10 ms**.
> +  - Primarily designed for compact locations, such as automotive environments,
> +    where sensors and actuators are clustered.
> +  - Supports **multidrop (point-to-multipoint)** configurations, typically used
> +    to connect clusters of sensors.
> +
> +- **100baseT Half/Full**:
> +
> +  - Also known as Fast Ethernet.
> +  - Operates at 100 Mbps over twisted pair cabling.
> +  - Supports both half-duplex and full-duplex modes.
> +
> +- **100baseT1 Full**:
> +
> +  - Operates at 100 Mbps over a single twisted pair.
> +  - Does not support autonegotiation, targeting **fast link creation within
> +    ~10 ms**.
> +  - Primarily used in automotive and industrial applications.
> +
> +- **1000baseT Full**:
> +
> +  - Gigabit Ethernet over twisted pair cabling.
> +  - Full-duplex mode is standard and widely used.
> +  - Half-duplex mode is not supported by the IEEE 802.3ab standard but may be
> +    present in some hardware implementations.
> +
> +- **1000baseT1 Full**:
> +
> +  - Gigabit Ethernet over a single twisted pair.
> +  - Does not support autonegotiation, targeting **fast link creation within
> +    ~10 ms**.
> +  - Primarily targeted for automotive and industrial use cases.
> +
> +- **2500baseT and 5000baseT Full**:
> +
> +  - Multi-Gigabit Ethernet standards.
> +  - Designed to provide higher speeds over existing Cat5e/Cat6 cabling.
> +  - Operate at 2.5 Gbps and 5 Gbps respectively.
> +
> +- **10000baseT Full**:
> +
> +  - 10 Gigabit Ethernet over twisted pair.
> +  - Requires Cat6a or better cabling to achieve full distance (up to 100 meters).
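
As a compact restatement of the list above, the following Python sketch
classifies a link mode name as single-pair (SPE) or multi-pair (MPE) and
reports whether autonegotiation is expected. The helper names are
hypothetical, and the autonegotiation rule is only the one stated in the
quoted text (MPE and 10baseT1L: yes, other SPE modes: no).

```python
import re


def base_name(mode: str) -> str:
    """Drop the '/Half' or '/Full' suffix used in ethtool link mode names."""
    return mode.split("/", 1)[0]


def is_single_pair(mode: str) -> bool:
    """True for SPE names such as 100baseT1, 10baseT1L or 10baseT1S."""
    return re.fullmatch(r"\d+baseT1[LS]?", base_name(mode)) is not None


def autoneg_expected(mode: str) -> bool:
    """Rule taken from the quoted list: MPE and 10baseT1L autonegotiate."""
    return not is_single_pair(mode) or base_name(mode) == "10baseT1L"


if __name__ == "__main__":
    for mode in ("10baseT1L/Full", "100baseT1/Full", "1000baseT/Full"):
        print(mode, is_single_pair(mode), autoneg_expected(mode))
```
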
> +
> +Potential Layer 1 Related Issues
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +OSI Layer 1 issues pertain to the physical aspects of network communication.
> +Some of these issues are interrelated or subsets of larger problems, impacting
> +network performance and connectivity. Below is a structured overview of common
> +Layer 1 issues, grouped by their relationships:
> +
> +Cable Damage and Related Issues
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +- **Cable Damage**:
> +
> +  - **Description**: Physical damage to the Ethernet cable, including cuts,
> +    bends, or degradation due to environmental factors such as heat, moisture,
> +    or mechanical stress.
> +  - **Symptoms**: Intermittent connectivity, reduced speed, or no link.
> +  - **Detection**: Cable testers or PHY diagnostics with time-domain
> +    reflectometry (TDR) support.
> +
> +  - **Subsets of Cable Damage**:
> +
> +    - **Open Circuit**:
> +
> +      - **Description**: A break or discontinuity in the cable or connector
> +        resulting in no electrical connection.
> +      - **Symptoms**: No link is detected.
> +      - **Detection**: PHY diagnostics can report "Open Circuit".
> +    - **Short Circuit**:
> +
> +      - **Description**: An unintended electrical connection between two wires
> +        that should be separate.
> +      - **Symptoms**: The link may not establish, or the link may drop repeatedly.
> +      - **Detection**: Cable testers or PoE/PoDL power detection circuits may
> +        detect excessive current draw.
> +    - **Impedance Mismatch**:
> +
> +      - **Description**: Poor cable quality or incorrect termination causes
> +        reflections of the signal due to impedance variations.
> +      - **Symptoms**: Reduced signal quality, intermittent connectivity at
> +        higher speeds.
> +      - **Detection**: TDR diagnostics can detect impedance mismatches.
> +
> +Wiring Issues
> +^^^^^^^^^^^^^
> +
> +- **Incorrect Wiring or Pinout**:
> +
> +  - **Description**: Incorrect pair wiring or non-standard pin assignments can
> +    cause link failure or degraded performance.
> +  - **Symptoms**: No link, reduced speed, or high error rates, especially in
> +    multi-pair Ethernet standards (e.g., 1000BASE-T).
> +  - **Detection**: Modern PHYs may detect and correct some wiring errors
> +    (e.g., MDI/MDI-X auto-crossover), but cable testers provide the most
> +    reliable diagnostics.
> +
> +  - **Subsets of Incorrect Wiring**:
> +
> +    - **Miswired Pairs in Multi-Pair Link Modes**:
> +
> +      - **Description**: In multi-pair standards like 10BASE-T, 100BASE-TX, or
> +        1000BASE-T, miswired pairs can cause link failures.
> +      - **Symptoms**: Incompatible wiring may work for some speeds (e.g.,
> +        100BASE-TX) but fail for higher speeds (e.g., 1000BASE-T).
> +      - **Detection**: Cable testers or PHY diagnostics may identify the issue.
> +
> +    - **Polarity Reversal within Pairs**:
> +
> +      - **Description**: The positive and negative wires within a pair are
> +        swapped.
> +      - **Symptoms**: No link or intermittent connection unless modern PHYs with
> +        automatic polarity correction are in use.
> +      - **Detection**: Modern PHYs can detect and correct polarity reversal.
> +        Some expose polarity status in diagnostic registers.
> +
> +    - **Split Pairs**:
> +
> +      - **Description**: The two wires of a pair are split across different
> +        pairs, reducing the effectiveness of signal twisting.
> +      - **Symptoms**: Increased crosstalk, higher error rates, and intermittent
> +        link drops, particularly at higher speeds like 1000BASE-T.
> +      - **Detection**: Cable testers can detect split pairs, and error counters
> +        in the PHY may provide an indication.
> +
> +Environmental and External Factors
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +- **Electromagnetic Interference (EMI)**:
> +
> +  - **Description**: External electromagnetic fields can interfere with Ethernet
> +    signals, particularly in unshielded twisted pair (UTP) cables.
> +  - **Symptoms**: Increased transmission errors, reduced speed, or intermittent
> +    link drops.
> +  - **Detection**: Error counters in the PHY or signal quality indicators (SQI)
> +    may help diagnose EMI issues.
> +
> +- **Environmental Factors**:
> +
> +  - **Description**: External environmental conditions such as temperature
> +    extremes, moisture, UV exposure, or mechanical stress can degrade the cable
> +    or connectors, leading to signal degradation.
> +  - **Symptoms**: Increased error rates, intermittent connectivity, or link
> +    failure.
> +  - **Detection**: Error counters and physical inspection can reveal issues
> +    related to environmental degradation.
> +
> +  - **Related Issues**:
> +
> +    - **Excessive Cable Length**:
> +
> +      - **Description**: Exceeding the maximum allowed cable length for a given
> +        standard can lead to signal loss and degradation.
> +      - **Symptoms**: Intermittent connectivity, reduced speed, or no link.
> +      - **Detection**: TDR diagnostics can measure the cable length. Error
> +        counters may show performance degradation.
> +
> +Cable Quality and Type
> +^^^^^^^^^^^^^^^^^^^^^^
> +
> +- **Use of Incorrect Cable Type**:
> +
> +  - **Description**: Using a cable that doesn’t meet the required standards for
> +    a specific Ethernet mode (e.g., using CAT5e for 10GBASE-T) or improper
> +    shielding.
> +  - **Symptoms**: Reduced link speed, increased errors, or no link.
> +  - **Detection**: PHY diagnostics such as SQI and cable testers can help detect
> +    cable quality issues.
> +
> +  - **Related Issue**:
> +
> +    - **Shielding Problems**: Improper or incomplete attachment of the shield
> +      can lead to similar symptoms as EMI issues. Variants include:
> +
> +      - **Unattached Shielding**: Shielding present but not connected at the
> +        connector.
> +      - **Unconnected Device Ports**: Even if the shield is attached, the device
> +        port may not provide a connection.
> +
> +Hardware Issues
> +^^^^^^^^^^^^^^^
> +
> +- **Faulty Network Interface Cards (NICs) or PHYs**:
> +
> +  - **Description**: Malfunctioning hardware components such as NICs or PHYs may
> +    cause link problems.
> +  - **Symptoms**: Network performance degradation or complete failure.
> +  - **Detection**: Some PHYs and NICs perform self-tests and may report errors
> +    in system logs. Swapping hardware may be required to diagnose these issues.
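
The "Detection" entries above can also be read the other way round: given
the diagnostics available on a board, which issues can be seen at all? A
small Python sketch of that lookup follows. The table is transcribed from
the quoted entries; the tool labels and the function name are hypothetical.

```python
# Issue -> detection methods, transcribed from the quoted "Detection" entries.
DETECTION = {
    "open circuit": {"phy diagnostics"},
    "short circuit": {"cable tester", "poe/podl power detection"},
    "impedance mismatch": {"tdr"},
    "split pairs": {"cable tester", "phy error counters"},
    "emi": {"phy error counters", "sqi"},
    "excessive cable length": {"tdr", "phy error counters"},
    "incorrect cable type": {"sqi", "cable tester"},
}


def issues_visible_to(tools):
    """Return the issues that at least one of the given tools can detect."""
    available = set(tools)
    return sorted(
        issue for issue, methods in DETECTION.items() if methods & available
    )


if __name__ == "__main__":
    print(issues_visible_to(["tdr"]))
```
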
> +
> +Pair Assignment Issues in Multi-Pair Link Modes
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +Ethernet standards that use **two or more pairs** of wires - such as
> +**10BASE-T**, **100BASE-TX**, **1000BASE-T**, and higher - require correct pair
> +assignments for proper operation. Incorrect pair assignments can cause
> +significant network problems, especially as data rates increase.
> +
> +Multi-Pair Link Modes
> +^^^^^^^^^^^^^^^^^^^^^
> +
> +- **Applicable Ethernet Standards**:
> +
> +  - **10BASE-T** (10 Mbps Ethernet)
> +  - **100BASE-TX** (Fast Ethernet)
> +  - **1000BASE-T** (Gigabit Ethernet)
> +  - **2.5GBASE-T**, **5GBASE-T**, **10GBASE-T**
> +
> +Pin and Pair Naming Conventions
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +In Ethernet troubleshooting, understanding pin, pair, and color-coding
> +conventions is essential, especially when physical cable repairs are necessary.
> +One major challenge arises in the field when a damaged cable pair needs to be
> +identified and fixed without the ability to replace the entire cable. While
> +Linux diagnostics typically only provide pair names (e.g., "Pair A"), these
> +names do not directly map to the color codes commonly used for cable
> +identification in the field.
> +
> +To further complicate the issue, different standards—such as **TIA-568** and
> +**IEEE 802.3**—use varying conventions for assigning pins to pairs, and pairs
> +to color codes. For example, the pair names reported in diagnostics must be
> +translated into physical wire colors, which differ between **TIA-568A** and
> +**TIA-568B** layouts. This translation process is crucial for accurately
> +identifying and repairing the correct cable pair.
> +
> +Although Linux diagnostic tools provide valuable information, their focus on
> +pair names can make it challenging to map these names to the physical cable
> +layout, particularly in fieldwork where color-coded wires are the primary means
> +of identification. This section aims to highlight this problem and provide
> +enough background on pin, pair, and color-coding conventions to assist with
> +analyzing and addressing these issues. While this guide may not fully resolve
> +the difficulties, it offers important context to help bridge the gap between
> +diagnostics and physical cable repair.
> +
> +TIA-568 Pair and Pin Assignments
> +""""""""""""""""""""""""""""""""

This section could also go in another page (standalone or the same as
above)?

My idea would be to make the troubleshooting guide a bit easier to read
through, with step-by-step instructions on one side, cross-linking to a
page containing these detailed descriptions.
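
For what it's worth, the pair-name-to-colour translation discussed in the
quoted section could be sketched like this in Python. This assumes that
"Pair A" to "Pair D" correspond to MDI pins 1/2, 3/6, 4/5 and 7/8 with no
crossover applied, which may not hold for every PHY or when MDI-X is in
effect. The function name is hypothetical.

```python
# Assumption: pair names map to these RJ45 pins on an MDI port (no crossover).
PAIR_PINS = {"A": (1, 2), "B": (3, 6), "C": (4, 5), "D": (7, 8)}

# Wire colours per pin for the two TIA-568 termination layouts.
PIN_COLORS = {
    "T568A": {
        1: "white/green", 2: "green", 3: "white/orange", 4: "blue",
        5: "white/blue", 6: "orange", 7: "white/brown", 8: "brown",
    },
    "T568B": {
        1: "white/orange", 2: "orange", 3: "white/green", 4: "blue",
        5: "white/blue", 6: "green", 7: "white/brown", 8: "brown",
    },
}


def pair_colors(pair: str, layout: str):
    """Return the wire colours of a diagnostic pair name for a given layout."""
    return tuple(PIN_COLORS[layout][pin] for pin in PAIR_PINS[pair])


if __name__ == "__main__":
    for layout in ("T568A", "T568B"):
        print(layout, "Pair A ->", pair_colors("A", layout))
```

A fault reported on "Pair A" would then point at the green pair on a T568A
cable and at the orange pair on a T568B cable, under the assumption above.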


[ ... ]

> +Linux Kernel Recommendations for Improved Diagnostic Interfaces
> +---------------------------------------------------------------
> 
> +As of **Linux kernel v6.11**, several improvements could be implemented to
> +enhance the diagnostic capabilities for Ethernet connections, particularly for
> +twisted pair Ethernet variants. These recommendations aim to address gaps in
> +diagnostics for OSI Layer 1 issues and provide more detailed insights for users
> +and developers.
> +
> +This list will evolve with future kernel versions, reflecting new features or
> +refinements. Below are the current suggestions:

I'm not sure this TODO list has its place in this troubleshooting
guide. I agree with the points you list, but this looks more like a
roadmap for PHY stuff to improve. I don't really know where this list
could go and if it's common to maintain this kind of "TODO list" in the
kernel doc though. Maybe Andrew has an idea ?

Thanks for coming up with such a detailed guide. I also have some "PHY
bringup 101" ideas on the common errors faced by developers, and this
document would be the ideal place to maintain this crucial information.

Maxime
