Message-ID: <Zs1gk3gfU7EAPmPc@pengutronix.de>
Date: Tue, 27 Aug 2024 07:13:55 +0200
From: Oleksij Rempel <o.rempel@...gutronix.de>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Andrew Lunn <andrew@...n.ch>, Heiner Kallweit <hkallweit1@...il.com>,
Russell King <linux@...linux.org.uk>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
kernel@...gutronix.de, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org,
Maxime Chevallier <maxime.chevallier@...tlin.com>,
Vladimir Oltean <vladimir.oltean@....com>,
Marc Kleine-Budde <mkl@...gutronix.de>,
Florian Fainelli <f.fainelli@...il.com>,
Köry Maincent <kory.maincent@...tlin.com>,
thomas.petazzoni@...tlin.com, jlu@...gutronix.de
Subject: Initial Thoughts on OSI Level 1 Diagnostic for 10BaseT1L Interface
Hi all,
I'd like to share my initial thoughts on how to approach network OSI
Layer 1 diagnostics for projects using the 10BaseT1L interface. I
believe this concept could be made more generic and eventually included
in the kernel documentation. The aim is to help developers like myself
prioritize tasks more effectively.
The primary focus of this concept is on embedded systems where human
interaction isn't feasible, meaning all interfaces should be
machine-readable. For instance, a flight-recorder application could
gather as much diagnostic data as possible for later analysis.
At this point, the concept is targeting existing diagnostic interfaces,
but it may also highlight the need for new ones.
Troubleshooting Checklist
=========================
**Symptom:** Interface is in admin UP state and link partner is attached, but
no Ethernet link is detected.
**How to Detect:**
------------------
**Diagnostic Tools:** ``iproute2``, ``ethtool``
**Command:** ``ip link show dev t1l0``
**Command Output:**
.. code-block::
4: t1l0@...0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
link/ether 88:14:2b:00:96:f2 brd ff:ff:ff:ff:ff:ff
**Command:** ``ethtool t1l0``
**Command Output:**
.. code-block::
Settings for t1l0:
Supported ports: [ ]
Supported link modes: 10baseT1L/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 10baseT1L/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Speed: Unknown!
Duplex: Unknown! (255)
Auto-negotiation: on
master-slave cfg: preferred slave
master-slave status: unknown
Port: Twisted Pair
PHYAD: 6
Transceiver: external
MDI-X: Unknown
Supports Wake-on: d
Wake-on: d
Link detected: no <----
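Since the goal is machine-readable diagnostics, the ``ethtool`` text above can
be reduced to a dictionary before it is logged. The following is only a sketch:
the function names are hypothetical, it assumes the plain ``Key: value`` layout
shown above, and it ignores multi-line values. Where the installed ``ethtool``
supports ``--json`` for a given command, that is probably the more robust
source.

```python
def parse_ethtool_settings(text):
    """Parse the 'Key: value' lines of `ethtool <iface>` output into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip the "Settings for <iface>:" header and continuation lines.
        if line.startswith("Settings for") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip()
    return settings


def link_detected(settings):
    """Return True/False for 'Link detected', or None if the field is absent."""
    value = settings.get("Link detected")
    if value is None:
        return None
    # Compare only the first word so trailing annotations are tolerated.
    return value.split()[:1] == ["yes"]


SAMPLE = """\
Settings for t1l0:
        Speed: Unknown!
        Duplex: Unknown! (255)
        Auto-negotiation: on
        master-slave cfg: preferred slave
        master-slave status: unknown
        Link detected: no
"""

settings = parse_ethtool_settings(SAMPLE)
print(link_detected(settings), settings["master-slave status"])
```

A flight-recorder style application could store the resulting dictionary as is
and evaluate ``link_detected()`` later.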
Possible Reasons:
-----------------
1. **Link Partner (LP) is not powered on**
- **Possibility:** High
- **Probability:** Medium
- **Description:** If LP is powered over PoDL or PoE, ensure that the PSE
(Power Source Equipment) is enabled and functioning properly, e.g., current
is within the consumption range of the link partner.
- **Notes:** This can happen if the PSE is misconfigured, disabled, or if
there is an issue with the power supply. In some cases, the LP may not
request power correctly, leading to no power being delivered.
- **Diagnostic Tools:** ``ethtool``
- **Command:** ``ethtool --show-pse t1l1``
- **Command Output:**
.. code-block::
PSE attributes for t1l1:
PoDL PSE Admin State: enabled
PoDL PSE Power Detection Status: delivering power
- **Possible Limitations:** Not all PSE controllers report the electrical
current drawn by the link partner.
- **Command:** ``ethtool --cable-test t1l1``
- **Command Output:**
.. code-block::
Cable test started for device t1l1.
Cable test completed for device t1l1.
Pair A, fault length: 25.00m
Pair A code Open Circuit
2. **Cable is damaged with a short within the pair**
- **Possibility:** Medium
- **Probability:** Medium
- **Description:** In case LP is powered over PoDL or PoE, the PSE controller
may disable power due to an overcurrent event.
- **Notes:** This issue could arise if the cable has been physically damaged
(e.g., pinched, cut, or exposed to excessive force), leading to a short
circuit. It's also possible that the installation was done incorrectly,
causing a short.
- **Diagnostic Tools:** ``ethtool``
- **Command:** ``ethtool --show-pse t1l1``
- **Command Output:**
.. code-block::
PoDL PSE Power Detection Status: over current
- **Command:** ``ethtool --cable-test t1l1``
- **Command Output:**
.. code-block::
Cable test started for device t1l1.
Cable test completed for device t1l1.
Pair A, fault length: 25.00m
Pair A code Short within Pair
3. **Cable is open (broken or unterminated) or device is not attached**
- **Possibility:** High
- **Probability:** High
- **Description:** If the device is not attached or the cable is open, no
link will be established.
- **Notes:** This situation is common in environments where cables are
frequently moved, reconnected, or exposed to mechanical stress. It can also
happen if the cable is improperly terminated or if a connector is loose or
damaged.
- **Diagnostic Tools:** ``ethtool``
- **Command:** ``ethtool --cable-test t1l1``
- **Command Output:**
.. code-block::
Cable test started for device t1l1.
Cable test completed for device t1l1.
Pair A, fault length: 25.00m
Pair A code Open Circuit
4. **LP PHY is not up or not powered on**
- **Possibility:** Medium
- **Probability:** Medium
- **Description:** The LP may not have powered on the PHY, may not have
brought it out of reset, or the interface may be in admin down state.
- **Notes:** This can occur if the LP's firmware or software fails to
initialize the PHY correctly. It might also happen if the LP is in a
low-power state or if there’s a software misconfiguration.
- **Diagnostic Tools:** ``ethtool``
- **Command:** ``ethtool --cable-test t1l1``
- **Command Output:**
.. code-block::
Cable test started for device t1l1.
Cable test completed for device t1l1.
Pair A, fault length: 25.00m
Pair A code Open Circuit
5. **LP PHY is not compatible with local PHY or misconfigured**
- **Possibility:** Medium
- **Probability:** Low
- **Description:** If the LP PHY's capabilities do not match those of the
local PHY or are misconfigured, the link will not be established.
- **Notes:** Incompatibility issues might arise if the LP is from a different
manufacturer or is an older model. Misconfiguration could occur due to
incorrect software settings or faulty strap pin configurations.
- **Diagnostic Tools:** ``ethtool``
- **Command:** ``ethtool --cable-test t1l1``
- **Command Output:**
.. code-block::
Cable test started for device t1l1.
Cable test completed for device t1l1.
Pair A code Unknown
6. **LP PHY is in forced master mode, autoneg is disabled**
- **Possibility:** High
- **Probability:** Medium
- **Description:** If the LP PHY is in forced master mode with autoneg
disabled, the link will not be established.
- **Notes:** This issue can happen if the LP is manually configured to a
fixed master mode and the local device is not configured accordingly.
Such misconfigurations can occur due to user error or incorrect default
settings.
- **Diagnostic Tools:** ``ethtool``
- **Command:** ``ethtool --cable-test t1l1``
- **Command Output:**
.. code-block::
Cable test started for device t1l1.
Cable test completed for device t1l1.
Pair A code Unknown
7. **LP PHY is in forced slave mode, autoneg is disabled**
- **Possibility:** High
- **Probability:** Medium
- **Description:** If the LP PHY is in forced slave mode with autoneg
disabled, sometimes the link can still be established, but there are cases
where no link is detected.
- **Notes:** Similar to the forced master mode, this can be caused by manual
configuration or specific use cases where autoneg is intentionally
disabled. However, this might lead to unpredictable behavior in
establishing the link.
- **Diagnostic Tools:** ``ethtool``
- **Command:** ``ethtool --cable-test t1l1``
- **Command Output:**
.. code-block::
Cable test started for device t1l1.
Cable test completed for device t1l1.
Pair A code OK
- **Note:** ``ethtool --cable-test`` may need to be executed multiple times
as it sometimes shows "Pair A code Unknown".
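The checklist above pairs each possible reason with a cable test result. A
rough sketch of how that pairing could be made machine-readable follows. The
regular expressions assume the exact text format of the sample outputs above,
and ``CANDIDATE_REASONS`` is just a hypothetical transcription of the list
(reason numbers as used above), not an authoritative mapping.

```python
import re

CODE_RE = re.compile(r"^Pair (\w+) code (.+)$")
LENGTH_RE = re.compile(r"^Pair (\w+), fault length: ([0-9.]+)m$")

# Result code -> numbers of the possible reasons listed above that
# show this code in their sample output.
CANDIDATE_REASONS = {
    "Open Circuit": [1, 3, 4],
    "Short within Pair": [2],
    "Unknown": [5, 6],
    "OK": [7],
}


def parse_cable_test(text):
    """Return {pair: {"code": str or None, "fault_length_m": float or None}}."""
    pairs = {}
    for raw in text.splitlines():
        line = raw.strip()
        code = CODE_RE.match(line)
        length = LENGTH_RE.match(line)
        if code:
            entry = pairs.setdefault(
                code.group(1), {"code": None, "fault_length_m": None})
            entry["code"] = code.group(2)
        elif length:
            entry = pairs.setdefault(
                length.group(1), {"code": None, "fault_length_m": None})
            entry["fault_length_m"] = float(length.group(2))
    return pairs


SAMPLE = """\
Cable test started for device t1l1.
Cable test completed for device t1l1.
Pair A, fault length: 25.00m
Pair A code Open Circuit
"""

result = parse_cable_test(SAMPLE)
print(result["A"], CANDIDATE_REASONS[result["A"]["code"]])
```

Several reasons share the same result code, so the mapping only narrows the
search; it does not identify a single root cause.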
Reverse Troubleshooting Checklist
=================================
**Goal:** Systematically execute diagnostic commands to identify the root cause
when no Ethernet link is detected, despite the interface being in admin UP
state and the link partner being attached.
**Step 1: Verify Interface Status**
-----------------------------------
**Command:** ``ip link show dev t1l0``
**Expected Output:** Interface should be in the ``UP`` state with the ``NO-CARRIER`` flag.
.. code-block::
4: t1l0@...0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
link/ether 88:14:2b:00:96:f2 brd ff:ff:ff:ff:ff:ff
- **If Output:** Interface is down, then check administrative settings or bring
the interface up using ``ip link set dev t1l0 up``.
- **If Output:** Interface is up but shows ``NO-CARRIER``, proceed to Step 2.
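The Step 1 decision can be automated by looking only at the flag list between
``<`` and ``>``. This is a minimal sketch with hypothetical function names; it
assumes the text format shown above. ``ip -json link show`` is likely a better
input where available.

```python
import re

FLAGS_RE = re.compile(r"<([^>]*)>")


def parse_flags(ip_link_line):
    """Extract the interface flags from the first line of `ip link show`."""
    match = FLAGS_RE.search(ip_link_line)
    if not match or not match.group(1):
        return set()
    return set(match.group(1).split(","))


def step1_verdict(flags):
    """Map the flags to the Step 1 outcomes described above."""
    if "UP" not in flags:
        return "admin-down"   # bring the interface up first
    if "NO-CARRIER" in flags:
        return "no-carrier"   # proceed to Step 2
    return "carrier"          # link is present, nothing to diagnose here


SAMPLE = ("4: t1l0@...0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 "
          "qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default "
          "qlen 1000")

print(step1_verdict(parse_flags(SAMPLE)))
```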
**Step 2: Check Link Detection and PHY Status**
-----------------------------------------------
**Command:** ``ethtool t1l0``
**Expected Output:** Verify ``Link detected: no`` and review PHY status.
.. code-block::
Settings for t1l0:
Speed: Unknown!
Duplex: Unknown! (255)
Auto-negotiation: on
master-slave cfg: preferred slave
master-slave status: unknown
Link detected: no
- **If Output:** ``Link detected: yes``, the issue might be intermittent or
resolved. Monitor the connection.
- **If Output:** ``Link detected: no``, proceed to Step 3.
**Step 3: Check Power Over Data Line (PoDL) Status**
-----------------------------------------------------
**Command:** ``ethtool --show-pse t1l1``
**Expected Output:** Power delivery status should show as ``delivering power``.
.. code-block::
PSE attributes for t1l1:
PoDL PSE Admin State: enabled
PoDL PSE Power Detection Status: delivering power
- **If Output:** ``Power Detection Status: delivering power``, then power is
likely not the issue. Proceed to Step 4.
- **If Output:** ``Power Detection Status: over current`` or ``disabled``, check
PSE configuration or troubleshoot possible short circuits. Recheck after
resolving the issue.
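The Step 3 branches could be evaluated as sketched below. The status strings
are taken from the examples in this text (``delivering power``,
``over current``, ``disabled``); other values may be reported in practice and
end up in the ``unknown`` bucket here. The function names and return values
are hypothetical.

```python
STATUS_KEY = "PoDL PSE Power Detection Status"


def parse_pse_attrs(text):
    """Parse 'Key: value' lines of `ethtool --show-pse <iface>` output."""
    attrs = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        # The "PSE attributes for <iface>:" header has an empty value.
        if sep and value.strip():
            attrs[key.strip()] = value.strip()
    return attrs


def step3_verdict(attrs):
    """Return which Step 3 branch applies to the parsed attributes."""
    status = attrs.get(STATUS_KEY)
    if status == "delivering power":
        return "power-ok"    # proceed to Step 4
    if status in ("over current", "disabled"):
        return "check-pse"   # check PSE configuration or look for a short
    return "unknown"         # status missing or not covered by this text


SAMPLE = """\
PSE attributes for t1l1:
PoDL PSE Admin State: enabled
PoDL PSE Power Detection Status: delivering power
"""

print(step3_verdict(parse_pse_attrs(SAMPLE)))
```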
**Step 4: Perform Cable Diagnostics**
-------------------------------------
**Command:** ``ethtool --cable-test t1l1``
**Expected Output:** Cable test results should indicate if the cable is intact.
.. code-block::
Cable test started for device t1l1.
Cable test completed for device t1l1.
Pair A code Open Circuit
- **If Output:** ``Pair A code Open Circuit``, proceed to Step 5.
- **If Output:** ``Pair A code Short within Pair``, proceed to Step 7.
- **If Output:** ``Pair A code OK``, proceed to Step 6.
- **If Output:** ``Pair A code Unknown``, proceed to Step 8.
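The branching above is a plain lookup. Because the cable test sometimes
reports ``Unknown`` and may need to be repeated, the sketch below accepts the
codes from several runs and uses the first one that is not ``Unknown``. That
retry policy is an assumption for illustration, and the function names are
hypothetical.

```python
NEXT_STEP = {
    "Open Circuit": 5,
    "Short within Pair": 7,
    "OK": 6,
    "Unknown": 8,
}


def settle_code(codes):
    """Pick the first result that is not 'Unknown'; fall back to 'Unknown'."""
    for code in codes:
        if code != "Unknown":
            return code
    return "Unknown"


def next_step(codes):
    """Return the step number to continue with, or None for no usable input."""
    if not codes:
        return None
    return NEXT_STEP.get(settle_code(codes))


print(next_step(["Unknown", "OK"]))
```

A result code that is not listed above yields ``None``, which a caller can
treat as "record and escalate" rather than guessing a branch.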
**Step 5: Investigate ``Pair A code Open Circuit`` Case**
---------------------------------------------------------
- **Description:** This indicates that the cable may be damaged or not
connected. It can also suggest that the link partner is not powered up, its
PHY is in reset, or the interface is in admin down state.
- **Action 1: Use Fault Distance Information**
- **Explanation:** The ``Pair A, fault length:`` value reported by the diagnostic
tool can be used to identify where the cable might be disconnected or
damaged.
- **Example Output:**
.. code-block::
Pair A, fault length: 25.00m
- **Diagnostic Strategy:**
- If the reported fault length is around 13m (the minimal detection distance
for the dp83td510 PHY and/or current driver), and the actual cable length
is greater than 13m:
- **Interpretation:** The cable is likely disconnected on the local side
or damaged within the first 13m.
- **Next Step:** Inspect and secure the connection on the local side or
replace the damaged section of the cable.
- If the actual cable length is less than or equal to 13m:
- **Interpretation:** The fault length alone is not conclusive. Any of the
possible reasons still applies, including cable damage, PHY issues, or a
disconnected link partner.
- **Action 2: Measure Current Drawn by the Link Partner (if PoDL/PoE is used)**
- **Explanation:** If the local PSE controller provides this functionality,
measuring the current drawn by the link partner can help diagnose the issue.
- **Next Step:** Use the current measurement to determine if the link partner
is receiving power. If the current is abnormally low or zero, it indicates
that the link partner might not be powered on or that there is a power
delivery issue.
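The fault distance strategy from Action 1 can be written down as a small
decision function. This is a sketch under assumptions: the 13m minimum comes
from the text above (dp83td510 and/or current driver), the 1m tolerance for
"around 13m" is made up for illustration, and the last branch (fault reported
clearly beyond 13m) is not described in this step and is extrapolated from the
short circuit case.

```python
MIN_DETECT_M = 13.0   # minimal detection distance mentioned above
TOLERANCE_M = 1.0     # assumed meaning of "around 13m"


def interpret_open_circuit(fault_length_m, cable_length_m):
    """Classify an 'Open Circuit' result using the reported fault length."""
    near_minimum = fault_length_m <= MIN_DETECT_M + TOLERANCE_M
    if near_minimum and cable_length_m > MIN_DETECT_M:
        # Disconnected locally or damaged within the first 13m.
        return "near-local"
    if cable_length_m <= MIN_DETECT_M:
        # Short cable: the distance cannot narrow down the cause.
        return "inconclusive"
    # Assumption: treat the reported distance as the fault location.
    return "at-distance"


print(interpret_open_circuit(13.0, 50.0))
print(interpret_open_circuit(25.0, 50.0))
```

If the reported distance roughly equals the known cable length, the open end is
probably at the link partner, which points back to the power and PHY checks
rather than to cable damage.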
**Step 6: Investigate ``Pair A code OK`` Case**
-----------------------------------------------
- **Description:** This means the cable is likely fine, but the link partner's
PHY might be in forced slave mode with autonegotiation disabled, which can
prevent the link from being established.
- **Action:** Test if the link partner is in forced slave mode by configuring
the local system to operate in forced master mode.
- **Command:**
.. code-block::
ethtool -s t1l1 master-slave forced-master speed 10 duplex full autoneg off
- **Explanation:** If this command establishes the link, it confirms that the
link partner is in forced slave mode with autonegotiation disabled. You may
need to adjust the local configuration or re-enable autonegotiation on the
link partner for a proper link setup.
**Step 7: Investigate ``Pair A code Short within Pair`` Case**
--------------------------------------------------------------
- **Description:** This indicates that there is a short circuit within the
cable pair. The diagnostic tool reports a ``Pair A, fault length:`` value that
gives the distance to the fault.
- **Example Output:**
.. code-block::
Pair A, fault length: 25.00m
- **Diagnostic Strategy:**
- **Use Fault Distance Information:**
- If the reported fault length is around 13m (the minimal detection distance
for the dp83td510 PHY and/or current driver):
- **Interpretation:** The short is likely within the first 13 meters of
the cable, possibly close to the local side of the connection.
- **Next Step:** Inspect the cable for damage or improper connections
within the first 13 meters from the local port. If the issue is found,
repair or replace the damaged section of the cable.
- If the fault length is greater than 13m:
- **Interpretation:** The short circuit is located at the specified
distance along the cable. This could indicate physical damage or
incorrect wiring at the identified location.
- **Next Step:** Locate the specified distance along the cable, then
inspect for any visible damage, bends, or improper connections at that
point. Repair or replace the affected section as necessary.
**Step 8: Investigate ``Pair A code Unknown`` Case**
----------------------------------------------------
- **Description:** This result indicates that the cable test cannot provide
usable results due to noise on the cable. Since the PHY seems to perform a
variant of Spread Spectrum Time Domain Reflectometry (SSTDR), which is
relatively immune to usual low-amplitude noise sources, this result typically
suggests one of two scenarios (strong interference remains a less likely third):
1. The link partner is in a fixed master mode without autonegotiation enabled.
2. The link partner is constantly sending autonegotiation pulses, but the
connection cannot be established.
- **Diagnostic Strategy:**
- **Scenario 1: Fixed Master Mode without Autonegotiation**
- **Action:** Verify if the link partner is in fixed master mode by
configuring the local PHY to forced slave mode.
- **Command:**
.. code-block::
ethtool -s t1l1 master-slave forced-slave speed 10 duplex full autoneg off
- **Next Step:** If this configuration establishes a link, it confirms that
the link partner is in fixed master mode with autonegotiation disabled.
You may need to adjust the link partner’s configuration or leave the
local PHY in forced slave mode.
- **Scenario 2: Autonegotiation Pulse Issue (Not Verifiable Remotely)**
- **Explanation:** If the link partner’s PHY is enabled and continuously
sending autonegotiation pulses, but the link cannot be established,
this may be due to incompatible PHY capabilities. This is currently not
verifiable remotely.
- **Potential Causes:**
- The link partner’s PHY announces capabilities that do not match the
local PHY, leading to a failure to establish the link.
- In 10BaseT1L cases, if one device is certified for a hazardous
environment (e.g., should operate only with a 1.0 Vpp signal) and the other
device announces both 1.0 Vpp and 2.4 Vpp signal capabilities, the link may
not be established.
- **Scenario 3: Noise or Interference**
- **Explanation:** Although SSTDR is resilient to common noise, significant
or unusual noise sources might still disrupt the test.
- **Next Step:** Investigate the physical environment for potential sources
of interference (e.g., strong electromagnetic fields) and mitigate them
if possible.
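The forced-role probes used in Step 6 and in Scenario 1 of this step differ
only in the local role. A small helper could generate the ``ethtool`` argument
list from the cable test code, so an automated tool can run it and then
re-check ``Link detected``. This is a sketch only: the helper names are
hypothetical, and actually running the command (and restoring the previous
configuration afterwards) is left to the caller.

```python
# Cable test code -> local role to try, as described in Steps 6 and 8.
PROBE_ROLE = {
    "OK": "forced-master",     # partner suspected in forced slave mode
    "Unknown": "forced-slave",  # partner suspected in forced master mode
}


def role_probe_cmd(iface, local_role):
    """Build the ethtool command that forces the local master/slave role."""
    if local_role not in ("forced-master", "forced-slave"):
        raise ValueError("unsupported role: %s" % local_role)
    return ["ethtool", "-s", iface, "master-slave", local_role,
            "speed", "10", "duplex", "full", "autoneg", "off"]


def probe_for_code(iface, cable_test_code):
    """Return the probe command for a cable test code, or None if no probe."""
    role = PROBE_ROLE.get(cable_test_code)
    if role is None:
        return None
    return role_probe_cmd(iface, role)


print(" ".join(probe_for_code("t1l1", "Unknown")))
```

Returning an argument list rather than a shell string keeps the interface name
out of shell parsing when the command is passed to ``subprocess.run()``.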
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |