netdev - [PATCH AUTOSEL 6.17-5.4] bridge: Redirect to backup port when port is administratively down

Message-ID: <20251025160905.3857885-387-sashal@kernel.org>
Date: Sat, 25 Oct 2025 12:00:18 -0400
From: Sasha Levin <sashal@...nel.org>
To: patches@...ts.linux.dev,
	stable@...r.kernel.org
Cc: Ido Schimmel <idosch@...dia.com>,
	Petr Machata <petrm@...dia.com>,
	Nikolay Aleksandrov <razor@...ckwall.org>,
	Jakub Kicinski <kuba@...nel.org>,
	Sasha Levin <sashal@...nel.org>,
	bridge@...ts.linux.dev,
	netdev@...r.kernel.org
Subject: [PATCH AUTOSEL 6.17-5.4] bridge: Redirect to backup port when port is administratively down

From: Ido Schimmel <idosch@...dia.com>

[ Upstream commit 3d05b24429e1de7a17c8fdccb04a04dbc8ad297b ]

If a backup port is configured for a bridge port, the bridge will
redirect known unicast traffic towards the backup port when the primary
port is administratively up but without a carrier. This is useful, for
example, in MLAG configurations where a system is connected to two
switches and there is a peer link between both switches. The peer link
serves as the backup port in case one of the switches loses its
connection to the multi-homed system.

In order to avoid flooding when the primary port loses its carrier, the
bridge does not flush dynamic FDB entries pointing to the port upon STP
disablement, if the port has a backup port.

The above means that known unicast traffic destined to the primary port
will be blackholed when the port is put administratively down, until the
FDB entries pointing to it are aged-out.

Given that the current behavior is quite weird and unlikely to be
depended on by anyone, amend the bridge to redirect to the backup port
also when the primary port is administratively down and not only when it
does not have a carrier.

The change is motivated by a report from a user who expected traffic to
be redirected to the backup port when the primary port was put
administratively down while debugging a network issue.

Reviewed-by: Petr Machata <petrm@...dia.com>
Signed-off-by: Ido Schimmel <idosch@...dia.com>
Acked-by: Nikolay Aleksandrov <razor@...ckwall.org>
Link: https://patch.msgid.link/20250812080213.325298-2-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@...nel.org>
Signed-off-by: Sasha Levin <sashal@...nel.org>
---

LLM Generated explanations, may be completely bogus:

YES

- What it fixes
  - Prevents known-unicast blackholing when a bridge port with a
    configured backup is put administratively down. Today, with a backup
    port configured, FDB entries are intentionally not flushed on STP
    disable (net/bridge/br_stp_if.c:116), so known unicast continues to
    target the primary port. However, br_forward() only redirects to the
    backup when the primary has no carrier, not when it’s
    administratively down, so traffic can be dropped until FDB aging.
  - The patch extends the existing redirection criterion to cover both
    “no carrier” and “admin down,” aligning behavior with user
    expectations in MLAG-like deployments and eliminating a surprising
    failure mode.

- Why it’s a stable-worthy bugfix
  - User-visible impact: Traffic blackhole in a common operational
    scenario (admin down during maintenance/debug), even though a backup
    port is configured and FDB entries are retained specifically to
    allow continued forwarding.
  - Small, contained change: One condition widened in a single function;
    no API/ABI or architectural changes.
  - Consistent with existing semantics: It broadens an already-
    established fast-failover behavior (originally for link/carrier
    loss) to the equivalent “port down” state, which is operationally
    the same intent.
  - Review tags: Reviewed-by Petr Machata, Acked-by Nikolay Aleksandrov,
    and Signed-off-by Jakub Kicinski.

- Code reference and rationale
  - Current redirection only when carrier is down:
    - net/bridge/br_forward.c:151
      if (rcu_access_pointer(to->backup_port) &&
          !netif_carrier_ok(to->dev)) { ... }
  - Patch adds admin-down to the same decision, effectively:
    - net/bridge/br_forward.c:151
      if (rcu_access_pointer(to->backup_port) &&
          (!netif_carrier_ok(to->dev) || !netif_running(to->dev))) { ... }
    - This ensures redirection also when `!netif_running()`
      (administratively down).
  - The reason blackholing occurs without this patch:
    - On STP port disable, FDB entries are not flushed if a backup port
      is configured:
      - net/bridge/br_stp_if.c:116
        if (!rcu_access_pointer(p->backup_port))
                br_fdb_delete_by_port(br, p, 0, 0);
    - This optimization (commit 8dc350202d32, “optimize backup_port fdb
      convergence”) intentionally keeps FDB entries to enable seamless
      redirection, but br_forward() fails to redirect when the port is
      admin down, causing drops.

- Risk assessment
  - Minimal regression risk: The patch adds only a
    `netif_running(to->dev)` check to a path that already redirects
    conditionally; `should_deliver()` still gates actual forwarding on
    the backup port’s state and policy.
  - No new features, no data structure changes, no timing-sensitive
    logic added.
  - Behavior is unchanged unless a backup port is configured, and even
    then it changes only in the admin-down case, which is the intended
    failover scenario.

- Backport considerations
  - Applicable to stable series that include backup port support and the
    FDB-retention optimization (e.g., post-2018/2019 kernels). It will
    not apply to trees that predate `backup_port`.
  - The change is a clean one-liner in `br_forward()`; no dependencies
    beyond existing `netif_running()` and `netif_carrier_ok()`.

Conclusion: This is a clear bugfix to prevent data-plane blackholes in a
supported configuration with minimal risk. It should be backported to
stable kernels that have bridge backup-port support.

 net/bridge/br_forward.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/bridge/br_forward.c b/net/bridge/br_forward.c
index 29097e984b4f7..870bdf2e082c4 100644
--- a/net/bridge/br_forward.c
+++ b/net/bridge/br_forward.c
@@ -148,7 +148,8 @@ void br_forward(const struct net_bridge_port *to,
 		goto out;
 
 	/* redirect to backup link if the destination port is down */
-	if (rcu_access_pointer(to->backup_port) && !netif_carrier_ok(to->dev)) {
+	if (rcu_access_pointer(to->backup_port) &&
+	    (!netif_carrier_ok(to->dev) || !netif_running(to->dev))) {
 		struct net_bridge_port *backup_port;
 
 		backup_port = rcu_dereference(to->backup_port);
-- 
2.51.0