Date: Thu, 27 Jun 2024 23:53:21 +0000
From: "Muggeridge, Matt" <matt.muggeridge2@....com>
To: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Wrong nexthop selection with two default routers where only one is
 REACHABLE

Hi,

This appears to be a bug in Linux kernel networking. This was observed on a fresh install of Ubuntu 24.04, with Linux 6.8.0-36-generic.

* PROBLEM
In the network diagram below, I have two default routers (TR1 and TR2). The HUT has two neighbor cache entries: TR1=REACHABLE and TR2=INCOMPLETE. When I ping the host (HUT) from a remote test node (TN2) via TR1, the HUT sends a NS for TR2 when it should have replied directly via TR1. This breaks communication and violates IPv6 Logo compliance.

            TN2
             |
    +--------+--------+
    |                 |
   TR1               TR2
(REACHABLE)      (INCOMPLETE)
    |                 |
    +--------+--------+
             |
            HUT

The RFC for Neighbor Discovery (RFC 4861) describes the policy for selecting routers from the Default Router List. The relevant bullet is extracted below.

https://datatracker.ietf.org/doc/html/rfc4861#section-6.3.6
 | The policy for selecting routers from the Default Router List is as
 | follows:
 |
 | 1) Routers that are reachable or probably reachable (i.e., in any
 |    state other than INCOMPLETE) SHOULD be preferred over routers
 |    whose reachability is unknown or suspect (i.e., in the
 |    INCOMPLETE state, or for which no Neighbor Cache entry exists).
 |    Further implementation hints on default router selection when
 |    multiple equivalent routers are available are discussed in
 |    [LD-SHRE].
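
To make the expected behaviour concrete, here is a minimal Python sketch of rule 1 quoted above. The function name select_default_router, the dict-based neighbor cache, and the "first match wins" tie-break are assumptions made for illustration only; this is not how the kernel implements route selection. When no router is probably reachable, RFC 4861 describes round-robin selection; the sketch simply falls back to the first router in the list.

```python
from typing import Mapping, Optional, Sequence

# Neighbor Cache states defined by RFC 4861 other than INCOMPLETE.
PROBABLY_REACHABLE = {"REACHABLE", "STALE", "DELAY", "PROBE"}


def select_default_router(
    default_routers: Sequence[str],
    neighbor_cache: Mapping[str, str],
) -> Optional[str]:
    """Pick a default router, preferring any state other than INCOMPLETE.

    A router with no Neighbor Cache entry is treated like INCOMPLETE.
    """
    for router in default_routers:
        if neighbor_cache.get(router) in PROBABLY_REACHABLE:
            return router
    # Simplification (assumption): no round-robin, just take the first one.
    return default_routers[0] if default_routers else None


# The situation from this report: TR2 is listed first but is INCOMPLETE.
routers = ["fe80::200:10ff:fe10:1061", "fe80::200:10ff:fe10:1060"]
cache = {
    "fe80::200:10ff:fe10:1060": "REACHABLE",
    "fe80::200:10ff:fe10:1061": "INCOMPLETE",
}
print(select_default_router(routers, cache))
```

With the neighbor cache from this report, the sketch returns TR1 (fe80::200:10ff:fe10:1060), which is the nexthop I expected the HUT to use.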

* REPRODUCER
This condition is created by configuring two routers under systemd-networkd, either by having each router send an RA, or statically configuring one router at a time. I show the steps for the static configuration below.

Assuming you have an interface named "enp0s9" and you're using systemd-networkd as the network manager:

1. Configure the Host (HUT) with one router (TR1)
$ networkctl cat 10-enp0s9.network
# /etc/systemd/network/10-enp0s9.network
[Match]
Name=enp0s9

[Link]
RequiredForOnline=no

[Network]
Description="Internal Network: Private VM-to-VM IPv6 interface"
DHCP=no
LLDP=no
EmitLLDP=no


# /etc/systemd/network/10-enp0s9.network.d/address.conf
[Network]
Address=2001:2:0:1000:a00:27ff:fe5f:f72d/64


# /etc/systemd/network/10-enp0s9.network.d/route-1060.conf
[Route]
Gateway=fe80::200:10ff:fe10:1060
GatewayOnLink=true

2. Start or reload the configuration
$ sudo networkctl reload
$ sudo networkctl reconfigure enp0s9
$ ip -6 r
2001:2:0:1000::/64 dev enp0s9 proto kernel metric 256 pref medium
fe80::/64 dev enp0s3 proto kernel metric 256 pref medium
fe80::/64 dev enp0s9 proto kernel metric 256 pref medium
default via fe80::200:10ff:fe10:1060 dev enp0s9 proto static metric 1024 onlink pref medium

3. Flush and Monitor the neighbor cache
$ sudo ip -6 neigh flush all; ip -6 -ts monitor neigh
    
4. From TN1, ping HUT via TR1 - the HUT's NCE is updated to REACHABLE
[2024-06-28T08:13:27.617674] fe80::200:10ff:fe10:1060 dev enp0s9 lladdr 00:00:10:10:10:60 router REACHABLE

NOTE: tcpdump shows the expected protocol exchange.

5. Configure the Host (HUT) with a 2nd router (TR2)
$ cat /etc/systemd/network/10-enp0s9.network.d/route-1061.conf 
[Route]
Gateway=fe80::200:10ff:fe10:1061
GatewayOnLink=true
$ sudo networkctl reload
$ sudo networkctl reconfigure enp0s9
$ ip -6 r
2001:2:0:1000::/64 dev enp0s9 proto kernel metric 256 pref medium
fe80::/64 dev enp0s3 proto kernel metric 256 pref medium
fe80::/64 dev enp0s9 proto kernel metric 256 pref medium
default proto static metric 1024 pref medium
     nexthop via fe80::200:10ff:fe10:1061 dev enp0s9 weight 1
     nexthop via fe80::200:10ff:fe10:1060 dev enp0s9 weight 1

6. Start monitoring traffic with tcpdump/Wireshark

7. From TN1, ping HUT via TR1
a. An echo reply is never received
b. The protocol exchange shows the HUT sends a NS for TR2 (which is NOT REACHABLE) when it should have sent an echo-reply via TR1 (which is REACHABLE).
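
For anyone repeating these steps, a small Python helper along the following lines can confirm the precondition (TR1=REACHABLE, TR2=INCOMPLETE) from the neighbor monitor output. It is only a sketch: the function name neighbor_states is made up, the REACHABLE line is the one captured in step 4, and the INCOMPLETE line is an assumed example of what the entry for TR2 looks like, not captured output.

```python
from typing import Dict


def neighbor_states(ip_neigh_output: str) -> Dict[str, str]:
    """Map neighbor address -> state for lines shaped like `ip -6 neigh`."""
    states: Dict[str, str] = {}
    for line in ip_neigh_output.splitlines():
        tokens = line.split()
        # `ip -ts monitor` prefixes each line with a "[timestamp]" token.
        if tokens and tokens[0].startswith("["):
            tokens = tokens[1:]
        # Only accept "<address> dev <ifname> ... <STATE>" lines.
        if len(tokens) < 3 or tokens[1] != "dev":
            continue
        states[tokens[0]] = tokens[-1]
    return states


sample = """\
[2024-06-28T08:13:27.617674] fe80::200:10ff:fe10:1060 dev enp0s9 lladdr 00:00:10:10:10:60 router REACHABLE
fe80::200:10ff:fe10:1061 dev enp0s9 INCOMPLETE
"""  # second line is an assumed example, not captured output

for address, state in neighbor_states(sample).items():
    print(address, state)
```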

* OBSERVATIONS
1. When NOT using systemd-networkd and each router sends an RA, the kernel behaves correctly.
2. The routing table looks different, depending on whether the kernel adds the route or systemd-networkd adds the route. E.g.

    a. Kernel adds two separate "default route" entries (systemd-networkd is stopped)
$ ip -6 route
<deleted lines>
default via fe80::200:10ff:fe10:1060 proto ra metric 1024 expires 39sec hoplimit 64 pref medium
default via fe80::200:10ff:fe10:1061 proto ra metric 1024 expires 44sec hoplimit 64 pref medium

    b. Systemd-networkd adds one "default route" with two nexthop options (systemd-networkd is running)
$ ip -6 route
<deleted lines>
default proto ra metric 1024 expires 589sec pref medium
     nexthop via fe80::200:10ff:fe10:1060 dev enp0s9 weight 1
     nexthop via fe80::200:10ff:fe10:1061 dev enp0s9 weight 1
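
The difference between the two routing tables can be checked mechanically. The Python sketch below parses the text output of "ip -6 route" as shown in (a) and (b) and reports whether the default route is a single multipath entry or several separate entries. The function names are hypothetical, and the parser only understands the line shapes that appear in this report.

```python
from typing import List, Optional


def parse_default_routes(ip_route_output: str) -> List[List[str]]:
    """Return one list of gateway addresses per "default" route entry."""
    routes: List[List[str]] = []
    current: Optional[List[str]] = None
    for line in ip_route_output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "nexthop":
            # A nexthop line belongs to the route entry printed just before it.
            if current is not None and "via" in tokens:
                current.append(tokens[tokens.index("via") + 1])
            continue
        if tokens[0] == "default":
            current = [tokens[tokens.index("via") + 1]] if "via" in tokens else []
            routes.append(current)
        else:
            current = None  # some other route; ignore its nexthops
    return routes


def has_multipath_default(ip_route_output: str) -> bool:
    """True if any single default route carries more than one nexthop."""
    return any(len(gws) > 1 for gws in parse_default_routes(ip_route_output))


kernel_ra = """\
default via fe80::200:10ff:fe10:1060 proto ra metric 1024 expires 39sec hoplimit 64 pref medium
default via fe80::200:10ff:fe10:1061 proto ra metric 1024 expires 44sec hoplimit 64 pref medium
"""
networkd = """\
default proto ra metric 1024 expires 589sec pref medium
     nexthop via fe80::200:10ff:fe10:1060 dev enp0s9 weight 1
     nexthop via fe80::200:10ff:fe10:1061 dev enp0s9 weight 1
"""
print(has_multipath_default(kernel_ra), has_multipath_default(networkd))
```

For the two tables above this prints "False True": the failing case is the one where both routers are nexthops of a single default route.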

* TCPDUMP
For completeness, here is the annotated output from tcpdump.

$ tcpdump -r ~/v6LC_2_2_11-bug-report-summary.pcapng -t -n --number -e
reading from file /home/matt/v6LC_2_2_11-bug-report-summary.pcapng, link-type EN10MB (Ethernet), snapshot length 262144

    # Step 4: TN1(1181) pings HUT(f72d) via TR1(1060)
    1  00:00:10:10:10:60 > 08:00:27:5f:f7:2d, ethertype IPv6 (0x86dd), length 70: 2001:2:0:1001:200:10ff:fe10:1181 > 2001:2:0:1000:a00:27ff:fe5f:f72d: ICMP6, echo request, id 0, seq 0, length 16
    2  08:00:27:5f:f7:2d > 33:33:ff:10:10:60, ethertype IPv6 (0x86dd), length 86: 2001:2:0:1000:a00:27ff:fe5f:f72d > ff02::1:ff10:1060: ICMP6, neighbor solicitation, who has fe80::200:10ff:fe10:1060, length 32
    3  00:00:10:10:10:60 > 08:00:27:5f:f7:2d, ethertype IPv6 (0x86dd), length 86: fe80::200:10ff:fe10:1060 > fe80::a00:27ff:fe5f:f72d: ICMP6, neighbor advertisement, tgt is fe80::200:10ff:fe10:1060, length 32
    4  08:00:27:5f:f7:2d > 00:00:10:10:10:60, ethertype IPv6 (0x86dd), length 70: 2001:2:0:1000:a00:27ff:fe5f:f72d > 2001:2:0:1001:200:10ff:fe10:1181: ICMP6, echo reply, id 0, seq 0, length 16

    # HUT has replied to TN1 via TR1. NCE for TR1=REACHABLE

    # Step 5: Now configure TR2
    # Step 7: TN1(1181) pings HUT(f72d) via TR1(1060)
    5  00:00:10:10:10:60 > 08:00:27:5f:f7:2d, ethertype IPv6 (0x86dd), length 70: 2001:2:0:1001:200:10ff:fe10:1181 > 2001:2:0:1000:a00:27ff:fe5f:f72d: ICMP6, echo request, id 0, seq 0, length 16

    # HUT creates an NCE for TR2=INCOMPLETE

    # HUT incorrectly sends NS for TR2(1061) when it should have sent echo-reply via TR1(1060)
    6  08:00:27:5f:f7:2d > 33:33:ff:10:10:61, ethertype IPv6 (0x86dd), length 86: 2001:2:0:1000:a00:27ff:fe5f:f72d > ff02::1:ff10:1061: ICMP6, neighbor solicitation, who has fe80::200:10ff:fe10:1061, length 32
    7  08:00:27:5f:f7:2d > 33:33:ff:10:10:61, ethertype IPv6 (0x86dd), length 86: 2001:2:0:1000:a00:27ff:fe5f:f72d > ff02::1:ff10:1061: ICMP6, neighbor solicitation, who has fe80::200:10ff:fe10:1061, length 32
    8  08:00:27:5f:f7:2d > 33:33:ff:10:10:61, ethertype IPv6 (0x86dd), length 86: 2001:2:0:1000:a00:27ff:fe5f:f72d > ff02::1:ff10:1061: ICMP6, neighbor solicitation, who has fe80::200:10ff:fe10:1061, length 32

Regards,
Matt.

