Message-ID: <TY1PR0301MB10074DC6D1F5CE4F4B5AF7B5A0920@TY1PR0301MB1007.apcprd03.prod.outlook.com>
Date:   Wed, 16 May 2018 01:51:36 +0000
From:   Hirotaka Yamamoto <ymmt@...ozu.com>
To:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: ECMP routing: problematic selection of outgoing interface

Hi,

I recently built a highly available network using an ECMP route
across two isolated L2 switches, as follows.

Router-- ToR switch 1  ---- Linux
     |   192.168.11.1/24     |  eth0: 192.168.11.2/24
     |                       |  eth1: 192.168.12.2/24
     +-- ToR switch 2  ------+
         192.168.12.1/24

The (default) route has been configured with:

    $ sudo ip route add default \
           nexthop via 192.168.11.1 \
           nexthop via 192.168.12.1

Then I found that Linux chooses the wrong outgoing device for some
destination/source address pairs, like this:

    $ ip route get 12.34.56.78 from 192.168.12.2
    12.34.56.78 from 192.168.12.2 via 192.168.11.1 dev eth0 uid 0
                                                 # dev should be "eth1"
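
To spot such a mismatch mechanically, the device can be pulled out of
the "ip route get" output and compared with the device that owns the
source address. This is only a minimal sketch, assuming the output
format shown above; route_dev and check_dev are hypothetical helper
names, not part of iproute2.

```shell
# Hypothetical helper: print the word following "dev" in one line of
# `ip route get` output (format assumed to match the sample above).
route_dev() {
    printf '%s\n' "$1" | sed -n 's/^.* dev \([^ ]*\).*$/\1/p'
}

# Hypothetical helper: compare the expected device with the one the
# kernel actually selected and report the result.
check_dev() {
    expected=$1
    actual=$(route_dev "$2")
    if [ "$expected" = "$actual" ]; then
        echo "ok: $actual"
    else
        echo "WRONG: expected $expected, got $actual"
    fi
}

# The sample line from above: source 192.168.12.2 lives on eth1.
check_dev eth1 "12.34.56.78 from 192.168.12.2 via 192.168.11.1 dev eth0 uid 0"
```

For the sample line this reports a mismatch, since the selected device
is eth0 while the source address is assigned to eth1.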

As a consequence, programs like SSH or curl do not work for such
destinations, because the routers drop packets whose source address
belongs to the other subnet.

Unbound sockets also suffer from this problem.  My guess is that
Linux chooses a source address first and then picks the wrong
outgoing device.

Although I believe this is a bug in Linux, I found a possibly relevant
comment in function ip_route_output_key_hash_rcu at net/ipv4/route.c:

    /* I removed check for oif == dev_out->oif here.
       It was wrong for two reasons:
       1. ip_dev_find(net, saddr) can return wrong iface, if saddr
          is assigned to multiple interfaces.
       2. Moreover, we are allowed to send packets with saddr
          of another iface. --ANK
     */

Given point 2 of that comment, I wonder whether this behavior is
intended.

So, my questions are:

1. Is this intended or not?
2. If this is intended, how can I make programs work in this ECMP network?


I have created a simple script to reproduce the problem (attached below).
The script creates a dedicated network namespace "testns" and configures
an ECMP route in it.

So far, I have reproduced the problem with these Linux versions:
    - 4.17-rc5          (Upstream)
    - 4.15.0-20-generic (Ubuntu 18.04)
    - 4.14.32-coreos    (CoreOS)
    - 4.13.0-37-generic (Ubuntu 16.04 HWE)
    - 4.4.0-116-generic (Ubuntu 16.04)

Note that the problem is not limited to the default route.
Any route configured as ECMP can cause the problem.
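
As an illustration, a non-default prefix can be given the same two
nexthops. The sketch below only prints the command rather than running
it; the prefix 10.0.0.0/8 is an arbitrary value assumed for the
example, and ecmp_route_cmd is a hypothetical helper name.

```shell
# Hypothetical helper: build (but do not execute) an `ip route add`
# command with one "nexthop via" clause per gateway argument.
ecmp_route_cmd() {
    prefix=$1
    shift
    cmd="ip route add $prefix"
    for gw in "$@"; do
        cmd="$cmd nexthop via $gw"
    done
    printf '%s\n' "$cmd"
}

# 10.0.0.0/8 is an assumed example prefix; the gateways are the ones
# used for the default route above.
ecmp_route_cmd 10.0.0.0/8 192.168.11.1 192.168.12.1
```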

- ymmt

#!/bin/sh -e

NS=testns

BR1=testbr1
VETH1=testveth1
BR2=testbr2
VETH2=testveth2
LINKS="$VETH1 $VETH2 $BR1 $BR2"

NET1=192.168.11.xx/24
NET2=192.168.12.xx/24
IPNS="ip netns exec $NS ip"

clean() {
    for l in $LINKS; do
        if ip -o link show $l >/dev/null 2>&1; then
            ip link del $l
        fi
    done

    # -w matches the namespace name as a whole word only
    if ip netns list | grep -qw "$NS"; then
        ip netns del $NS
    fi
}
trap clean INT QUIT TERM HUP PIPE 0

make_address() {
    local net addr
    net=$1
    addr=$2

    echo $net | sed "s/xx/$addr/"
}

cidr2ip() {
    echo $1 | cut -d / -f 1
}

GW1=$(make_address $NET1 1)
GW2=$(make_address $NET2 1)
ADDR1=$(make_address $NET1 2)
ADDR2=$(make_address $NET2 2)

setup_veth() {
    local br veth dest
    br=$1
    veth=$2
    dest=$3

    ip link add $br type bridge
    ip link add $veth type veth peer name ${veth}_
    ip link set $br up
    ip link set $veth master $br up
    ip link set ${veth}_ netns $NS name $dest up
}

setup() {
    ip netns add $NS
    $IPNS link set lo up

    setup_veth $BR1 $VETH1 eth0
    setup_veth $BR2 $VETH2 eth1

    ip addr add $GW1 dev $BR1
    ip addr add $GW2 dev $BR2
    $IPNS addr add $ADDR1 dev eth0
    $IPNS addr add $ADDR2 dev eth1

    $IPNS route add 0.0.0.0/0 nexthop via $(cidr2ip $GW1) nexthop via $(cidr2ip $GW2)
}

test_route_from() {
    local dest dev from r rdev
    dest=$1
    dev=$2
    from=$3
    r=$($IPNS -o route get $dest from $from)
    rdev=$(echo $r | sed -nr 's/^.*dev (eth[[:digit:]]+).*/\1/p')
    if [ "$dev" != "$rdev" ]; then
        echo "WRONG dev/from pair: ip -o route get $dest from $from:"
        printf "%s\n" "$r"
        return
    fi
}

test_route() {
    test_route_from "$1" eth0 $(cidr2ip $ADDR1)
    test_route_from "$1" eth1 $(cidr2ip $ADDR2)
}

run_tests() {
    test_route 12.34.56.78
    test_route 216.58.200.160
    test_route 216.58.200.161
    test_route 216.58.200.162
    test_route 216.58.200.163
    test_route 216.58.200.164
    test_route 52.85.149.10
    test_route 52.85.149.11
    test_route 52.85.149.12
    test_route 52.85.149.13
    test_route 52.85.149.14
}

# main
setup
run_tests
# "read -p" is not POSIX and fails under dash, so print the prompt separately
printf "Press enter to finish"
read ret
