Message-Id: <06798029-660D-454E-8628-3A9B9E1AF6F8@safebits.tech>
Date: Sat, 7 Oct 2023 14:39:31 +0300
From: Luci Stanescu <luci@...ebits.tech>
To: "David S. Miller" <davem@...emloft.net>,
 David Ahern <dsahern@...nel.org>
Cc: netdev@...r.kernel.org
Subject: IPv6 recvmsg() wrong scope for source address when using VRFs

Hi,

I've discovered that the wrong sin6_scope_id is filled in by recvmsg() in msg_name when using VRFs. Specifically, the scope contains the index of the VRF interface, instead of the slave on which the packet was received. This scope is unfortunately useless if link-local addressing is used. The context in which I discovered this issue is using non-local communication with UDP sockets and multicast (specifically having a DHCPv6 server on an interface enslaved to a VRF), but I believe the issue may be applicable to other transports and it certainly applies to unicast, which I've used to reproduce the issue in a simpler way.

Here's how to reproduce. For brevity, I'll demonstrate using Python and local communication over veth devices. I'm using Ubuntu 22.04 LTS, with kernel 6.2.0-34, but I've also tracked this down in the source code of the master branch (further down), so please bear with me. I'm going to call my VRF interface "myvrf". I'm going to create a veth pair and enslave one end to the VRF.

ip link add myvrf type vrf table 42
ip link set myvrf up
ip link add veth1 type veth peer name veth2
ip link set veth1 master myvrf up
ip link set veth2 up

# ip link sh dev myvrf
110: myvrf: <NOARP,MASTER,UP,LOWER_UP> mtu 65575 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether da:ca:c9:2b:6e:02 brd ff:ff:ff:ff:ff:ff
# ip addr sh dev veth1
112: veth1@veth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master myvrf state UP group default qlen 1000
    link/ether 32:63:cf:f5:08:35 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::3063:cfff:fef5:835/64 scope link
       valid_lft forever preferred_lft forever
# ip addr sh dev veth2
111: veth2@veth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 1a:8f:5a:85:3c:c0 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::188f:5aff:fe85:3cc0/64 scope link
       valid_lft forever preferred_lft forever

The receiver:
import socket
import struct

s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
s.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_RECVPKTINFO, 1)
s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b'myvrf')
s.bind(('', 2000, 0, 0))

while True:
    data, cmsg_list, flags, source = s.recvmsg(4096, 4096)
    for level, cmsg_type, cmsg_data in cmsg_list:
        if level == socket.IPPROTO_IPV6 and cmsg_type == socket.IPV6_PKTINFO:
            # struct in6_pktinfo: 16-byte destination address, then the receiving interface index
            dest_address, dest_scope = struct.unpack('@16sI', cmsg_data)
            dest_address = socket.inet_ntop(socket.AF_INET6, dest_address)
            print("PKTINFO destination {} {}".format(dest_address, dest_scope))
    source_address, source_port, source_flow, source_scope = source
    print("name source {} {}".format(source_address, source_scope))

The same thing happens, as expected, if sysctl net.ipv4.udp_l3mdev_accept is set to 1 and the receiver doesn't bind the socket to the VRF master device. The sender is going to use the link-local address of veth1 to address the packet on veth2 (scope 111):

import socket

s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
dest = ('fe80::3063:cfff:fef5:835', 2000, 0, 111)
s.sendto(b'foo', dest)
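
Interface indexes differ between systems, so the hard-coded 111 only matches my setup. As a small sketch, the scope can be looked up by interface name instead; the helper name link_local_dest is hypothetical and only here for illustration:

```python
import socket


def link_local_dest(address, port, ifname):
    """Build an AF_INET6 sendto() address scoped to the interface `ifname`."""
    # socket.if_nametoindex() raises OSError if the interface does not exist.
    return (address, port, 0, socket.if_nametoindex(ifname))


# Usage with the setup from this message (requires the veth pair to exist):
#   s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
#   s.sendto(b'foo', link_local_dest('fe80::3063:cfff:fef5:835', 2000, 'veth2'))
```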

Please note that the destination address is the veth1 link-local address and the scope is the veth2 interface index. The receiver will print this:
PKTINFO destination fe80::3063:cfff:fef5:835 112
name source fe80::188f:5aff:fe85:3cc0 110

Please note that the scope of the destination (from IPV6_PKTINFO) is, correctly, the interface index of the receiving interface, veth1. However, the scope of the source in the msg_name is the interface index of the VRF master device. Unfortunately, for link-local addressing, the scope of the VRF master device is useless. In my original problem, a DHCPv6 server wouldn't be able to send a response packet to the link-local address. While an application could certainly use IPV6_PKTINFO to work around this problem, I believe it feels like a bit of a hack.
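
For reference, the IPV6_PKTINFO workaround could look roughly like the sketch below. It assumes the receiver has enabled IPV6_RECVPKTINFO as in the script above, and that sending with the slave interface index as scope works on the kernel in use. The helper name reply_address is hypothetical.

```python
import ipaddress
import socket
import struct


def reply_address(source, cmsg_list):
    """Return a sendto() address for replying to `source`.

    For a link-local source, the scope is replaced with the receiving
    interface index from IPV6_PKTINFO, if that control message is present.
    """
    host, port, flowinfo, scope_id = source
    # Some Python versions append "%scope" to the host; drop it before parsing.
    if not ipaddress.IPv6Address(host.split('%')[0]).is_link_local:
        return source
    for level, cmsg_type, cmsg_data in cmsg_list:
        if level == socket.IPPROTO_IPV6 and cmsg_type == socket.IPV6_PKTINFO:
            # struct in6_pktinfo: 16-byte address followed by unsigned int ifindex
            _, ifindex = struct.unpack('@16sI', cmsg_data)
            return (host, port, flowinfo, ifindex)
    # No PKTINFO available: fall back to whatever recvmsg() reported.
    return source
```

In the receiver loop this would be used as s.sendto(b'reply', reply_address(source, cmsg_list)). Addresses that are not link-local are returned unchanged, since their scope is not needed to route the reply.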

I've tracked this down in the source code to the following (please bear with my explanations, as I'm not really familiar with the code):

First, in 2014, the scope was changed from IP6CB(skb)->iif to inet6_iif(skb) in commit https://github.com/torvalds/linux/commit/4330487acfff0cf1d7b14d238583a182e0a444bb. At the time, that function from include/linux/ipv6.h simply returned IP6CB(skb)->iif, so that was a bit of a NOOP.

Then, in 2016, inet6_iif was changed to return the VRF master if IP6CB(skb)->iif was enslaved to a VRF in this commit:
https://github.com/torvalds/linux/commit/74b20582ac389ee9f18a6fcc0eef244658ce8de0. Now, that also made sense because at the time you couldn't connect() or sendmsg() over a VRF by specifying a VRF slave interface index as a destination, you had to specify the VRF master interface index in the scope. Using link-local addresses of VRF enslaved devices at this point in time would've been impossible anyway.

But then, in 2018, a series of patches allowed things like connect() and sendmsg() to specify the index of a VRF slave interface, thus allowing link-local addresses to be used. For example:
https://github.com/torvalds/linux/commit/54dc3e3324829d346c959ff774626d9c6c9a65b5
https://github.com/torvalds/linux/commit/6da5b0f027a825df2aebc1927a27bda185dc03d4

I do not know enough about the code to understand whether, after those patches in 2018, inet6_iif() could be changed to return the VRF slave device instead of the master, or whether recvmsg() should no longer use inet6_iif(), but I do believe the scope returned by recvmsg() is a bug.

Thank you for your time!

-- 
Luci Stanescu
