lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 16 Aug 2021 12:08:00 -0600
From:   David Ahern <dsahern@...il.com>
To:     Lahav Schlesinger <lschlesinger@...venets.com>,
        netdev@...r.kernel.org
Cc:     dsahern@...nel.org, davem@...emloft.net, kuba@...nel.org
Subject: Re: [PATCH] vrf: Reset skb conntrack connection on VRF rcv

On 8/15/21 6:00 AM, Lahav Schlesinger wrote:
> To fix the "reverse-NAT" for replies.
> 
> When a packet is sent over a VRF, the POST_ROUTING hooks are called
> twice: Once from the VRF interface, and once from the "actual"
> interface the packet will be sent from:
> 1) First SNAT: l3mdev_l3_out() -> vrf_l3_out() -> .. -> vrf_output_direct()
>      This causes the POST_ROUTING hooks to run.
> 2) Second SNAT: 'ip_output()' calls POST_ROUTING hooks again.
> 
> Similarly for replies, first ip_rcv() calls PRE_ROUTING hooks, and
> second vrf_l3_rcv() calls them again.
> 
> As an example, consider the following SNAT rule:
>> iptables -t nat -A POSTROUTING -p udp -m udp --dport 53 -j SNAT --to-source 2.2.2.2 -o vrf_1
> 
> In this case sending over a VRF will create 2 conntrack entries.
> The first is from the VRF interface, which performs the IP SNAT.
> The second will run the SNAT, but since the "expected reply" will remain
> the same, conntrack randomizes the source port of the packet:
> e..g With a socket bound to 1.1.1.1:10000, sending to 3.3.3.3:53, the conntrack
> rules are:
> udp      17 29 src=2.2.2.2 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=61033 packets=0 bytes=0 mark=0 use=1
> udp      17 29 src=1.1.1.1 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=10000 packets=0 bytes=0 mark=0 use=1
> 
> i.e. First SNAT IP from 1.1.1.1 --> 2.2.2.2, and second the src port is
> SNAT-ed from 10000 --> 61033.
> 
> But when a reply is sent (3.3.3.3:53 -> 2.2.2.2:61033) only the later
> conntrack entry is matched:
> udp      17 29 src=2.2.2.2 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 src=3.3.3.3 dst=2.2.2.2 sport=53 dport=61033 packets=1 bytes=49 mark=0 use=1
> udp      17 28 src=1.1.1.1 dst=3.3.3.3 sport=10000 dport=53 packets=1 bytes=68 [UNREPLIED] src=3.3.3.3 dst=2.2.2.2 sport=53 dport=10000 packets=0 bytes=0 mark=0 use=1
> 
> And a "port 61033 unreachable" ICMP packet is sent back.
> 
> The issue is that when PRE_ROUTING hooks are called from vrf_l3_rcv(),
> the skb already has a conntrack flow attached to it, which means
> nf_conntrack_in() will not resolve the flow again.
> 
> This means only the dest port is "reverse-NATed" (61033 -> 10000) but
> the dest IP remains 2.2.2.2, and since the socket is bound to 1.1.1.1 it's
> not received.
> This can be verified by logging the 4-tuple of the packet in '__udp4_lib_rcv()'.
> 
> The fix is then to reset the flow when skb is received on a VRF, to let
> conntrack resolve the flow again (which now will hit the earlier flow).
> 
> To reproduce: (Without the fix "Got pkt_to_nat_port" will not be printed by
>   running 'bash ./repro'):
>   $ cat run_in_A1.py
>   import logging
>   logging.getLogger("scapy.runtime").setLevel(logging.ERROR)
>   from scapy.all import *
>   import argparse
> 
>   def get_packet_to_send(udp_dst_port, msg_name):
>       return Ether(src='11:22:33:44:55:66', dst=iface_mac)/ \
>           IP(src='3.3.3.3', dst='2.2.2.2')/ \
>           UDP(sport=53, dport=udp_dst_port)/ \
>           Raw(f'{msg_name}\x0012345678901234567890')
> 
>   parser = argparse.ArgumentParser()
>   parser.add_argument('-iface_mac', dest="iface_mac", type=str, required=True,
>                       help="From run_in_A3.py")
>   parser.add_argument('-socket_port', dest="socket_port", type=str,
>                       required=True, help="From run_in_A3.py")
>   parser.add_argument('-v1_mac', dest="v1_mac", type=str, required=True,
>                       help="From script")
> 
>   args, _ = parser.parse_known_args()
>   iface_mac = args.iface_mac
>   socket_port = int(args.socket_port)
>   v1_mac = args.v1_mac
> 
>   print(f'Source port before NAT: {socket_port}')
> 
>   while True:
>       pkts = sniff(iface='_v0', store=True, count=1, timeout=10)
>       if 0 == len(pkts):
>           print('Something failed, rerun the script :(', flush=True)
>           break
>       pkt = pkts[0]
>       if not pkt.haslayer('UDP'):
>           continue
> 
>       pkt_sport = pkt.getlayer('UDP').sport
>       print(f'Source port after NAT: {pkt_sport}', flush=True)
> 
>       pkt_to_send = get_packet_to_send(pkt_sport, 'pkt_to_nat_port')
>       sendp(pkt_to_send, '_v0', verbose=False) # Will not be received
> 
>       pkt_to_send = get_packet_to_send(socket_port, 'pkt_to_socket_port')
>       sendp(pkt_to_send, '_v0', verbose=False)
>       break
> 
>   $ cat run_in_A2.py
>   import socket
>   import netifaces
> 
>   print(f"{netifaces.ifaddresses('e00000')[netifaces.AF_LINK][0]['addr']}",
>         flush=True)
>   s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
>   s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
>                str('vrf_1' + '\0').encode('utf-8'))
>   s.connect(('3.3.3.3', 53))
>   print(f'{s. getsockname()[1]}', flush=True)
>   s.settimeout(5)
> 
>   while True:
>       try:
>           # Periodically send in order to keep the conntrack entry alive.
>           s.send(b'a'*40)
>           resp = s.recvfrom(1024)
>           msg_name = resp[0].decode('utf-8').split('\0')[0]
>           print(f"Got {msg_name}", flush=True)
>       except Exception as e:
>           pass
> 
>   $ cat repro.sh
>   ip netns del A1 2> /dev/null
>   ip netns del A2 2> /dev/null
>   ip netns add A1
>   ip netns add A2
> 
>   ip -n A1 link add _v0 type veth peer name _v1 netns A2
>   ip -n A1 link set _v0 up
> 
>   ip -n A2 link add e00000 type bond
>   ip -n A2 link add lo0 type dummy
>   ip -n A2 link add vrf_1 type vrf table 10001
>   ip -n A2 link set vrf_1 up
>   ip -n A2 link set e00000 master vrf_1
> 
>   ip -n A2 addr add 1.1.1.1/24 dev e00000
>   ip -n A2 link set e00000 up
>   ip -n A2 link set _v1 master e00000
>   ip -n A2 link set _v1 up
>   ip -n A2 link set lo0 up
>   ip -n A2 addr add 2.2.2.2/32 dev lo0
> 
>   ip -n A2 neigh add 1.1.1.10 lladdr 77:77:77:77:77:77 dev e00000
>   ip -n A2 route add 3.3.3.3/32 via 1.1.1.10 dev e00000 table 10001
> 
>   ip netns exec A2 iptables -t nat -A POSTROUTING -p udp -m udp --dport 53 -j \
> 	SNAT --to-source 2.2.2.2 -o vrf_1
> 
>   sleep 5
>   ip netns exec A2 python3 run_in_A2.py > x &
>   XPID=$!
>   sleep 5
> 
>   IFACE_MAC=`sed -n 1p x`
>   SOCKET_PORT=`sed -n 2p x`
>   V1_MAC=`ip -n A2 link show _v1 | sed -n 2p | awk '{print $2'}`
>   ip netns exec A1 python3 run_in_A1.py -iface_mac ${IFACE_MAC} -socket_port \
>           ${SOCKET_PORT} -v1_mac ${SOCKET_PORT}
>   sleep 5
> 
>   kill -9 $XPID
>   wait $XPID 2> /dev/null
>   ip netns del A1
>   ip netns del A2
>   tail x -n 2
>   rm x
>   set +x
> 
> Signed-off-by: Lahav Schlesinger <lschlesinger@...venets.com>
> ---
>  drivers/net/vrf.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 

Thanks for the detailed explanation and use case.

Looks correct to me.
Reviewed-by: David Ahern <dsahern@...nel.org>



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ