Message-Id: <20241120181201.594aab6da28ec54d263c9177@uniroma2.it>
Date: Wed, 20 Nov 2024 18:12:01 +0100
From: Andrea Mayer <andrea.mayer@...roma2.it>
To: "Mortensen, Christian" <cworm@...mai.com>
Cc: "davem@...emloft.net" <davem@...emloft.net>,
        Stefano Salsano
 <stefano.salsano@...roma2.it>,
        Paolo Lungaroni
 <paolo.lungaroni@...roma2.it>,
        Ahmed Abdelsalam <ahabdels.dev@...il.com>,
        Andrea Mayer <andrea.mayer@...roma2.it>,
        "netdev@...r.kernel.org"
 <netdev@...r.kernel.org>
Subject: Re: Stackoverflow when using seg6 routes

Hi Christian,
please see below.

On Fri, 1 Nov 2024 16:28:54 +0000
"Mortensen, Christian" <cworm@...mai.com> wrote:

> Hi!
> 
> I can consistently reproduce a stack-overflow in the kernel when using seg6 routes. I was hit by the bug in a stock 5.15.0-119 Ubuntu kernel. I reproduced it in QEMU using a custom 6.11.3 kernel. I have not tried other kernels.
> 
> Here is output from the 6.11.3 kernel:
> 
> BUG: IRQ stack guard page was hit at (____ptrval____) (stack is (____ptrval____)..(____ptrval____))
> Oops: stack guard page: 0000 [#1] PREEMPT SMP PTI
> CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.11.3 #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> RIP: 0010:fib_table_lookup+0x25/0x640
> Code: 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 57 49 89 f8 49 89 f1 41 56 41 55 41 54 55 53 48 83 ec 58 48 8b 6f 28 44 8b 76 2c <89> 0c 24 48 8b 5d 08 48 85 db 0f 84 82 00 00 00 49 89 d2 45 31 e4
> RSP: 0018:ffffa81e000f4fe0 EFLAGS: 00010282
> RAX: ffff940dc2d9b600 RBX: ffff940dc1d7d9c0 RCX: 0000000000000001
> RDX: ffffa81e000f5128 RSI: ffffa81e000f5158 RDI: ffff940dc2d9b600
> RBP: ffff940dc2d9b630 R08: ffff940dc2d9b600 R09: ffffa81e000f5158
> R10: ffff940dc6882200 R11: ffff940dc3360000 R12: 00000000fffffff5
> R13: ffffa81e000f5158 R14: 000000000101a8c0 R15: ffff940dc793b080
> FS:  0000000000000000(0000) GS:ffff940e79c80000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffa81e000f4fd8 CR3: 000000000ce2c000 CR4: 00000000000006f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <#DF>
>  ? die+0x37/0x90
>  ? handle_stack_overflow+0x4d/0x60
>  ? exc_double_fault+0xe3/0x150
>  ? asm_exc_double_fault+0x23/0x30
>  ? fib_table_lookup+0x25/0x640
>  </#DF>
>  <IRQ>
>  ? fib_table_lookup+0x223/0x640
>  fib4_rule_action+0x7c/0xa0
>  fib_rules_lookup+0x1db/0x260
>  __fib_lookup+0x5f/0x90
>  __fib_validate_source+0x2e0/0x410
>  ? fib4_rule_action+0x84/0xa0
>  ? fib_rules_lookup+0x106/0x260
>  fib_validate_source+0x55/0x110
>  ip_route_input_slow+0x69b/0xb60
>  ip_route_input_noref+0x79/0x80
>  input_action_end_dt4+0x8c/0x180
>  seg6_local_input_core+0x34/0x70
>  lwtunnel_input+0x62/0xb0
>  lwtunnel_input+0x62/0xb0
>  seg6_local_input_core+0x34/0x70
>  lwtunnel_input+0x62/0xb0
>  lwtunnel_input+0x62/0xb0
>  seg6_local_input_core+0x34/0x70
>  lwtunnel_input+0x62/0xb0
>  lwtunnel_input+0x62/0xb0
>  seg6_local_input_core+0x34/0x70
>  lwtunnel_input+0x62/0xb0
>  lwtunnel_input+0x62/0xb0
> 
> (MANY SIMILAR LINES OMITTED)
> 
>  seg6_local_input_core+0x34/0x70
>  lwtunnel_input+0x62/0xb0
>  lwtunnel_input+0x62/0xb0
>  seg6_local_input_core+0x34/0x70
>  lwtunnel_input+0x62/0xb0
>  lwtunnel_input+0x62/0xb0
>  seg6_local_input_core+0x34/0x70
>  lwtunnel_input+0x62/0xb0
>  lwtunnel_input+0x62/0xb0
>  __netif_receive_skb_one_core+0x6b/0x80
>  process_backlog+0x8a/0x130
>  __napi_poll+0x2c/0x1b0
>  net_rx_action+0x2e6/0x350
>  ? sched_balance_domains+0xe9/0x350
>  handle_softirqs+0xc4/0x290
>  irq_exit_rcu+0x67/0x90
>  sysvec_apic_timer_interrupt+0x75/0x90
>  </IRQ>
>  <TASK>
>  asm_sysvec_apic_timer_interrupt+0x1a/0x20
> RIP: 0010:default_idle+0xf/0x20
> Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d c3 26 26 00 fb f4 <fa> c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
> RSP: 0018:ffffa81e000b3ef0 EFLAGS: 00000202
> RAX: ffff940e79c80000 RBX: 0000000000000001 RCX: 0000000000000000
> RDX: 0000000000000001 RSI: ffffffffb5a2e8a9 RDI: 0000000000d4f884
> RBP: ffff940dc0384000 R08: 0000000000d4f884 R09: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>  default_idle_call+0x30/0xf0
>  do_idle+0x1b1/0x1c0
>  cpu_startup_entry+0x29/0x30
>  start_secondary+0xf5/0x100
>  common_startup_64+0x13e/0x148
>  </TASK>
> Modules linked in: veth vrf
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:fib_table_lookup+0x25/0x640
> Code: 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 41 57 49 89 f8 49 89 f1 41 56 41 55 41 54 55 53 48 83 ec 58 48 8b 6f 28 44 8b 76 2c <89> 0c 24 48 8b 5d 08 48 85 db 0f 84 82 00 00 00 49 89 d2 45 31 e4
> RSP: 0018:ffffa81e000f4fe0 EFLAGS: 00010282
> RAX: ffff940dc2d9b600 RBX: ffff940dc1d7d9c0 RCX: 0000000000000001
> RDX: ffffa81e000f5128 RSI: ffffa81e000f5158 RDI: ffff940dc2d9b600
> RBP: ffff940dc2d9b630 R08: ffff940dc2d9b600 R09: ffffa81e000f5158
> R10: ffff940dc6882200 R11: ffff940dc3360000 R12: 00000000fffffff5
> R13: ffffa81e000f5158 R14: 000000000101a8c0 R15: ffff940dc793b080
> FS:  0000000000000000(0000) GS:ffff940e79c80000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffa81e000f4fd8 CR3: 000000000ce2c000 CR4: 00000000000006f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Kernel panic - not syncing: Fatal exception in interrupt
> Kernel Offset: 0x33000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> 
> The following script consistently reproduces the problem for me. It is probably not minimal:
> 

Thank you for providing the commands and configuration to reproduce the
behavior.

> #!/bin/bash
> 
> # Make a network namespace
> ip netns delete network
> ip netns add network
> ip netns exec network ip link add br0 type bridge
> ip netns exec network ip link set br0 up
> 
> # Setup host1:
> ip netns delete host1
> ip netns add host1
> ip netns exec network ip link add host1 type veth peer frr0 netns host1
> ip netns exec host1 ip addr add dev frr0 fe80::1
> ip netns exec host1 ip link set dev frr0 address 00:00:01:00:00:01
> ip netns exec host1 ip link set frr0 up
> ip netns exec network ip link set host1 master br0
> ip netns exec network ip link set host1 up
> ip netns exec host1 ip l set dev lo up
> ip netns exec host1 sysctl net.ipv4.ip_forward=1
> ip netns exec host1 sysctl net.ipv4.conf.all.rp_filter=0
> ip netns exec host1 sysctl net.ipv6.conf.all.forwarding=1
> ip netns exec host1 sysctl net.ipv4.conf.default.log_martians=1
> ip netns exec host1 sysctl net.vrf.strict_mode=1
> ip netns exec host1 ip addr add dev lo fc00::1:6:0:0:1
> ip netns exec host1 ip link add vrf9 type vrf table 1009
> ip netns exec host1 ip link set vrf9 up

Now, consider the following instruction, which sets up tunnel decapsulation
based on the SRv6 End.DT4 decap SID:

> ip netns exec host1 ip r add fc00:0:0:1:7:: encap seg6local action End.DT4 vrftable 1009 dev vrf9 proto bgp metric 20 pref medium
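
(For reference: End.DT4 here means that a received SRv6 packet whose IPv6
destination address matches fc00:0:0:1:7:: has its outer IPv6 header (and
SRH, if present) removed, and the inner IPv4 destination is then looked up
in table 1009, i.e. in the vrf9 VRF. Schematically, as a sketch of the
behavior and not actual kernel output:

  [IPv6 DA=fc00:0:0:1:7:: | SRH] [inner IPv4 | payload]
      --End.DT4-->  [inner IPv4 | payload], routed via table 1009 (vrf9)
)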

The proposed configuration is pathological because the same End.DT4 decap SID,
fc00:0:0:1:7::, is used to identify a function residing on both host1 and
host2 (see the equivalent decap instruction on host2).

You must choose two different SIDs to distinguish the two instances of the
decap behavior; for example, you could assign fc00:0:0:1:7::d4 to the End.DT4
deployed on host1 and fc00:0:0:2:7::d4 to the End.DT4 deployed on host2.
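
As a sketch only, reusing the other parameters from your script (and omitting
the proto/metric attributes for brevity), the two decap routes could then
become:

  ip netns exec host1 ip r add fc00:0:0:1:7::d4 encap seg6local action End.DT4 vrftable 1009 dev vrf9
  ip netns exec host2 ip r add fc00:0:0:2:7::d4 encap seg6local action End.DT4 vrftable 1009 dev vrf9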

In particular, according to the SRv6 architecture, a SID should provide both
topological information (where to route the packet) and service information
(what to do with the packet once it has reached the destination). Breaking
this model is very likely to cause trouble.
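
For example, in your own setup the inter-host route

  ip netns exec host1 ip -6 route add fc00:0:0:2::/64 dev frr0 nexthop via fe80::2

already provides the topological part for any SID under the fc00:0:0:2::/64
locator: a SID such as the example fc00:0:0:2:7::d4 above would be routed by
host1 towards host2, where the End.DT4 behavior (the service part) is applied.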

Therefore, the specific SID for (destination + End.DT4) should be used in the
segment list of the following encapsulation instruction:

> ip netns exec host1 ip r add 192.168.2.1 encap seg6 mode encap segs fc00:0:0:1:7:: via inet6 fe80::2 dev frr0 vrf vrf9
> 

Following the above instruction, host1 will encapsulate all packets with IPv4
destination 192.168.2.1 into SRv6 packets with destination address
fc00:0:0:1:7::. Note that the "via" and "dev" parameters provided in this
instruction are simply ignored.

Host1 will then use the destination address (fc00:0:0:1:7::) to route the
packet after the encapsulation operation. This is why it is fundamental that
the SID include the topological information needed to reach the destination.
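
Continuing the sketch with the two example SIDs above, the encapsulation
routes would then point at the SID of the remote End.DT4 instance, e.g.:

  ip netns exec host1 ip r add 192.168.2.1 encap seg6 mode encap segs fc00:0:0:2:7::d4 via inet6 fe80::2 dev frr0 vrf vrf9
  ip netns exec host2 ip r add 192.168.1.1 encap seg6 mode encap segs fc00:0:0:1:7::d4 via inet6 fe80::1 dev frr0 vrf vrf9

so that the outer destination address of the encapsulated packet falls under
the fc00:0:0:2::/64 (respectively fc00:0:0:1::/64) route towards the other
host, instead of matching a local decap SID.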

On the other hand, in the provided configuration, the same destination
address (fc00:0:0:1:7::) is associated with the decap (End.DT4) operation.
Therefore, immediately after being encapsulated by host1, the packet will be
decapsulated by host1 itself, and the original IPv4 packet will be routed by
host1 again. Clearly, the IPv4 destination will match the destination
associated with the encapsulation instruction, and the packet will be
encapsulated once more.

You can see that a loop is created: the packet keeps being encapsulated and
then decapsulated until the kernel stack overflows.
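
One way to observe this on a running system (just a suggestion; the exact
output will vary) is to ask host1 which route the outer destination address
resolves to:

  ip netns exec host1 ip -6 route get fc00:0:0:1:7::

With your configuration this lookup should resolve to host1's own seg6local
End.DT4 route rather than to a path towards host2, which is exactly the loop
described above.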

Perhaps the Linux kernel should be able to detect this kind of
misconfiguration and protect itself from it, but that would require a
redesign of some kernel components.

Ciao,
Andrea

> # Setup pseudo-vm on host1
> ip netns delete host1_1
> ip netns add host1_1
> ip netns exec host1 ip link add vm1 type veth peer eth0 netns host1_1
> ip netns exec host1 ip link set vm1 master vrf9
> ip netns exec host1 ip link set vm1 up
> ip netns exec host1_1 ip link set eth0 up
> ip netns exec host1 sysctl net.ipv4.conf.vm1.proxy_arp=1
> ip netns exec host1_1 ip addr add dev eth0 192.168.1.1/16
> ip netns exec host1 ip route add 192.168.1.1/32 dev vm1 vrf vrf9
> 
> # Setup host2
> ip netns delete host2
> ip netns add host2
> ip netns exec network ip link add host2 type veth peer frr0 netns host2
> ip netns exec host2 ip addr add dev frr0 fe80::2
> ip netns exec host2 ip link set dev frr0 address 00:00:01:00:00:02
> ip netns exec host2 ip link set frr0 up
> ip netns exec network ip link set host2 master br0
> ip netns exec network ip link set host2 up
> ip netns exec host2 ip l set dev lo up
> ip netns exec host2 sysctl net.ipv4.ip_forward=1
> ip netns exec host2 sysctl net.ipv4.conf.all.rp_filter=0
> ip netns exec host2 sysctl net.ipv6.conf.all.forwarding=1
> ip netns exec host2 sysctl net.ipv4.conf.default.log_martians=1
> ip netns exec host2 sysctl net.vrf.strict_mode=1
> ip netns exec host2 ip addr add dev lo fc00::2:6:0:0:1
> ip netns exec host2 ip link add vrf9 type vrf table 1009
> ip netns exec host2 ip link set vrf9 up
> ip netns exec host2 ip r add fc00:0:0:1:7:: encap seg6local action End.DT4 vrftable 1009 dev vrf9 proto bgp metric 20 pref medium
> ip netns exec host2 ip r add 192.168.1.1 encap seg6 mode encap segs fc00:0:0:1:7:: via inet6 fe80::1 dev frr0 vrf vrf9
> 
> # Setup pseudo-vm on host2:
> ip netns delete host2_1
> ip netns add host2_1
> ip netns exec host2 ip link add vm1 type veth peer eth0 netns host2_1
> ip netns exec host2 ip link set vm1 master vrf9
> ip netns exec host2 ip link set vm1 up
> ip netns exec host2_1 ip link set eth0 up
> ip netns exec host2 sysctl net.ipv4.conf.vm1.proxy_arp=1
> ip netns exec host2_1 ip addr add dev eth0 192.168.2.1/16
> ip netns exec host2 ip route add 192.168.2.1/32 dev vm1 vrf vrf9
> ip netns exec host1_1 ip a add 192.168.254.254 dev eth0
> 
> # Setup routes between host1 and host2:
> ip netns exec host1 ip -6 route add fc00:0:0:2::/64 dev frr0 nexthop via fe80::2
> ip netns exec host1 ip neigh add fe80::2 lladdr 00:00:01:00:00:02 dev frr0
> ip netns exec host2 ip -6 route add fc00:0:0:1::/64 dev frr0 nexthop via fe80::1
> ip netns exec host2 ip neigh add fe80::1 lladdr 00:00:01:00:00:01 dev frr0
> 
> # And ping
> ip netns exec host1_1 ping 192.168.2.1
> 
> 
> Best
> 
> Christian
> 
>
