Message-ID: <de54e925-9536-f2cc-7b89-7205b3fb2c18@gameservers.com>
Date: Fri, 11 Jan 2019 13:19:54 -0500
From: Brian Rak <brak@...eservers.com>
To: netdev@...r.kernel.org
Subject: Re: IPv6 neighbor discovery issues on 4.18 (and now 4.19)
On 1/9/2019 4:33 PM, Brian Rak wrote:
>
> On 8/31/2018 10:49 AM, Brian Rak wrote:
>> We've upgraded a few machines to a 4.18.3 kernel and we're running
>> into weird IPv6 neighbor discovery issues. Basically, the machines
>> stop responding to inbound IPv6 neighbor solicitation requests, which
>> very quickly breaks all IPv6 connectivity.
>>
>> It seems like the routing table gets confused:
>>
>> # ip -6 route get fe80::4e16:fc00:c7a0:7800 dev br0
>> RTNETLINK answers: Network is unreachable
>> # ping6 fe80::4e16:fc00:c7a0:7800 -I br0
>> connect: Network is unreachable
>> Yet:
>>
>> # ip -6 route | grep fe80 | grep br0
>> fe80::/64 dev br0 proto kernel metric 256 pref medium
>>
>> fe80::4e16:fc00:c7a0:7800 is the link-local IP of the server's
>> default gateway.
>>
>> In this case, br0 has a single adapter attached to it.
>>
>> I haven't been able to come up with any sort of reproduction steps
>> here, this seems to happen after a few days of uptime in our
>> environment. The last known good release we have here is 4.17.13.
>>
>> Any suggestions for troubleshooting this? Sometimes we see machines
>> fix themselves, but we haven't been able to figure out what's
>> happening that helps.
>>
> So, we're still seeing this on 4.19.13. I've been investigating this
> a little further and have discovered a few more things:
>
> The server also fails to respond to IPv6 neighbor discovery requests:
>
> 16:12:10.181769 IP6 fe80::629c:9fff:fe22:4b80 > ff02::1:ff00:33: ICMP6, neighbor solicitation, who has 2001:x::33, length 32
>
> But this IP is configured properly:
>
> # ip -6 addr show dev br0
> 7: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
> inet6 2001:x::33/64 scope global
> valid_lft forever preferred_lft forever
> inet6 fe80::ec4:7aff:fe88:c48c/64 scope link
> valid_lft forever preferred_lft forever
>
> I found some instructions that suggest using `perf` to determine where packets
> are getting dropped, so I tried `perf record -g -a -e skb:kfree_skb` followed
> by `perf script`, which showed me this seemingly relevant stack trace (along
> with a bunch of other drops):
>
> swapper 0 [037] 161501.062542: skb:kfree_skb: skbaddr=0xffff968771988600 protocol=34525 location=0xffffffff94796c6a
> ffffffff9468d50b kfree_skb+0x7b ([kernel.kallsyms])
> ffffffff94796c6a ndisc_send_skb+0x2fa ([kernel.kallsyms])
> ffffffff947975b4 ndisc_send_na+0x184 ([kernel.kallsyms])
> ffffffff94798143 ndisc_recv_ns+0x2f3 ([kernel.kallsyms])
> ffffffff94799b46 ndisc_rcv+0xe6 ([kernel.kallsyms])
> ffffffff947a1fa8 icmpv6_rcv+0x428 ([kernel.kallsyms])
> ffffffff9477bcd3 ip6_input_finish+0xf3 ([kernel.kallsyms])
> ffffffff9477c11f ip6_input+0x3f ([kernel.kallsyms])
> ffffffff9477c787 ip6_mc_input+0x97 ([kernel.kallsyms])
> ffffffff9477c0cc ip6_rcv_finish+0x7c ([kernel.kallsyms])
> ffffffff947d9fd2 ip_sabotage_in+0x42 ([kernel.kallsyms])
> ffffffff946f3822 nf_hook_slow+0x42 ([kernel.kallsyms])
> ffffffff9477c569 ipv6_rcv+0xc9 ([kernel.kallsyms])
> ffffffff946a5de7 __netif_receive_skb_one_core+0x57 ([kernel.kallsyms])
> ffffffff946a5e48 __netif_receive_skb+0x18 ([kernel.kallsyms])
> ffffffff946a5145 netif_receive_skb_internal+0x45 ([kernel.kallsyms])
> ffffffff946a520c netif_receive_skb+0x1c ([kernel.kallsyms])
> ffffffff947c7d03 br_netif_receive_skb+0x43 ([kernel.kallsyms])
> ffffffff947c7ded br_pass_frame_up+0xcd ([kernel.kallsyms])
> ffffffff947c80ca br_handle_frame_finish+0x24a ([kernel.kallsyms])
> ffffffff947dae0f br_nf_hook_thresh+0xdf ([kernel.kallsyms])
> ffffffff947dbf19 br_nf_pre_routing_finish_ipv6+0x109 ([kernel.kallsyms])
> ffffffff947dc39a br_nf_pre_routing_ipv6+0xfa ([kernel.kallsyms])
> ffffffff947dbbe9 br_nf_pre_routing+0x1c9 ([kernel.kallsyms])
> ffffffff946f3822 nf_hook_slow+0x42 ([kernel.kallsyms])
> ffffffff947c850f br_handle_frame+0x1ef ([kernel.kallsyms])
> ffffffff946a5471 __netif_receive_skb_core+0x211 ([kernel.kallsyms])
> ffffffff946a5dcb __netif_receive_skb_one_core+0x3b ([kernel.kallsyms])
> ffffffff946a5e48 __netif_receive_skb+0x18 ([kernel.kallsyms])
> ffffffff946a5145 netif_receive_skb_internal+0x45 ([kernel.kallsyms])
> ffffffff946a6fb0 napi_gro_receive+0xd0 ([kernel.kallsyms])
> ffffffffc05c319f ixgbe_clean_rx_irq+0x46f ([kernel.kallsyms])
> ffffffffc05c4610 ixgbe_poll+0x280 ([kernel.kallsyms])
> ffffffff946a6729 net_rx_action+0x289 ([kernel.kallsyms])
> ffffffff94c000d1 __softirqentry_text_start+0xd1 ([kernel.kallsyms])
> ffffffff94075108 irq_exit+0xe8 ([kernel.kallsyms])
> ffffffff94a01a69 do_IRQ+0x59 ([kernel.kallsyms])
> ffffffff94a0098f ret_from_intr+0x0 ([kernel.kallsyms])
> ffffffff9464e01d cpuidle_enter_state+0xbd ([kernel.kallsyms])
> ffffffff9464e287 cpuidle_enter+0x17 ([kernel.kallsyms])
> ffffffff940a3cd3 call_cpuidle+0x23 ([kernel.kallsyms])
> ffffffff940a3f78 do_idle+0x1c8 ([kernel.kallsyms])
> ffffffff940a4203 cpu_startup_entry+0x73 ([kernel.kallsyms])
> ffffffff9403fade start_secondary+0x1ae ([kernel.kallsyms])
> ffffffff940000d4 secondary_startup_64+0xa4 ([kernel.kallsyms])
>
> However, I can't seem to determine why this is failing. It seems like
> the only way to hit kfree_skb within ndisc_send_skb would be if
> icmp6_dst_alloc fails?
So, I applied a dumb patch to log failures:
diff -baur linux-4.19.13/net/ipv6/ndisc.c linux-4.19.13-dirty/net/ipv6/ndisc.c
--- linux-4.19.13/net/ipv6/ndisc.c      2018-12-29 07:37:59.000000000 -0500
+++ linux-4.19.13-dirty/net/ipv6/ndisc.c        2019-01-09 16:37:59.140042846 -0500
@@ -470,6 +470,7 @@
         icmpv6_flow_init(sk, &fl6, type, saddr, daddr, oif);
         dst = icmp6_dst_alloc(skb->dev, &fl6);
         if (IS_ERR(dst)) {
+                net_warn_ratelimited("Dropping ndisc response due to icmp6_dst_alloc failure: %d", PTR_ERR(dst));
                 kfree_skb(skb);
                 return;
         }
Which ends up producing a bunch of messages like this:

[73531.594663] ICMPv6: Dropping ndisc response due to icmp6_dst_alloc failure: -12
[73532.361678] ICMPv6: Dropping ndisc response due to icmp6_dst_alloc failure: -12
[73533.319860] ICMPv6: Dropping ndisc response due to icmp6_dst_alloc failure: -12
[73534.089759] ICMPv6: Dropping ndisc response due to icmp6_dst_alloc failure: -12
That -12 is ENOMEM, which suggests that dst_alloc is failing somehow
(ip6_dst_alloc looks to be a simple wrapper around dst_alloc).
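Looking at the 4.19 source (paraphrasing from memory here, so treat this as a
sketch rather than verbatim kernel code), the check that seems most likely to
be refusing the allocation is in dst_alloc() in net/core/dst.c (actual memory
exhaustion aside):

        /* Paraphrase of the check in dst_alloc(): refuse the allocation when
         * the cached entry count is over the gc threshold and the per-family
         * gc callback cannot shrink the cache back under the limit.  For
         * IPv6, ops->gc is ip6_dst_gc(). */
        if (ops->gc && dst_entries_get_fast(ops) > ops->gc_thresh &&
            ops->gc(ops))
                return NULL;

icmp6_dst_alloc() in net/ipv6/route.c then turns that NULL into
ERR_PTR(-ENOMEM), i.e. the -12 in the log above.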
If I look at `trace-cmd record -p function -l ip6_dst_gc`, I see that
this function is getting called about once a second.
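For context, my rough reading of ip6_dst_gc() in net/ipv6/route.c (again a
paraphrase, not verbatim, so please double-check against the actual 4.19
source) is:

        /* Simplified paraphrase of ip6_dst_gc(): run fib6 garbage collection
         * and report back whether the cache is still over the limit.  (The
         * early-exit rate limiting and gc_expire bookkeeping are elided.) */
        static int ip6_dst_gc(struct dst_ops *ops)
        {
                struct net *net = container_of(ops, struct net, ipv6.ip6_dst_ops);
                int max_size = net->ipv6.sysctl.ip6_rt_max_size; /* net.ipv6.route.max_size */
                int entries;

                fib6_run_gc(net->ipv6.ip6_rt_gc_expire, net, true);
                entries = dst_entries_get_slow(ops);

                /* Non-zero return means "still over max_size", which makes
                 * dst_alloc() above give up with NULL -> -ENOMEM. */
                return entries > max_size;
        }

So once the dst entry count climbs above max_size and gc can't pull it back
down, every ndisc reply allocation fails with ENOMEM until the limit is raised.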
I have net.ipv6.route.max_size=4096, and the machine only has 376 routes
(counted with `ip -6 route | wc -l`). However, raising this sysctl to 65k
seems to instantly fix IPv6 (I'm not sure yet whether this is a permanent
fix).
Does this indicate that the machine is leaking IPv6 dst_entry structs? How
would I determine what is leaking?
This is from shortly after raising the max_size:
# cat /proc/net/rt6_stats
02b9 015f 13e597 04ab 0000 1031 0b3c
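For what it's worth, my guess at what those columns mean, based on
rt6_stats_seq_show() in net/ipv6/route.c (I may be misreading the field order,
so please double-check against your tree):

        /* Guessed mapping of the /proc/net/rt6_stats columns (all hex),
         * paraphrased from rt6_stats_seq_show(); field order is my
         * assumption, not verified against 4.19:
         *   02b9   fib_nodes
         *   015f   fib_route_nodes
         *   13e597 fib_rt_alloc
         *   04ab   fib_rt_entries
         *   0000   fib_rt_cache
         *   1031   current ip6_dst_ops entries (dst_entries_get_slow())
         *   0b3c   fib_discarded_routes
         */

If that reading is right, the sixth column (0x1031 = 4145) would be the
current ip6_dst_ops entry count, already above the old max_size of 4096 even
though only a few hundred routes are installed.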