Message-ID: <285f536d-17dd-86fa-d8cd-08c3d73f60e2@qtmlabs.xyz>
Date: Sat, 30 Oct 2021 07:25:15 +0700
From: msizanoen <msizanoen@...labs.xyz>
To: David Ahern <dsahern@...il.com>, davem@...emloft.net,
yoshfuji@...ux-ipv6.org, dsahern@...nel.org, kuba@...nel.org
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: Kernel leaks memory in ip6_dst_cache when suppress_prefix is
present in ipv6 routing rules and a `fib` rule is present in ipv6 nftables
rules
> exact command? I have not played with nftables.
sudo nft create table inet test
sudo nft create chain inet test test_chain '{ type filter hook prerouting priority filter + 10; policy accept; }'
sudo nft add rule inet test test_chain meta nfproto ipv6 fib saddr . mark . iif oif missing drop
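
The remaining steps from the original report (quoted below) can be scripted roughly like this. It is only a sketch: the function names are made up for illustration, and it assumes /proc/slabinfo is readable (normally root only) and has the usual layout where the second column is the number of active objects.

```shell
#!/bin/sh
# Sketch only: the helper names are illustrative, not part of any tool.

# Print the active object count of the ip6_dst_cache slab.
# $1: slabinfo file to read (assumed default: /proc/slabinfo, needs root).
ip6_dst_active() {
    awk '$1 == "ip6_dst_cache" { print $2 }' "${1:-/proc/slabinfo}"
}

# Add the suppress rule from the report, then print the count once per
# second. Must be run as root; stop with Ctrl-C.
watch_ip6_dst_leak() {
    ip -6 rule add table main suppress_prefixlength 0
    while :; do
        ip6_dst_active
        sleep 1
    done
}
```

Running watch_ip6_dst_leak as root after the nft commands above should, on an affected kernel, print a count that keeps growing as IPv6 packets arrive.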
> Do you have a stack
> trace of where the dst reference is getting taken?
ip6_dst_alloc+5
ip6_create_rt_rcu+107
ip6_pol_route_lookup+741
fib6_rule_action+707
fib_rules_lookup+342
fib6_rule_lookup+150
nft_fib6_eval+354
nft_do_chain+339
nft_do_chain_inet+123
nf_hook_slow+63
nf_hook_slow_list+129
ip6_sublist_rcv+606
ipv6_list_rcv+296
__netif_receive_skb_list_core+489
netif_receive_skb_list_internal+433
napi_complete_done+111
virtnet_poll+771
__napi_poll+42
net_rx_action+547
__softirqentry_text_start+208
__irq_exit_rcu+199
common_interrupt+131
asm_common_interrupt+30
native_safe_halt+11
default_idle+10
default_idle_call+53
do_idle+487
cpu_startup_entry+25
secondary_startup_64_no_verify+194
Collected using the following bpftrace script:
kretfunc:ip6_dst_alloc { @[(uint64)retval] = kstack(); }
kfunc:ip6_dst_destroy { delete(@[(uint64)args->dst]); }
On 10/30/21 06:53, David Ahern wrote:
> On 10/26/21 8:24 AM, msizanoen wrote:
>> The kernel leaks memory when a `fib` rule is present in ipv6 nftables
>> firewall rules and a suppress_prefix rule
>> is present in the IPv6 routing rules (used by certain tools such as
>> wg-quick). In such scenarios, every incoming
>> packet will leak an allocation in ip6_dst_cache slab cache.
>>
>> After some hours of `bpftrace`-ing and source code reading, I tracked
>> down the issue to this commit:
>> https://github.com/torvalds/linux/commit/ca7a03c4175366a92cee0ccc4fec0038c3266e26
>>
>>
>> The problem with that patch is that the generic args->flags always have
>> FIB_LOOKUP_NOREF set[1][2] but the
>> ip6-specific flag RT6_LOOKUP_F_DST_NOREF might not be specified, leading
>> to fib6_rule_suppress not
>> decreasing the refcount when needed. This can be fixed by exposing the
>> protocol-specific flags to the
>> protocol specific `suppress` function, and check the protocol-specific
>> `flags` argument for
>> RT6_LOOKUP_F_DST_NOREF instead of the generic FIB_LOOKUP_NOREF when
>> decreasing the refcount.
>>
>> How to reproduce:
>> - Add the following nftables rule to a prerouting chain: `meta nfproto
>> ipv6 fib saddr . mark . iif oif missing drop`
> exact command? I have not played with nftables. Do you have a stack
> trace of where the dst reference is getting taken?
>
>
>> - Run `sudo ip -6 rule add table main suppress_prefixlength 0`
>> - Watch `sudo slabtop -o | grep ip6_dst_cache` memory usage increase
>> with every incoming ipv6 packet
>>
>> Example patch:
>> https://gist.github.com/msizanoen1/36a2853467a9bd34fadc5bb3783fde0f
>>
>> [1]:https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L71
>>
>> [2]:https://github.com/torvalds/linux/blob/ca7a03c4175366a92cee0ccc4fec0038c3266e26/net/ipv6/fib6_rules.c#L99
>>
>>
>>