[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.2.21.1905042115250.6546@ja.home.ssi.bg>
Date:   Sat, 4 May 2019 23:13:45 +0300 (EEST)
From:   Julian Anastasov <ja@....bg>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
cc:     "David S. Miller" <davem@...emloft.net>,
        David Ahern <dsahern@...il.com>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        syzbot <syzbot+30209ea299c09d8785c9@...kaller.appspotmail.com>,
        ddstreet@...e.org, dvyukov@...gle.com,
        linux-kernel <linux-kernel@...r.kernel.org>,
        netdev@...r.kernel.org, syzkaller-bugs@...glegroups.com,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH] ipv4: Delete uncached routes upon unregistration of
 loopback device.
	Hello,
On Sat, 4 May 2019, Tetsuo Handa wrote:
> syzbot is hitting infinite loop when a loopback device in a namespace is
> unregistered [1]. This is because rt_flush_dev() is moving the refcount of
> "any device to unregister" to "a loopback device in that namespace" but
> nobody can drop the refcount moved from non loopback devices when the
> loopback device in that namespace is unregistered.
> 
> This behavior was introduced by commit caacf05e5ad1abf0 ("ipv4: Properly
> purge netdev references on uncached routes.") but there is no description
> why we have to temporarily move the refcount to "a loopback device in that
> namespace" and why it is safe to do so, for rt_flush_dev() becomes a no-op
> when "a loopback device in that namespace" is about to be unregistered.
> 
> Since I don't know the reason, this patch breaks the infinite loop by
> deleting the uncached route (which eventually drops the refcount via
> dst_destroy()) when "a loopback device in that namespace" is unregistered
> rather than when "non-loopback devices in that namespace" is unregistered.
	There is one simple rule: code that holds device references should
catch device events and to put the references. This should happen
not after the NETDEV_UNREGISTER event.
	But there are users such as dsts that have longer life because
there are other objects that can hold dsts - sockets, for example.
Sockets can hold dst as result of route resolving (sk_dst_cache) and to 
reuse it as long as it is valid. And many sockets can point to same dst.
Obviously, socket can be idle while device disappears. We do not
propagate device events to every socket. What we do is to mark the dst
entry as invalid and to drop its dev reference. So, the problem can be
noticed on next sending when the cached dst is revalidated. As the dst 
entry is marked as obsolete, it will be dropped.
	What you see in rt_flush_dev() is that the IPv4 route subsystem
drops its dev references at the right time. Why net->loopback_dev ?
Because we prefer the leaked dsts to prevent netns to be released, so
that we have more info where is the problem. If we account the leaks
to init netns, nobody will notice them. These are leaks that missed
many cleanup steps: device events and even net-exit events. If you
see "lo" in error messages, it is more likely a dst leak, i.e.
we got a dst reference but dst_release() was not called before netns
cleanup. In such case you need to track not the dev references but the
dst references. If another device is shown, it is not a dst leak
but some dev leak (dev_put not called).
> [1] https://syzkaller.appspot.com/bug?id=bae9a2236bfede42cf3d219e6bf6740c583568a4
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
> Reported-by: syzbot <syzbot+30209ea299c09d8785c9@...kaller.appspotmail.com>
> ---
>  net/ipv4/route.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 6fdf1c195d8e..7e865c11d4f3 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -1522,15 +1522,21 @@ void rt_flush_dev(struct net_device *dev)
>  {
>  	struct net *net = dev_net(dev);
>  	struct rtable *rt;
> +	struct rtable *tmp;
>  	int cpu;
>  
>  	for_each_possible_cpu(cpu) {
>  		struct uncached_list *ul = &per_cpu(rt_uncached_list, cpu);
>  
>  		spin_lock_bh(&ul->lock);
> -		list_for_each_entry(rt, &ul->head, rt_uncached) {
> +		list_for_each_entry_safe(rt, tmp, &ul->head, rt_uncached) {
>  			if (rt->dst.dev != dev)
>  				continue;
> +			if (dev == net->loopback_dev) {
> +				list_del_init(&rt->rt_uncached);
> +				ip_rt_put(rt);
> +				continue;
> +			}
>  			rt->dst.dev = net->loopback_dev;
>  			dev_hold(rt->dst.dev);
>  			dev_put(dev);
> -- 
> 2.17.1
Regards
--
Julian Anastasov <ja@....bg>
Powered by blists - more mailing lists
 
