linux-kernel - Re: [PATCH v2 1/2] neigh: fix possible DoS due to net iface start/stop loop

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220815094432.tdqdfh3pwcfekegg@wittgenstein>
Date:   Mon, 15 Aug 2022 11:44:32 +0200
From:   Christian Brauner <brauner@...nel.org>
To:     Alexander Mikhalitsyn <alexander.mikhalitsyn@...tuozzo.com>
Cc:     netdev@...r.kernel.org, "Denis V. Lunev" <den@...nvz.org>,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        David Ahern <dsahern@...nel.org>,
        Yajun Deng <yajun.deng@...ux.dev>,
        Roopa Prabhu <roopa@...dia.com>, linux-kernel@...r.kernel.org,
        Alexey Kuznetsov <kuznet@....inr.ac.ru>,
        Konstantin Khorenko <khorenko@...tuozzo.com>,
        kernel@...nvz.org, devel@...nvz.org
Subject: Re: [PATCH v2 1/2] neigh: fix possible DoS due to net iface
 start/stop loop

On Wed, Aug 10, 2022 at 07:08:39PM +0300, Alexander Mikhalitsyn wrote:
> From: "Denis V. Lunev" <den@...nvz.org>
> 
> Normal processing of ARP request (usually this is Ethernet broadcast
> packet) coming to the host is looking like the following:
> * the packet comes to arp_process() call and is passed through routing
>   procedure
> * the request is put into the queue using pneigh_enqueue() if
>   corresponding ARP record is not local (common case for container
>   records on the host)
> * the request is processed by timer (within 80 jiffies by default) and
>   ARP reply is sent from the same arp_process() using
>   NEIGH_CB(skb)->flags & LOCALLY_ENQUEUED condition (flag is set inside
>   pneigh_enqueue())
> 
> And here the problem comes. Linux kernel calls pneigh_queue_purge()
> which destroys the whole queue of ARP requests on ANY network interface
> start/stop event through __neigh_ifdown().
> 
> This is actually not a problem within the original world as network
> interface start/stop was accessible to the host 'root' only, which
> could do more destructive things. But the world is changed and there
> are Linux containers available. Here container 'root' has an access
> to this API and could be considered as untrusted user in the hosting
> (container's) world.
> 
> Thus there is an attack vector to other containers on node when
> container's root will endlessly start/stop interfaces. We have observed
> similar situation on a real production node when docker container was
> doing such activity and thus other containers on the node become not
> accessible.
> 
> The patch proposed doing very simple thing. It drops only packets from
> the same namespace in the pneigh_queue_purge() where network interface
> state change is detected. This is enough to prevent the problem for the
> whole node preserving original semantics of the code.

This is how I'd do it as well.

> 
> v2:
> 	- do del_timer_sync() if queue is empty after pneigh_queue_purge()
> 
> Cc: "David S. Miller" <davem@...emloft.net>
> Cc: Eric Dumazet <edumazet@...gle.com>
> Cc: Jakub Kicinski <kuba@...nel.org>
> Cc: Paolo Abeni <pabeni@...hat.com>
> Cc: Daniel Borkmann <daniel@...earbox.net>
> Cc: David Ahern <dsahern@...nel.org>
> Cc: Yajun Deng <yajun.deng@...ux.dev>
> Cc: Roopa Prabhu <roopa@...dia.com>
> Cc: Christian Brauner <brauner@...nel.org>
> Cc: netdev@...r.kernel.org
> Cc: linux-kernel@...r.kernel.org
> Cc: Alexey Kuznetsov <kuznet@....inr.ac.ru>
> Cc: Alexander Mikhalitsyn <alexander.mikhalitsyn@...tuozzo.com>
> Cc: Konstantin Khorenko <khorenko@...tuozzo.com>
> Cc: kernel@...nvz.org
> Cc: devel@...nvz.org
> Investigated-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@...tuozzo.com>
> Signed-off-by: Denis V. Lunev <den@...nvz.org>
> ---
>  net/core/neighbour.c | 25 +++++++++++++++++--------
>  1 file changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 54625287ee5b..19d99d1eff53 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -307,14 +307,23 @@ static int neigh_del_timer(struct neighbour *n)
>  	return 0;
>  }
>  
> -static void pneigh_queue_purge(struct sk_buff_head *list)
> +static void pneigh_queue_purge(struct sk_buff_head *list, struct net *net)
>  {
> +	unsigned long flags;
>  	struct sk_buff *skb;
>  
> -	while ((skb = skb_dequeue(list)) != NULL) {
> -		dev_put(skb->dev);
> -		kfree_skb(skb);
> +	spin_lock_irqsave(&list->lock, flags);

I'm a bit surprised to see a spinlock held around a while loop walking a
linked list but that seems to be quite common in this file. I take it
the lists are guaranteed to be short.

> +	skb = skb_peek(list);
> +	while (skb != NULL) {
> +		struct sk_buff *skb_next = skb_peek_next(skb, list);
> +		if (net == NULL || net_eq(dev_net(skb->dev), net)) {
> +			__skb_unlink(skb, list);
> +			dev_put(skb->dev);
> +			kfree_skb(skb);
> +		}
> +		skb = skb_next;
>  	}
> +	spin_unlock_irqrestore(&list->lock, flags);
>  }
>  
>  static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev,
> @@ -385,9 +394,9 @@ static int __neigh_ifdown(struct neigh_table *tbl, struct net_device *dev,
>  	write_lock_bh(&tbl->lock);
>  	neigh_flush_dev(tbl, dev, skip_perm);
>  	pneigh_ifdown_and_unlock(tbl, dev);
> -
> -	del_timer_sync(&tbl->proxy_timer);
> -	pneigh_queue_purge(&tbl->proxy_queue);
> +	pneigh_queue_purge(&tbl->proxy_queue, dev_net(dev));
> +	if (skb_queue_empty_lockless(&tbl->proxy_queue))
> +		del_timer_sync(&tbl->proxy_timer);
>  	return 0;
>  }
>  
> @@ -1787,7 +1796,7 @@ int neigh_table_clear(int index, struct neigh_table *tbl)
>  	cancel_delayed_work_sync(&tbl->managed_work);
>  	cancel_delayed_work_sync(&tbl->gc_work);
>  	del_timer_sync(&tbl->proxy_timer);
> -	pneigh_queue_purge(&tbl->proxy_queue);
> +	pneigh_queue_purge(&tbl->proxy_queue, NULL);
>  	neigh_ifdown(tbl, NULL);
>  	if (atomic_read(&tbl->entries))
>  		pr_crit("neighbour leakage\n");
> -- 
> 2.36.1
>