linux-kernel - [PATCH 4.4 11/43] net_sched: close another race condition in tcf_mirred

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20170501212559.988601385@linuxfoundation.org>
Date:   Mon,  1 May 2017 14:27:11 -0700
From:   Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To:     linux-kernel@...r.kernel.org
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        stable@...r.kernel.org, Jamal Hadi Salim <jhs@...atatu.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        Julia Lawall <julia.lawall@...6.fr>
Subject: [PATCH 4.4 11/43] net_sched: close another race condition in tcf_mirred_release()

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: WANG Cong <xiyou.wangcong@...il.com>

commit dc327f8931cb9d66191f489eb9a852fc04530546 upstream.

We saw the following extra refcount release on veth device:

  kernel: [7957821.463992] unregister_netdevice: waiting for mesos50284 to become free. Usage count = -1

Since we heavily use mirred action to redirect packets to veth, I think
this is caused by the following race condition:

CPU0:
tcf_mirred_release(): (in RCU callback)
	struct net_device *dev = rcu_dereference_protected(m->tcfm_dev, 1);

CPU1:
mirred_device_event():
        spin_lock_bh(&mirred_list_lock);
        list_for_each_entry(m, &mirred_list, tcfm_list) {
                if (rcu_access_pointer(m->tcfm_dev) == dev) {
                        dev_put(dev);
                        /* Note : no rcu grace period necessary, as
                         * net_device are already rcu protected.
                         */
                        RCU_INIT_POINTER(m->tcfm_dev, NULL);
                }
        }
        spin_unlock_bh(&mirred_list_lock);

CPU0:
tcf_mirred_release():
        spin_lock_bh(&mirred_list_lock);
        list_del(&m->tcfm_list);
        spin_unlock_bh(&mirred_list_lock);
        if (dev)               // <======== Stil refers to the old m->tcfm_dev
                dev_put(dev);  // <======== dev_put() is called on it again

The action init code path is good because it is impossible to modify
an action that is being removed.

So, fix this by moving everything under the spinlock.

Fixes: 2ee22a90c7af ("net_sched: act_mirred: remove spinlock in fast path")
Fixes: 6bd00b850635 ("act_mirred: fix a race condition on mirred_list")
Cc: Jamal Hadi Salim <jhs@...atatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@...il.com>
Acked-by: Jamal Hadi Salim <jhs@...atatu.com>
Signed-off-by: David S. Miller <davem@...emloft.net>
Cc: Julia Lawall <julia.lawall@...6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@...uxfoundation.org>

---
 net/sched/act_mirred.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -36,14 +36,15 @@ static DEFINE_SPINLOCK(mirred_list_lock)
 static void tcf_mirred_release(struct tc_action *a, int bind)
 {
 	struct tcf_mirred *m = to_mirred(a);
-	struct net_device *dev = rcu_dereference_protected(m->tcfm_dev, 1);
+	struct net_device *dev;
 
 	/* We could be called either in a RCU callback or with RTNL lock held. */
 	spin_lock_bh(&mirred_list_lock);
 	list_del(&m->tcfm_list);
-	spin_unlock_bh(&mirred_list_lock);
+	dev = rcu_dereference_protected(m->tcfm_dev, 1);
 	if (dev)
 		dev_put(dev);
+	spin_unlock_bh(&mirred_list_lock);
 }
 
 static const struct nla_policy mirred_policy[TCA_MIRRED_MAX + 1] = {