Message-Id: <20160517.142247.1572671234268001610.davem@davemloft.net>
Date: Tue, 17 May 2016 14:22:47 -0400 (EDT)
From: David Miller <davem@...emloft.net>
To: xiyou.wangcong@...il.com
Cc: netdev@...r.kernel.org, jhs@...atatu.com
Subject: Re: [Patch net] net_sched: close another race condition in
tcf_mirred_release()
From: Cong Wang <xiyou.wangcong@...il.com>
Date: Mon, 16 May 2016 15:11:18 -0700
> We saw the following extra refcount release on veth device:
>
> kernel: [7957821.463992] unregister_netdevice: waiting for mesos50284 to become free. Usage count = -1
>
> Since we heavily use mirred action to redirect packets to veth, I think
> this is caused by the following race condition:
>
> CPU0:
> tcf_mirred_release(): (in RCU callback)
> 	struct net_device *dev = rcu_dereference_protected(m->tcfm_dev, 1);
>
> CPU1:
> mirred_device_event():
> 	spin_lock_bh(&mirred_list_lock);
> 	list_for_each_entry(m, &mirred_list, tcfm_list) {
> 		if (rcu_access_pointer(m->tcfm_dev) == dev) {
> 			dev_put(dev);
> 			/* Note : no rcu grace period necessary, as
> 			 * net_device are already rcu protected.
> 			 */
> 			RCU_INIT_POINTER(m->tcfm_dev, NULL);
> 		}
> 	}
> 	spin_unlock_bh(&mirred_list_lock);
>
> CPU0:
> tcf_mirred_release():
> 	spin_lock_bh(&mirred_list_lock);
> 	list_del(&m->tcfm_list);
> 	spin_unlock_bh(&mirred_list_lock);
> 	if (dev)		// <======== Still refers to the old m->tcfm_dev
> 		dev_put(dev);	// <======== dev_put() is called on it again
>
> The action init code path is good because it is impossible to modify
> an action that is being removed.
>
> So, fix this by moving everything under the spinlock.
>
> Fixes: 2ee22a90c7af ("net_sched: act_mirred: remove spinlock in fast path")
> Fixes: 6bd00b850635 ("act_mirred: fix a race condition on mirred_list")
> Cc: Jamal Hadi Salim <jhs@...atatu.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@...il.com>
Applied.
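
For readers following the thread, the interleaving described in the commit
message can be replayed by hand in user space. What follows is only a rough
sketch under stated assumptions, not the kernel code and not the applied
patch: `fake_dev`, `fake_mirred`, `device_event()`, `release_fixed()`,
`replay_buggy()` and `replay_fixed()` are hypothetical names, a plain integer
stands in for the netdev reference count, and a pthread mutex stands in for
mirred_list_lock. The two CPUs are simulated sequentially in a single thread,
so the result is deterministic.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects; not the real structures. */
struct fake_dev { int refcnt; };
struct fake_mirred { struct fake_dev *dev; };

/* Plays the role of mirred_list_lock in this model. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Models mirred_device_event(): drop the ref and clear the pointer,
 * both under the lock. */
static void device_event(struct fake_mirred *m, struct fake_dev *dev)
{
	pthread_mutex_lock(&list_lock);
	if (m->dev == dev) {
		dev->refcnt--;
		m->dev = NULL;
	}
	pthread_mutex_unlock(&list_lock);
}

/* Models a release that reads the pointer only while holding the lock,
 * so it can never act on a stale value. */
static void release_fixed(struct fake_mirred *m)
{
	struct fake_dev *dev;

	pthread_mutex_lock(&list_lock);
	dev = m->dev;
	if (dev)
		dev->refcnt--;
	pthread_mutex_unlock(&list_lock);
}

/* Replays the racy order: the pointer is read before the lock is taken,
 * then the device event runs, then the stale pointer is released. */
static int replay_buggy(void)
{
	struct fake_dev d = { .refcnt = 1 };	/* the one ref held by the action */
	struct fake_mirred m = { .dev = &d };
	struct fake_dev *stale = m.dev;		/* "CPU0" reads without the lock */

	device_event(&m, &d);			/* "CPU1" drops the ref */
	if (stale)
		stale->refcnt--;		/* "CPU0" drops it a second time */
	return d.refcnt;
}

/* Same order of events, but the release reads the pointer under the lock. */
static int replay_fixed(void)
{
	struct fake_dev d = { .refcnt = 1 };
	struct fake_mirred m = { .dev = &d };

	device_event(&m, &d);
	release_fixed(&m);			/* sees NULL, does nothing */
	return d.refcnt;
}
```

In this model replay_buggy() returns -1, which matches the "Usage count = -1"
in the log above, while replay_fixed() returns 0.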