Date:	Tue, 6 Aug 2013 21:13:47 -0700
From:	Stephen Hemminger <stephen@...workplumber.org>
To:	Cong Wang <amwang@...hat.com>
Cc:	netdev@...r.kernel.org
Subject: Re: A soft lockup in vxlan module

On Wed, 07 Aug 2013 09:23:54 +0800
Cong Wang <amwang@...hat.com> wrote:

> Hi, Stephen
> 
> You introduced a soft lockup in vxlan module in
> 
> commit fe5c3561e6f0ac7c9546209f01351113c1b77ec8
> Author: stephen hemminger <stephen@...workplumber.org>
> Date:   Sat Jul 13 10:18:18 2013 -0700
> 
>     vxlan: add necessary locking on device removal
> 
> The problem is that vxlan_dellink(), which is called with RTNL lock
> held, tries to flush the workqueue synchronously, but apparently
> igmp_join and igmp_leave work need to hold RTNL lock too, therefore we
> have a soft lockup! This is 100% reproducible on my 2.6.32 backport
> while running `modprobe -r vxlan`.
> 
> A quick but perhaps ugly fix is just releasing RTNL lock before calling
> flush_workqueue():
> 
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 8bf31d9..581d3d5 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -1837,7 +1837,9 @@ static void vxlan_dellink(struct net_device *dev, struct list_head *head)
>         struct vxlan_net *vn = net_generic(dev_net(dev), vxlan_net_id);
>         struct vxlan_dev *vxlan = netdev_priv(dev);
>  
> +       rtnl_unlock();
>         flush_workqueue(vxlan_wq);
> +       rtnl_lock();
>  
>         spin_lock(&vn->sock_lock);
>         hlist_del_rcu(&vxlan->hlist);
> 
> However, I think a better way is still what I did, that is, removing
> RTNL lock from ip_mc_join_group() and ip_mc_leave_group().
> 
> What do you think? Any other idea to fix it?
> 
> Thanks.
> 

The flush_workqueue() call can probably just be removed, letting the
normal refcounting do the work. The queued work holds a reference to the
device and the socket, so the cleanup should still happen correctly.
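
To illustrate the refcounting idea, here is a rough userspace model. It is
only a sketch, not the vxlan code: struct model_dev and every model_*
function are hypothetical names standing in for the device, the
hold/put calls, and the queued work.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the real vxlan structures. */
struct model_dev {
    int refcnt;   /* stands in for the device reference count */
    bool freed;   /* set once the last reference has been dropped */
};

static void model_dev_hold(struct model_dev *d)
{
    d->refcnt++;
}

static void model_dev_put(struct model_dev *d)
{
    assert(d->refcnt > 0);
    if (--d->refcnt == 0)
        d->freed = true;   /* real code would release the object here */
}

/* Queueing work takes a reference so the device outlives the work. */
static void model_queue_work(struct model_dev *d)
{
    model_dev_hold(d);
}

/* The work may run after removal; it drops its own reference when done. */
static void model_work_done(struct model_dev *d)
{
    model_dev_put(d);
}

/* Removal drops only the owner's reference and never waits for the work. */
static void model_dellink(struct model_dev *d)
{
    model_dev_put(d);
}
```

In this model, removal never blocks on pending work, so it cannot end up
waiting for something that needs a lock the caller already holds. The
device is released only when the last holder, whichever it is, drops its
reference.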