lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 18 Oct 2018 12:59:35 -0400
From:   Neil Horman <nhorman@...driver.com>
To:     "Banerjee, Debabrata" <dbanerje@...mai.com>
Cc:     David Miller <davem@...emloft.net>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next] netpoll: allow cleanup to be synchronous

On Thu, Oct 18, 2018 at 03:17:06PM +0000, Banerjee, Debabrata wrote:
> > From: Neil Horman <nhorman@...driver.com>
> 
> > Agreed, this doesn't make sense.  If you want a synchronous cleanup, create
> > a wrapper function that creates a wait queue, calls __netpoll_free_async,
> > and blocks on the wait queue completion.  Modify the cleanup_work
> > method(s) to complete the wait queue, and you've got what you want.
> > 
> > Neil
> 
> Actually the patch looks bad, but it seems to turn out the rtnl is always
> held by the parent when it is called, and asynchronous cleanup doesn't
> seem to be necessary at all. Perhaps you could share your deadlock
> test case for 2cde6acd49da?
> 

That commit doesn't fix a deadlock, it fixes a modification of data that must
only be modified under the protection of the rtnl lock.  From the commit log:

__netpoll_rcu_free is used to free netpoll structures when the rtnl_lock is
    already held.  The mechanism is used to asynchronously call __netpoll_cleanup
    outside of the holding of the rtnl_lock, so as to avoid deadlock.
    Unfortunately, __netpoll_cleanup modifies pointers (dev->np), which means the
    rtnl_lock must be held while calling it.  Further, it cannot be held, because
    rcu callbacks may be issued in softirq contexts, which cannot sleep.

"The mechanism" is referring to the use of rcu callbacks, which origionally were
put in place to allow the freeing of data outside the holding of the rtnl, which
did avoid deadlock.

This patch fixes the fact that __netpoll_cleanup modifies the dev->np pointer
outside of the rtnl_lock, which is disallowed because readers of dev->np assume
it will be stable while rtnl is held.

As for the workqueue not being needed anymore, it definately used to be, but it
appears that most of the instances where a deadlock might occur have been
modified such that it is no longer the case.  The team driver still seems to be
an outlier there though I think , in that it doesn't guarantee the holding of
rtnl in its port add/delete paths.  

It might be worth cleaning that up and simply replacing all the calls to
__netpoll_free_async with direct calls to __netpoll_cleanup.  That would let you
eliminate the former function alltogether.

Neil

> Patch v2 coming next.
> 
> -Deb
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ