[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20181018165935.GC2601@hmswarspite.think-freely.org>
Date: Thu, 18 Oct 2018 12:59:35 -0400
From: Neil Horman <nhorman@...driver.com>
To: "Banerjee, Debabrata" <dbanerje@...mai.com>
Cc: David Miller <davem@...emloft.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next] netpoll: allow cleanup to be synchronous
On Thu, Oct 18, 2018 at 03:17:06PM +0000, Banerjee, Debabrata wrote:
> > From: Neil Horman <nhorman@...driver.com>
>
> > Agreed, this doesn't make sense. If you want a synchronous cleanup, create
> > a wrapper function that creates a wait queue, calls __netpoll_free_async,
> > and blocks on the wait queue completion. Modify the cleanup_work
> > method(s) to complete the wait queue, and you've got what you want.
> >
> > Neil
>
> Actually the patch looks bad, but it seems to turn out the rtnl is always
> held by the parent when it is called, and asynchronous cleanup doesn't
> seem to be necessary at all. Perhaps you could share your deadlock
> test case for 2cde6acd49da?
>
That commit doesn't fix a deadlock, it fixes a modification of data that must
only be modified under the protection of the rtnl lock. From the commit log:
__netpoll_rcu_free is used to free netpoll structures when the rtnl_lock is
already held. The mechanism is used to asynchronously call __netpoll_cleanup
outside of the holding of the rtnl_lock, so as to avoid deadlock.
Unfortunately, __netpoll_cleanup modifies pointers (dev->np), which means the
rtnl_lock must be held while calling it. Further, it cannot be held, because
rcu callbacks may be issued in softirq contexts, which cannot sleep.
"The mechanism" is referring to the use of rcu callbacks, which origionally were
put in place to allow the freeing of data outside the holding of the rtnl, which
did avoid deadlock.
This patch fixes the fact that __netpoll_cleanup modifies the dev->np pointer
outside of the rtnl_lock, which is disallowed because readers of dev->np assume
it will be stable while rtnl is held.
As for the workqueue not being needed anymore, it definately used to be, but it
appears that most of the instances where a deadlock might occur have been
modified such that it is no longer the case. The team driver still seems to be
an outlier there though I think , in that it doesn't guarantee the holding of
rtnl in its port add/delete paths.
It might be worth cleaning that up and simply replacing all the calls to
__netpoll_free_async with direct calls to __netpoll_cleanup. That would let you
eliminate the former function alltogether.
Neil
> Patch v2 coming next.
>
> -Deb
>
Powered by blists - more mailing lists