[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20161208215409.GA12472@amd>
Date: Thu, 8 Dec 2016 22:54:09 +0100
From: Pavel Machek <pavel@....cz>
To: Lino Sanfilippo <LinoSanfilippo@....de>
Cc: Francois Romieu <romieu@...zoreil.com>, bh74.an@...sung.com,
ks.giri@...sung.com, vipul.pandya@...sung.com,
peppe.cavallaro@...com, alexandre.torgue@...com,
davem@...emloft.net, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org
Subject: Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock
On Thu 2016-12-08 21:32:12, Lino Sanfilippo wrote:
> Hi,
>
> On 08.12.2016 00:15, Francois Romieu wrote:
> > Lino Sanfilippo <LinoSanfilippo@....de> :
> >> The driver uses a private lock for synchronization between the xmit
> >> function and the xmit completion handler, but since the NETIF_F_LLTX flag
> >> is not set, the xmit function is also called with the xmit_lock held.
> >>
> >> On the other hand the xmit completion handler first takes the private lock
> >> and (in case that the tx queue has been stopped) the xmit_lock, leading
> >> to a reverse locking order and the potential danger of a deadlock.
> >
> > netif_tx_stop_queue is used by:
> > 1. xmit function before releasing lock and returning.
> > 2. sxgbe_restart_tx_queue()
> > <- sxgbe_tx_interrupt
> > <- sxgbe_reset_all_tx_queues()
> > <- sxgbe_tx_timeout()
> >
> > Given xmit won't be called again until tx queue is enabled, it's not clear
> > how a deadlock could happen due to #1.
> >
>
>
> After spending more thoughts on this I tend to agree with you. Yes, we have the
> different locking order for the xmit_lock and the private lock in two concurrent
> threads. And one of the first things one learns about locking is that this is a
> good way to create a deadlock sooner or later. But in our case the deadlock
> can only occur if the xmit function and the tx completion handler perceive different
> states for the tx queue, or to be more specific:
> the completion handler sees the tx queue in state "stopped" while the xmit handler
> sees it in state "running" at the same time. Only then both functions would try to
> take both locks, which could lead to a deadlock.
>
> OTOH Pavel said that he actually could produce a deadlock. Now I wonder if this is caused
> by that locking scheme (in a way I have not figured out yet) or if it is a different issue.
Pavel has some problems, but that's on different hardware.. and it is
possible that it is deadlock (or something else) somewhere else.
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Download attachment "signature.asc" of type "application/pgp-signature" (182 bytes)
Powered by blists - more mailing lists