Message-ID: <Y0llmkQqmWLDLm52@lunn.ch>
Date: Fri, 14 Oct 2022 15:35:22 +0200
From: Andrew Lunn <andrew@...n.ch>
To: Íñigo Huguet <ihuguet@...hat.com>
Cc: irusskikh@...vell.com, dbogdanov@...vell.com, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com,
netdev@...r.kernel.org, Li Liang <liali@...hat.com>
Subject: Re: [PATCH net] atlantic: fix deadlock at aq_nic_stop
On Fri, Oct 14, 2022 at 02:43:47PM +0200, Íñigo Huguet wrote:
> On Fri, Oct 14, 2022 at 2:14 PM Andrew Lunn <andrew@...n.ch> wrote:
> >
> > > Fix this by trying to acquire rtnl_lock at the beginning of those
> > > functions, and returning if NIC closing is ongoing. Also do the
> > > "linkstate" stuff in a workqueue instead of in a threaded irq, where
> > > sleeping or waiting on a mutex for a long time is discouraged.
> >
> > What happens when the same interrupt fires again, while the work queue
> > is still active? The advantage of the threaded interrupt handler is
> > that the interrupt will be kept disabled, and should not fire again
> > until the threaded interrupt handler exits.
>
> Nothing happens, if it's already queued, it won't be queued again, and
> when it runs it will evaluate the last link state. And in the worst
> case, it will be enqueued to run again, and if linkstate has changed
> it will be evaluated again. This will rarely happen and it's harmless.
>
> Also, I haven't checked it but these lines suggest that the IRQ is
> auto-disabled in the hw until you enable it again. I didn't rely on
> this, anyway.
>	self->aq_hw_ops->hw_irq_enable(self->aq_hw,
>				       BIT(self->aq_nic_cfg.link_irq_vec));
>
> Honestly I was a bit in doubt on doing this, with the threaded irq it
> would also work. I'd like to hear more opinions about this and I can
> change it back.
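
For reference, the requeue behaviour described in the quoted text can be modelled in userspace roughly as below. This is only a sketch: struct model_work, model_queue_work() and model_run_work() are made-up stand-ins, not kernel API. The only things taken from the kernel are the queue_work() contract (it returns false when the work item is already pending) and the fact that the pending flag is cleared before the work function runs, so an interrupt arriving during the handler can queue one more run.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a work item; not struct work_struct. */
struct model_work {
	bool pending;    /* queued but not yet started */
	int link_state;  /* last link state seen by the handler */
	int runs;        /* how many times the handler has run */
};

/* Models queue_work(): refuses to queue an item that is already pending. */
bool model_queue_work(struct model_work *w)
{
	if (w->pending)
		return false;
	w->pending = true;
	return true;
}

/*
 * Models the worker picking the item up: pending is cleared first, then the
 * handler evaluates whatever the link state is at that moment.
 */
void model_run_work(struct model_work *w, int current_link_state)
{
	assert(w->pending);
	w->pending = false;
	w->link_state = current_link_state;
	w->runs++;
}
```

Two interrupts arriving before the worker starts collapse into a single run that sees the latest link state; an interrupt arriving after the worker has started results in one extra run.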
Ethernet PHYs do all their interrupt handling in threaded IRQs. That
can require a number of MDIO transactions. So we can be talking about
64 bits at 2.5MHz per transaction, so 25us or more. We have not seen
issues with that.
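
As a rough back-of-the-envelope check of that figure, here is a small sketch. It assumes a 64-bit frame on the wire (32-bit preamble plus 32-bit frame) and a 2.5MHz MDC clock; the helper name mdio_time_ns() is made up for illustration and is not a kernel function.

```c
#include <assert.h>

/* Assumed bits on the wire per MDIO transaction: 32 preamble + 32 frame. */
#define MDIO_FRAME_BITS 64ULL

/* Nanoseconds needed to clock `transactions` MDIO frames at `mdc_hz`. */
unsigned long long mdio_time_ns(unsigned int transactions, unsigned long mdc_hz)
{
	assert(mdc_hz != 0); /* avoid dividing by zero */
	return transactions * MDIO_FRAME_BITS * 1000000000ULL / mdc_hz;
}
```

With these assumptions, mdio_time_ns(1, 2500000) gives 25600, i.e. about 25.6us for one transaction, and a handler that needs four transactions spends roughly 100us on the bus.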
> > If MACSEC is enabled, aq_nic_update_link_status() is called with RTNL
> > held. If it is not enabled, RTNL is not held. This sort of
> > inconsistency could lead to further locking bugs, since it is not
> > obvious. Please try to make this consistent.
>
> This is not new in these patches, that's what was already happening, I
> just moved it to get the lock a bit earlier. In my opinion, this is as
> it should be: why acquire a mutex if you don't have anything to
> protect with it? And it's worse with rtnl_lock which is held by many
> processes, and can be held for quite long times...
Maybe the lock needs to be moved closer to what actually needs to be
protected? What is it protecting?
Andrew