Message-ID: <20240927141200.xMZ53xm5@linutronix.de>
Date: Fri, 27 Sep 2024 16:12:00 +0200
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Hubert Wiśniewski <hubert.wisniewski.25632@...il.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Ferry Toth <ftoth@...londelft.nl>,
Hardik Gajjar <hgajjar@...adit-jv.com>, Kees Cook <kees@...nel.org>,
Justin Stitt <justinstitt@...gle.com>,
Richard Acayan <mailingradian@...il.com>,
Jeff Johnson <quic_jjohnson@...cinc.com>,
"Ricardo B. Marliere" <ricardo@...liere.net>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Toke Høiland-Jørgensen <toke@...hat.com>,
linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] usb: gadget: u_ether: Use __netif_rx() in rx_callback()

On 2024-09-27 15:33:35 [+0200], Hubert Wiśniewski wrote:
> On Thu, 2024-09-26 at 21:39 +0200, Hubert Wiśniewski wrote:
> > I'm a bit at a loss here. The deadlock seems to be unrelated to netif_rx()
> > (which is not being called in the interrupt context after all), yet
> > replacing it with __netif_rx() fixes the lockup (though a warning is still
> > generated, which suggests that the patch does not completely fix the
> > issue).
>
> Well, never mind. After some investigation, I think the problem is as
> follows:
>
> 1. musb_g_giveback() releases the musb lock using spin_unlock(). The lock
> is now released, but hardirqs are still disabled.
>
> 2. Then, usb_gadget_giveback_request() is called, which in turn calls
> rx_complete(). This does not happen in the interrupt context, so netif_rx()
> disables bottom halves, then enables them using local_bh_enable().
>
> 3. This leads to calling __local_bh_enable_ip(), which gives off a warning
> (the first backtrace) that hardirqs are disabled. Then, hardirqs are
> disabled (again?), and then enabled (as they should have been in the first
> place).
>
> 4. After usb_gadget_giveback_request() returns, musb_g_giveback() acquires
> the musb lock using spin_lock(). This does not disable hardirqs, so they
> are still enabled.
>
> 5. While the musb lock is acquired, an interrupt occurs. It is handled by
> dsps_interrupt(), which acquires the musb lock. A deadlock occurs.

This all makes sense so far.

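For illustration, steps 1-5 can be replayed in a small userspace model. This is a sketch under stated assumptions, not kernel code: every `model_*` function and variable is a hypothetical stand-in that only tracks whether hardirqs are enabled and whether musb->lock is held, and model_rx_complete() assumes netif_rx() returns with hardirqs enabled, as described in step 3.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of a single CPU, NOT kernel code. Every model_* name is a
 * hypothetical stand-in; only two bits of state are tracked.
 */
static bool irqs_enabled;    /* are hardirqs enabled on this CPU? */
static bool musb_lock_held;  /* is musb->lock held? */
static int bh_warnings;      /* times the local_bh_enable() warning fired */

/* Plain spin_lock()/spin_unlock(): they do not touch the irq state. */
static void model_spin_lock(void)
{
	musb_lock_held = true;
}

static void model_spin_unlock(void)
{
	assert(musb_lock_held);
	musb_lock_held = false;
}

/*
 * Steps 2-3: rx_complete() -> netif_rx() outside interrupt context. As in the
 * reported trace, local_bh_enable() warns if hardirqs are off and returns
 * with them enabled.
 */
static void model_rx_complete(void)
{
	if (!irqs_enabled)
		bh_warnings++;
	irqs_enabled = true;
}

/* Steps 1, 2 and 4: musb_g_giveback() with the plain unlock/lock pair. */
static void model_musb_g_giveback(void)
{
	model_spin_unlock();  /* step 1: lock dropped, irqs still off */
	model_rx_complete();  /* steps 2-3: irqs come back on */
	model_spin_lock();    /* step 4: lock retaken, irqs stay on */
}

/*
 * Step 5: an interrupt can only arrive while irqs are enabled; if musb->lock
 * is held at that point, dsps_interrupt() spins on it forever.
 */
static bool interrupt_would_deadlock(void)
{
	return irqs_enabled && musb_lock_held;
}
```

Starting from the caller's state (lock held, hardirqs off), a single model_musb_g_giveback() call ends with the lock held and hardirqs on, which is exactly the window step 5 falls into.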
> Replacing netif_rx() with __netif_rx() apparently fixes this part, as it
> does not lead to any change of hardirq state. There is still one problem
> though: rx_complete() is usually called from the interrupt context, except
> when the network interface is brought up.

__netif_rx() has an assert which should complain if you use it outside
of hard/soft interrupt context. Furthermore, in this case you pass the
skb to the backlog but never kick it for processing, which means it is
delayed until some random interrupt notices and processes it.
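The difference between the two entry points can be sketched with a similar userspace model (again not kernel code; the counters and `model_*` names are hypothetical). It assumes that __netif_rx() only queues the skb and raises the softirq, while netif_rx() wraps the same enqueue in local_bh_disable()/local_bh_enable() when called outside interrupt context, so the final local_bh_enable() runs the pending softirq.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model, NOT kernel code. The depth counters, backlog length and
 * pending flag are hypothetical stand-ins for the per-CPU state.
 */
static int hardirq_depth, softirq_depth;  /* nesting of each context */
static bool net_rx_pending;               /* NET_RX softirq raised, not run */
static int backlog_len;                   /* skbs waiting in the backlog */
static int lockdep_warnings;              /* times the context check fired */

static void run_net_rx_softirq(void)
{
	backlog_len = 0;
	net_rx_pending = false;
}

/* Queue to the backlog and raise the softirq; nothing runs it here. */
static void model_enqueue_to_backlog(void)
{
	backlog_len++;
	net_rx_pending = true;
}

static void model_local_bh_disable(void)
{
	softirq_depth++;
}

/* Re-enabling BHs outside of any interrupt runs pending softirqs. */
static void model_local_bh_enable(void)
{
	assert(softirq_depth > 0);
	softirq_depth--;
	if (hardirq_depth == 0 && softirq_depth == 0 && net_rx_pending)
		run_net_rx_softirq();
}

/* __netif_rx(): expects hardirq/softirq context; only raises the softirq. */
static void model___netif_rx(void)
{
	if (hardirq_depth == 0 && softirq_depth == 0)
		lockdep_warnings++;
	model_enqueue_to_backlog();
}

/* netif_rx(): wraps the enqueue in BH disable/enable when needed. */
static void model_netif_rx(void)
{
	bool need_bh_off = hardirq_depth == 0 && softirq_depth == 0;

	if (need_bh_off)
		model_local_bh_disable();
	model_enqueue_to_backlog();
	if (need_bh_off)
		model_local_bh_enable();
}
```

Called from process context, model___netif_rx() trips the context check and leaves the skb sitting in the backlog with the softirq merely pending; model_netif_rx() from the same context drains it on the way out.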
> I think one solution would be to make musb_g_giveback() use
> spin_unlock_irqrestore() and spin_lock_irqsave(), but I would need to pass
> the flags to it somehow. Also, I am not sure how that would influence other
> drivers using musb.

I would also suggest doing this, since the other solution is neither
safe nor correct. The ->busy assignment should cover most cases. If you
drop the lock without enabling interrupts, then the interrupt can't do
anything to the EP, and no other enqueue/dequeue invocation is possible
when running on UP. On the other hand, am335x has been used with
PREEMPT_RT, which effectively turns a UP machine into an SMP one, so
that case should be covered :)

While looking at it: dequeue/enqueue from within the complete callback
looks safe due to the busy flag.

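The suggested irqsave/irqrestore variant can be checked the same way. This is a hypothetical userspace model, not a patch: musb_g_giveback() does not take a flags argument today, and model_musb_g_giveback_irqsave() simply assumes the caller's flags are passed in by pointer.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model, NOT kernel code; every model_* name is hypothetical. */
static bool irqs_enabled;    /* are hardirqs enabled on this CPU? */
static bool musb_lock_held;  /* is musb->lock held? */
static int bh_warnings;      /* times the local_bh_enable() warning fired */

static void model_spin_lock_irqsave(bool *flags)
{
	*flags = irqs_enabled;  /* save the current irq state */
	irqs_enabled = false;   /* then disable irqs before taking the lock */
	musb_lock_held = true;
}

static void model_spin_unlock_irqrestore(bool flags)
{
	assert(musb_lock_held);
	musb_lock_held = false;
	irqs_enabled = flags;   /* restore the state saved by the locker */
}

/* rx_complete() -> netif_rx() outside interrupt context, as in the trace. */
static void model_rx_complete(void)
{
	if (!irqs_enabled)
		bh_warnings++;
	irqs_enabled = true;
}

/*
 * Hypothetical variant of musb_g_giveback() that receives the caller's
 * flags; the real function has no such parameter.
 */
static void model_musb_g_giveback_irqsave(bool *flags)
{
	model_spin_unlock_irqrestore(*flags);  /* back to the caller's irq state */
	model_rx_complete();
	model_spin_lock_irqsave(flags);        /* irqs off again before locking */
}
```

The model only covers the process-context path (e.g. when the interface is brought up), where the caller's flags say irqs were enabled: the callback runs without the warning, and the lock is retaken with irqs off, so dsps_interrupt() cannot hit it. When giveback runs from the interrupt handler, the restored flags keep irqs off and the callback runs in hardirq context as before.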
Sebastian