[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2024101625-fetal-oboe-1b9a@gregkh>
Date: Wed, 16 Oct 2024 10:27:39 +0200
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: Hubert Wiśniewski <hubert.wisniewski.25632@...il.com>
Cc: Bin Liu <b-liu@...com>, linux-usb@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] usb: musb: Fix hardware lockup on first Rx endpoint
request
On Sat, Oct 05, 2024 at 03:19:10PM +0200, Hubert Wiśniewski wrote:
> There is a possibility that a request's callback could be invoked from
> usb_ep_queue() (call trace below, supplemented with missing calls):
>
> req->complete from usb_gadget_giveback_request
> (drivers/usb/gadget/udc/core.c:999)
> usb_gadget_giveback_request from musb_g_giveback
> (drivers/usb/musb/musb_gadget.c:147)
> musb_g_giveback from rxstate
> (drivers/usb/musb/musb_gadget.c:784)
> rxstate from musb_ep_restart
> (drivers/usb/musb/musb_gadget.c:1169)
> musb_ep_restart from musb_ep_restart_resume_work
> (drivers/usb/musb/musb_gadget.c:1176)
> musb_ep_restart_resume_work from musb_queue_resume_work
> (drivers/usb/musb/musb_core.c:2279)
> musb_queue_resume_work from musb_gadget_queue
> (drivers/usb/musb/musb_gadget.c:1241)
> musb_gadget_queue from usb_ep_queue
> (drivers/usb/gadget/udc/core.c:300)
>
> According to the docstring of usb_ep_queue(), this should not happen:
>
> "Note that @req's ->complete() callback must never be called from within
> usb_ep_queue() as that can create deadlock situations."
>
> In fact, a hardware lockup might occur in the following sequence:
>
> 1. The gadget is initialized using musb_gadget_enable().
> 2. Meanwhile, a packet arrives, and the RXPKTRDY flag is set, raising an
> interrupt.
> 3. If IRQs are enabled, the interrupt is handled, but musb_g_rx() finds an
> empty queue (next_request() returns NULL). The interrupt flag has
> already been cleared by the glue layer handler, but the RXPKTRDY flag
> remains set.
> 4. The first request is enqueued using usb_ep_queue(), leading to the call
> of req->complete(), as shown in the call trace above.
> 5. If the callback enables IRQs and another packet is waiting, step (3)
> repeats. The request queue is empty because usb_g_giveback() removes the
> request before invoking the callback.
> 6. The endpoint remains locked up, as the interrupt triggered by hardware
> setting the RXPKTRDY flag has been handled, but the flag itself remains
> set.
>
> For this scenario to occur, it is only necessary for IRQs to be enabled at
> some point during the complete callback. This happens with the USB Ethernet
> gadget, whose rx_complete() callback calls netif_rx(). If called in the
> task context, netif_rx() disables the bottom halves (BHs). When the BHs are
> re-enabled, IRQs are also enabled to allow soft IRQs to be processed. The
> gadget itself is initialized at module load (or at boot if built-in), but
> the first request is enqueued when the network interface is brought up,
> triggering rx_complete() in the task context via ioctl(). If a packet
> arrives while the interface is down, it can prevent the interface from
> receiving any further packets from the USB host.
>
> The situation is quite complicated with many parties involved. This
> particular issue can be resolved in several possible ways:
>
> 1. Ensure that callbacks never enable IRQs. This would be difficult to
> enforce, as discovering how netif_rx() interacts with interrupts was
> already quite challenging and u_ether is not the only function driver.
> Similar "bugs" could be hidden in other drivers as well.
> 2. Disable MUSB interrupts in musb_g_giveback() before calling the callback
> and re-enable them afterwars (by calling musb_{dis,en}able_interrupts(),
> for example). This would ensure that MUSB interrupts are not handled
> during the callback, even if IRQs are enabled. In fact, it would allow
> IRQs to be enabled when releasing the lock. However, this feels like an
> inelegant hack.
> 3. Modify the interrupt handler to clear the RXPKTRDY flag if the request
> queue is empty. While this approach also feels like a hack, it wastes
> CPU time by attempting to handle incoming packets when the software is
> not ready to process them.
> 4. Flush the Rx FIFO instead of calling rxstate() in musb_ep_restart().
> This ensures that the hardware can receive packets when there is at
> least one request in the queue. Once IRQs are enabled, the interrupt
> handler will be able to correctly process the next incoming packet
> (eventually calling rxstate()). This approach may cause one or two
> packets to be dropped (two if double buffering is enabled), but this
> seems to be a minor issue, as packet loss can occur when the software is
> not yet ready to process them. Additionally, this solution makes the
> gadget driver compliant with the rule mentioned in the docstring of
> usb_ep_queue().
>
> There may be additional solutions, but from these four, the last one has
> been chosen as it seems to be the most appropriate, as it addresses the
> "bad" behavior of the driver.
>
> Signed-off-by: Hubert Wiśniewski <hubert.wisniewski.25632@...il.com>
> ---
> drivers/usb/musb/musb_gadget.c | 13 ++++++++++---
> 1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/usb/musb/musb_gadget.c b/drivers/usb/musb/musb_gadget.c
> index bdf13911a1e5..c6076df0d50c 100644
> --- a/drivers/usb/musb/musb_gadget.c
> +++ b/drivers/usb/musb/musb_gadget.c
> @@ -1161,12 +1161,19 @@ void musb_free_request(struct usb_ep *ep, struct usb_request *req)
> */
> void musb_ep_restart(struct musb *musb, struct musb_request *req)
> {
> + u16 csr;
> + void __iomem *epio = req->ep->hw_ep->regs;
> +
> trace_musb_req_start(req);
> musb_ep_select(musb->mregs, req->epnum);
> - if (req->tx)
> + if (req->tx) {
> txstate(musb, req);
> - else
> - rxstate(musb, req);
> + } else {
> + csr = musb_readw(epio, MUSB_RXCSR);
> + csr |= MUSB_RXCSR_FLUSHFIFO | MUSB_RXCSR_P_WZC_BITS;
> + musb_writew(epio, MUSB_RXCSR, csr);
> + musb_writew(epio, MUSB_RXCSR, csr);
> + }
> }
>
> static int musb_ep_restart_resume_work(struct musb *musb, void *data)
>
> base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
> --
> 2.30.2
>
>
What commit id does this fix?
thanks,
greg k-h
Powered by blists - more mailing lists