[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <438394b419955a430d884a175a6eda0a99ce4d08.camel@surriel.com>
Date: Wed, 20 May 2020 15:53:25 -0400
From: Rik van Riel <riel@...riel.com>
To: Mathias Nyman <mathias.nyman@...ux.intel.com>,
linux-usb <linux-usb@...r.kernel.org>
Cc: alsa-devel@...a-project.org,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Mathias Nyman <mathias.nyman@...el.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Jaroslav Kysela <perex@...ex.cz>, Takashi Iwai <tiwai@...e.com>
Subject: Re: XHCI vs PCM2903B/PCM2904 part 2
On Wed, 2020-05-20 at 16:50 +0300, Mathias Nyman wrote:
> On 20.5.2020 14.26, Rik van Riel wrote:
> > After a few more weeks of digging, I have come to the tentative
> > conclusion that either the XHCI driver, or the USB sound driver,
> > or both, fail to handle USB errors correctly.
> >
> > I have some questions at the bottom, after a (brief-ish)
> > explanation
> > of exactly what seems to go wrong.
> >
> > TL;DR: arecord from a misbehaving device can hang forever
> > after a USB error, due to poll on /dev/snd/timer never returning.
> >
> > The details: under some mysterious circumstances, the PCM290x
> > family sound chips can send more data than expected during an
> > isochronous transfer, leading to a babble error. Those
> > circumstances seem to in part depend on the USB host controller
> > and/or the electrical environment, since the chips work just
> > fine for most people.
> >
> > Receiving data past the end of the isochronous transfer window
> > scheduled for a device results in the XHCI controller throwing
> > a babble error, which moves the endpoint into halted state.
> >
> > This is followed by the host controller software sending a
> > reset endpoint command, and moving the endpoint into stopped
> > state, as specified on pages 164-165 of the XHCI specification.
>
> Note that isoch endpoints should generate buffer overrun instead of
> babble detect error on TD babble conditions.
> See xHCI spec 6.4.5 additional note 115.
Maybe it should, but I'm hitting this printk in handle_tx_event:
case COMP_BABBLE_DETECTED_ERROR:
xhci_dbg(xhci, "Babble error for slot %u ep %u on
endpoint\n",
slot_id, ep_index);
status = -EOVERFLOW;
break;
Tracing in the sound driver suggests that packet belongs to
the sound data device (not the sync device), because the
URB with -EOVERFLOW status for the isochronous packet in that
URB is the last one seen there.
> Or maybe a frame babble could halt an isoc endpoint, see xhci
> 4.10.2.4.1
> but then you should see a port error and port going to disabled
> state.
>
> Any logs of this?
>
> mount -t debugfs none /sys/kernel/debug
> echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
> echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
> echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
> echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
> < trigger the issue >
> Send output of dmesg
> Send content of /sys/kernel/debug/tracing/trace
dmesg:
https://drive.google.com/open?id=1S2Qc8lroqA5-RMukuLBLWFGx10vEjG-i
usbtrace:
https://drive.google.com/open?id=1cbLcOnAtQRW0Chgak6PNC0l4yJv__4uO
> > However, the USB sound driver seems to have no idea that this
> > error happened. The function retire_capture_urb looks at the
> > status of each isochronous frame, but seems to be under the
> > assumption that the sound device just keeps on running.
> >
> > The function snd_complete_urb seems to only detect that the
> > device is not running if usb_submit_urb returns a failure.
> >
> > err = usb_submit_urb(urb, GFP_ATOMIC);
> > if (err == 0)
> > return;
> >
> > usb_audio_err(ep->chip, "cannot submit urb (err = %d)\n",
> > err);
> >
> > if (ep->data_subs && ep->data_subs->pcm_substream) {
> > substream = ep->data_subs->pcm_substream;
> > snd_pcm_stop_xrun(substream);
> > }
> >
> > However, the XHCI driver will happily submit an URB to a
> > stopped device. Looking at the call trace usb_submit_urb ->
> > xhci_urb_enqueue -> xhci_queue_isoc_tx_prepare -> prepare_ring,
> > you can see this code:
> >
> > /* Make sure the endpoint has been added to xHC schedule */
> > switch (ep_state) {
> > ...
> > case EP_STATE_HALTED:
> > xhci_dbg(xhci, "WARN halted endpoint, queueing URB
> > anyway.\n");
> > case EP_STATE_STOPPED:
> > case EP_STATE_RUNNING:
> > break;
> >
> > This leads me to a few questions:
> > - should retire_capture_urb call snd_pcm_stop_xrun,
> > or another function like it, if it sees certain
> > errors in the iso frame in the URB?
> > - should snd_complete_urb do something with these
> > errors, too, in case they happen on the sync frames
> > and not the data frames?
> > - does the XHCI code need to ring the doorbell when
> > submitting an URB to a stopped device, or is it
> > always up to the higher-level driver to fully reset
> > the device before it can do anything useful?
>
> xhci driver should ring the doorbell.
>
> xhci_queue_isoc_tx()
> giveback_first_trb()
> xhci_ring_ep_doorbell()
>
> when we are talking about babble or transaction errors the device
> might be
> completely unaware of the situation. Device side of the endpoint is
> not necessarily halted.
> xHC host will only halt its internal endpoint state, and it needs a
> reset
> endopoint command from the xhci driver to clear the internal halt
> state.
Interesting. I had no idea the endpoint on each side
was controlled separately.
--
All Rights Reversed.
Download attachment "signature.asc" of type "application/pgp-signature" (489 bytes)
Powered by blists - more mailing lists