Message-ID: <f9b96ec8-cf5c-6d62-2ec2-390dd72ea4d4@linux.intel.com>
Date: Tue, 25 May 2021 01:16:54 +0300
From: Mathias Nyman <mathias.nyman@...ux.intel.com>
To: Thinh Nguyen <Thinh.Nguyen@...opsys.com>,
Alan Stern <stern@...land.harvard.edu>
Cc: Mathias Nyman <mathias.nyman@...el.com>,
Guido Kiener <Guido.Kiener@...de-schwarz.com>,
dave penkler <dpenkler@...il.com>,
Dmitry Vyukov <dvyukov@...gle.com>,
syzbot <syzbot+e2eae5639e7203360018@...kaller.appspotmail.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"lee.jones@...aro.org" <lee.jones@...aro.org>,
USB list <linux-usb@...r.kernel.org>,
"bp@...en8.de" <bp@...en8.de>,
"dwmw@...zon.co.uk" <dwmw@...zon.co.uk>,
"hpa@...or.com" <hpa@...or.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"luto@...nel.org" <luto@...nel.org>,
"mingo@...hat.com" <mingo@...hat.com>,
"syzkaller-bugs@...glegroups.com" <syzkaller-bugs@...glegroups.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"x86@...nel.org" <x86@...nel.org>
Subject: Re: [syzbot] INFO: rcu detected stall in tx
On 24.5.2021 22.23, Thinh Nguyen wrote:
> Alan Stern wrote:
>> On Mon, May 24, 2021 at 06:18:59PM +0300, Mathias Nyman wrote:
>>> On 20.5.2021 23.30, Thinh Nguyen wrote:
>>>> As for the xhci driver, there may be a case where the stream URB never
>>>> gets to complete because the transaction err_count is not properly
>>>> updated. The err_count for transaction errors is stored in ep_ring, but
>>>> the xhci driver may not be able to look up the correct ep_ring based on
>>>> the TRB address for streams. There are cases for streams where the event
>>>> TRBs have their TRB pointer field cleared to '0' (xhci spec section
>>>> 4.12.2). If the xhci driver can't find the ep_ring for a transaction
>>>> error, it automatically does a soft retry. We saw this in one of our
>>>> tests: the driver kept doing soft retries until the class driver
>>>> timed out.
>>>>
>>>> Hi Mathias, do you have any comments on this? Thanks.
>>>
>>> This is true: if the TRB pointer is 0, there is no retry limit for soft retry.
>>> We should add one to prevent a loop. After a few soft resets we can end with a
>>> hard reset to clear the host side endpoint halt.
>>>
>>> We don't know the URB that was being transferred during the error, and can't
>>> give it back with a proper error code.
>>> In that sense we still end up waiting for a timeout and for someone to cancel
>>> the URB.
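A rough userspace model of the retry limit proposed above may help. It assumes the counter is kept per endpoint rather than on an ep_ring found through the TRB pointer, so it still works when that pointer is 0. The names `struct ep_state`, `SOFT_RETRY_LIMIT`, `on_transaction_error()` and `on_transfer_success()` are hypothetical and are not xhci driver code; the limit of 3 is an assumption, since the thread only says "a few".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limit: the thread only says "a few" soft resets. */
#define SOFT_RETRY_LIMIT 3

/* Hypothetical per-endpoint state. The counter lives on the endpoint,
 * not on a ring looked up via the TRB pointer, so it keeps counting
 * even when the event's TRB pointer is 0 and the stream is unknown. */
struct ep_state {
	unsigned int soft_retries;
};

enum recovery_action {
	ACTION_SOFT_RETRY, /* soft reset the endpoint and retry the TD */
	ACTION_HARD_RESET, /* hard reset to clear the host side halt */
};

/* Called for each transaction error event on this endpoint. */
enum recovery_action on_transaction_error(struct ep_state *ep)
{
	assert(ep != NULL);
	if (ep->soft_retries < SOFT_RETRY_LIMIT) {
		ep->soft_retries++;
		return ACTION_SOFT_RETRY;
	}
	/* Limit reached: stop looping, escalate, and start a fresh count. */
	ep->soft_retries = 0;
	return ACTION_HARD_RESET;
}

/* Called when a transfer on this endpoint completes successfully. */
void on_transfer_success(struct ep_state *ep)
{
	assert(ep != NULL);
	ep->soft_retries = 0;
}
```

With a limit of 3, the first three transaction errors in a row yield a soft retry and the fourth yields a hard reset, so the endless soft-retry loop described above cannot occur. The model does not address which URB to give back after the hard reset.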
>>
>> That's not good. There may not be a timeout; drivers expect transfers
>> to complete with a failure, not to be retried indefinitely.
>>
>> However, if you do know which endpoint/stream the error is connected to,
>> you should be able to get the URB. It will be the first one queued for
>> that endpoint/stream.
>>
>
> When the xhci can't recover a transfer with soft retry, no outstanding
> transfer can proceed or complete for the endpoint. If the TRB pointer is 0,
> we just don't know which stream or endpoint ring it's for, but we know
> all the outstanding URBs of an endpoint. We may as well return an
> error status for all of them after a limited number of soft retries.
We get the endpoint, but not the stream.
I guess we could walk through each stream of this endpoint and return the
first URB of every stream that has a pending URB.
The xHCI spec claims to support 65533 streams per endpoint, but in real life
UAS probably only uses a few per endpoint?
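The per-stream walk suggested above can be sketched as a simplified userspace model: the endpoint is known, the stream is not, so the first queued URB on every stream is given back with an error. Everything here is hypothetical (`struct urb`, `struct stream_ring`, `struct endpoint`, `give_back()`, `fail_first_urb_per_stream()`); a real driver would also need locking and to move the ring dequeue pointer past the cancelled TDs. It assumes stream ID 0 is reserved, so the walk starts at ID 1, and uses `-EPROTO` as the error status.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified model of an endpoint with stream rings. */
struct urb {
	int status;       /* 0 = still pending; set when given back */
	struct urb *next; /* next URB queued on the same stream */
};

struct stream_ring {
	struct urb *head; /* first (oldest) queued URB, or NULL */
};

struct endpoint {
	struct stream_ring *streams; /* indexed by stream ID; ID 0 reserved */
	unsigned int num_streams;    /* number of entries in streams[] */
};

/* Hypothetical stand-in for returning a URB to its class driver. */
static void give_back(struct urb *urb, int status)
{
	urb->status = status;
}

/*
 * The event identified the endpoint but not the stream (TRB pointer 0),
 * so fail the first queued URB on every stream that has one.
 * Returns the number of URBs given back.
 */
unsigned int fail_first_urb_per_stream(struct endpoint *ep, int status)
{
	unsigned int given_back = 0;

	assert(ep != NULL);
	for (unsigned int id = 1; id < ep->num_streams; id++) {
		struct stream_ring *ring = &ep->streams[id];
		struct urb *urb = ring->head;

		if (urb == NULL)
			continue; /* nothing pending on this stream */
		ring->head = urb->next; /* dequeue the oldest URB */
		urb->next = NULL;
		give_back(urb, status);
		given_back++;
	}
	return given_back;
}
```

The cost is one pass over all stream IDs, which is cheap when UAS uses only a few streams, but it fails one URB per busy stream even though only one stream actually hit the error. A variant closer to the "fail all outstanding URBs" suggestion would keep dequeuing on each stream until `ring->head` is NULL.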
-Mathias