[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <354a16cb-ba96-aa6f-7f10-388e6201e56d@synopsys.com>
Date: Mon, 24 May 2021 19:23:41 +0000
From: Thinh Nguyen <Thinh.Nguyen@...opsys.com>
To: Alan Stern <stern@...land.harvard.edu>,
Mathias Nyman <mathias.nyman@...ux.intel.com>
CC: Thinh Nguyen <Thinh.Nguyen@...opsys.com>,
Mathias Nyman <mathias.nyman@...el.com>,
Guido Kiener <Guido.Kiener@...de-schwarz.com>,
dave penkler <dpenkler@...il.com>,
Dmitry Vyukov <dvyukov@...gle.com>,
syzbot <syzbot+e2eae5639e7203360018@...kaller.appspotmail.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"lee.jones@...aro.org" <lee.jones@...aro.org>,
USB list <linux-usb@...r.kernel.org>,
"bp@...en8.de" <bp@...en8.de>,
"dwmw@...zon.co.uk" <dwmw@...zon.co.uk>,
"hpa@...or.com" <hpa@...or.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"luto@...nel.org" <luto@...nel.org>,
"mingo@...hat.com" <mingo@...hat.com>,
"syzkaller-bugs@...glegroups.com" <syzkaller-bugs@...glegroups.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"x86@...nel.org" <x86@...nel.org>
Subject: Re: [syzbot] INFO: rcu detected stall in tx
Alan Stern wrote:
> On Mon, May 24, 2021 at 06:18:59PM +0300, Mathias Nyman wrote:
>> On 20.5.2021 23.30, Thinh Nguyen wrote:
>>> As for the xhci driver, there maybe a case where the stream URB never
>>> gets to complete because the transaction err_count is not properly
>>> updated. The err_count for transaction error is stored in ep_ring, but
>>> the xhci driver may not be able to lookup the correct ep_ring based on
>>> TRB address for streams. There are cases for streams where the event
>>> TRBs have their TRB pointer field cleared to '0' (xhci spec section
>>> 4.12.2). If the xhci driver doesn't see ep_ring for transaction error,
>>> it automatically does a soft-retry. This is seen from one of our
>>> testings that the driver was repeatedly doing soft-retry until the class
>>> driver timed out.
>>>
>>> Hi Mathias, maybe you have some comment on this? Thanks.
>>
>> This is true, if TRB pointer is 0 then there is no retry limit for soft retry.
>> We should add one and prevent a loop. after e few soft resets we can end with a
>> hard reset to clear the host side endpoint halt.
>>
>> We don't know the URB that was being tansferred during the error, and can't
>> give it back with a proper error code.
>> In that sense we still end up waiting for a timeout and someone to cancel
>> the urb.
>
> That's not good. There may not be a timeout; drivers expect transfers
> to complete with a failure, not to be retried indefinitely.
>
> However, if you do know which endpoint/stream the error is connected to,
> you should be able to get the URB. It will be the first one queued for
> that endpoint/stream.
>
When the xhci can't recover a transfer with soft-retry, no outstanding
transfer can proceed/complete for the endpoint. If the TRB pointer is 0,
we just don't know which stream or endpoint ring it's for, but we know
all the outstanding URBs of an endpoint. Let's may as well return an
error status for all of them after a limited number of soft-retries.
BR,
Thinh
Powered by blists - more mailing lists