lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Mon, 24 May 2021 22:48:51 +0000
From:   Thinh Nguyen <Thinh.Nguyen@...opsys.com>
To:     Mathias Nyman <mathias.nyman@...ux.intel.com>,
        Thinh Nguyen <Thinh.Nguyen@...opsys.com>,
        Alan Stern <stern@...land.harvard.edu>
CC:     Mathias Nyman <mathias.nyman@...el.com>,
        Guido Kiener <Guido.Kiener@...de-schwarz.com>,
        dave penkler <dpenkler@...il.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        syzbot <syzbot+e2eae5639e7203360018@...kaller.appspotmail.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "lee.jones@...aro.org" <lee.jones@...aro.org>,
        USB list <linux-usb@...r.kernel.org>,
        "bp@...en8.de" <bp@...en8.de>,
        "dwmw@...zon.co.uk" <dwmw@...zon.co.uk>,
        "hpa@...or.com" <hpa@...or.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "luto@...nel.org" <luto@...nel.org>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "syzkaller-bugs@...glegroups.com" <syzkaller-bugs@...glegroups.com>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "x86@...nel.org" <x86@...nel.org>
Subject: Re: [syzbot] INFO: rcu detected stall in tx

Mathias Nyman wrote:
> On 24.5.2021 22.23, Thinh Nguyen wrote:
>> Alan Stern wrote:
>>> On Mon, May 24, 2021 at 06:18:59PM +0300, Mathias Nyman wrote:
>>>> On 20.5.2021 23.30, Thinh Nguyen wrote:
>>>>> As for the xhci driver, there maybe a case where the stream URB never
>>>>> gets to complete because the transaction err_count is not properly
>>>>> updated. The err_count for transaction error is stored in ep_ring, but
>>>>> the xhci driver may not be able to lookup the correct ep_ring based on
>>>>> TRB address for streams. There are cases for streams where the event
>>>>> TRBs have their TRB pointer field cleared to '0' (xhci spec section
>>>>> 4.12.2). If the xhci driver doesn't see ep_ring for transaction error,
>>>>> it automatically does a soft-retry. This is seen from one of our
>>>>> testings that the driver was repeatedly doing soft-retry until the class
>>>>> driver timed out.
>>>>>
>>>>> Hi Mathias, maybe you have some comment on this? Thanks.
>>>>
>>>> This is true, if TRB pointer is 0 then there is no retry limit for soft retry.
>>>> We should add one and prevent a loop. after e few soft resets we can end with a
>>>> hard reset to clear the host side endpoint halt.
>>>>
>>>> We don't know the URB that was being tansferred during the error, and can't 
>>>> give it back with a proper error code.
>>>> In that sense we still end up waiting for a timeout and someone to cancel
>>>> the urb.
>>>
>>> That's not good.  There may not be a timeout; drivers expect transfers 
>>> to complete with a failure, not to be retried indefinitely.
>>>
>>> However, if you do know which endpoint/stream the error is connected to, 
>>> you should be able to get the URB.  It will be the first one queued for 
>>> that endpoint/stream.
>>>
>>
>> When the xhci can't recover a transfer with soft-retry, no outstanding
>> transfer can proceed/complete for the endpoint. If the TRB pointer is 0,
>> we just don't know which stream or endpoint ring it's for, but we know
>> all the outstanding URBs of an endpoint. Let's may as well return an
>> error status for all of them after a limited number of soft-retries.
> 
> We get the endpoint, but not the stream.

Right.

> 
> I guess we could walk through each stream of this endpoint, and return the 
> first URB of every stream that has a pending URB.
> xHCI spec claims to supports 65533 streams per endpoint, but in real life 
> UAS probably only uses a few per endpoint?
> 
> -Mathias 
> 

Typically UASP devices advertise to support up to 32 streams. We notice
that some newer builds of Windows OS has a bug (or intentional?) that it
rejects any device that uses more or less than 32 streams (probably a
bug) in the descriptor.

I think we only need to do this if we don't know which stream the event
belongs to. Otherwise, we can keep the old logic.

BR,
Thinh

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ