linux-kernel - Re: [syzbot] INFO: rcu detected stall in tx

Message-ID: <37c41d87-6e30-1557-7991-0b7bca615be1@linux.intel.com>
Date:   Mon, 24 May 2021 18:18:59 +0300
From:   Mathias Nyman <mathias.nyman@...ux.intel.com>
To:     Thinh Nguyen <Thinh.Nguyen@...opsys.com>,
        Alan Stern <stern@...land.harvard.edu>,
        Mathias Nyman <mathias.nyman@...el.com>
Cc:     Guido Kiener <Guido.Kiener@...de-schwarz.com>,
        dave penkler <dpenkler@...il.com>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        syzbot <syzbot+e2eae5639e7203360018@...kaller.appspotmail.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "lee.jones@...aro.org" <lee.jones@...aro.org>,
        USB list <linux-usb@...r.kernel.org>,
        "bp@...en8.de" <bp@...en8.de>,
        "dwmw@...zon.co.uk" <dwmw@...zon.co.uk>,
        "hpa@...or.com" <hpa@...or.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "luto@...nel.org" <luto@...nel.org>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "syzkaller-bugs@...glegroups.com" <syzkaller-bugs@...glegroups.com>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "x86@...nel.org" <x86@...nel.org>
Subject: Re: [syzbot] INFO: rcu detected stall in tx

On 20.5.2021 23.30, Thinh Nguyen wrote:
> +Mathias
> 
...

> Hm... looks like we have a couple of issues in the uas storage class
> driver and the xhci driver.
> 
> We may need to fix that in the uas storage driver because it doesn't
> seem to handle it. (check uas_data_cmplt() in uas.c).
> 
> As for the xhci driver, there may be a case where the stream URB never
> gets to complete because the transaction err_count is not properly
> updated. The err_count for transaction error is stored in ep_ring, but
> the xhci driver may not be able to lookup the correct ep_ring based on
> TRB address for streams. There are cases for streams where the event
> TRBs have their TRB pointer field cleared to '0' (xhci spec section
> 4.12.2). If the xhci driver doesn't see ep_ring for transaction error,
> it automatically does a soft-retry. This is seen from one of our
> testings that the driver was repeatedly doing soft-retry until the class
> driver timed out.
> 
> Hi Mathias, maybe you have some comment on this? Thanks.

This is true: if the TRB pointer is 0, there is no retry limit for soft retry.
We should add one and prevent a loop. After a few soft resets we can end with a
hard reset to clear the host side endpoint halt.
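
A rough user-space sketch of that idea, not the actual xhci driver code: when
no ring can be looked up for the event, fall back to a per-endpoint counter so
the soft resets stay bounded. The names ep_state, ring_state, soft_tries and
handle_transaction_error() are made up for illustration, and the limit of 3 is
an assumption.

```c
#include <stddef.h>

/* Assumed retry limit, for illustration only. */
#define MAX_SOFT_RETRY 3

enum ep_recovery {
	EP_SOFT_RESET,	/* retry the transfer without clearing device state */
	EP_HARD_RESET,	/* give up retrying, clear the host side endpoint halt */
};

/* Hypothetical per-ring state: error count for the TD being transferred. */
struct ring_state {
	unsigned int err_count;
};

/* Hypothetical per-endpoint state: fallback counter used when the event
 * TRB pointer is 0 and no ring can be found. */
struct ep_state {
	unsigned int soft_tries;
};

/*
 * Decide how to recover from a transaction error.
 * ring may be NULL when the event carries no usable TRB pointer.
 */
enum ep_recovery handle_transaction_error(struct ep_state *ep,
					  struct ring_state *ring)
{
	/* Count on the ring if we have one, else on the endpoint. */
	unsigned int *count = ring ? &ring->err_count : &ep->soft_tries;

	if (*count < MAX_SOFT_RETRY) {
		(*count)++;
		return EP_SOFT_RESET;
	}

	/* Limit reached: reset the counter and escalate. */
	*count = 0;
	return EP_HARD_RESET;
}
```

With ring == NULL, the first three errors on an endpoint get a soft reset and
the fourth escalates to a hard reset, so the loop ends even without a ring.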

We don't know the URB that was being transferred during the error, and can't
give it back with a proper error code.
In that sense we still end up waiting for a timeout and someone to cancel
the URB.

-Mathias