Message-ID: <20210506134904.GA734112@rowland.harvard.edu>
Date:   Thu, 6 May 2021 09:49:04 -0400
From:   Alan Stern <stern@...land.harvard.edu>
To:     Guido Kiener <Guido.Kiener@...de-schwarz.com>
Cc:     Dmitry Vyukov <dvyukov@...gle.com>,
        syzbot <syzbot+e2eae5639e7203360018@...kaller.appspotmail.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "dpenkler@...il.com" <dpenkler@...il.com>,
        "lee.jones@...aro.org" <lee.jones@...aro.org>,
        USB list <linux-usb@...r.kernel.org>,
        "bp@...en8.de" <bp@...en8.de>,
        "dwmw@...zon.co.uk" <dwmw@...zon.co.uk>,
        "hpa@...or.com" <hpa@...or.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "luto@...nel.org" <luto@...nel.org>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "syzkaller-bugs@...glegroups.com" <syzkaller-bugs@...glegroups.com>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "x86@...nel.org" <x86@...nel.org>
Subject: Re: Re: Re: [syzbot] INFO: rcu detected stall in tx

On Wed, May 05, 2021 at 10:22:24PM +0000, Guido Kiener wrote:
> > -----Original Message-----
> > From: Alan Stern <stern@...land.harvard.edu>
> > Sent: Tuesday, May 4, 2021 5:14 PM
> > To: Kiener Guido 14DS1 
> > Subject: Re: Re: [syzbot] INFO: rcu detected stall in tx
> > 
> > On Mon, May 03, 2021 at 09:56:05PM +0000, Guido Kiener wrote:
> > > Hi all,
> > >
> > > Dave and I discussed the "self-detected stall on CPU" caused by the
> > > usbtmc driver.
> > >
> > > What happened?
> > > The callback handler usbtmc_interrupt(struct urb *urb) for the INT pipe
> > > receives an erroneous urb with status -EPROTO (-71).
> > > See
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/class/usbtmc.c?h=v5.12#n2340
> > > -EPROTO does not abort/shutdown the pipe and the urb is resubmitted to
> > > receive the next packet. However, the callback handler usbtmc_interrupt
> > > is called again with the same erroneous status -EPROTO and this seems to
> > > result in an endless loop.
> > > According to
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/driver-api/usb/error-codes.rst?h=v5.12#n177
> > > the error -EPROTO indicates a hardware problem or a bad cable.
> > >
> > > Most USB drivers do not react in any specific way to these hardware
> > > problems and simply resubmit the urb. We assume these drivers will run
> > > into the same endless loop. Some other driver samples are:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/class/cdc-acm.c?h=v5.12#n379
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/hid/usbhid/usbmouse.c?h=v5.12#n65
> > >
> > > Possible solutions:
> > > Hardware defects or bad cables seem to be a common problem for most USB
> > > drivers, and I assume we do not want to fix this problem in all
> > > class-specific drivers, but in lower-level host drivers, e.g.:
> > > 1. Use a counter and close the pipe after some detected errors.
> > > 2. Delay the resubmission of the urb to avoid high CPU usage.
> > > 3. Do nothing, since it is just a rare problem.
> > >
> > > We've never seen this problem in our products and we do not dare to
> > > change anything.
> > 
> > Drivers are not consistent in the way they handle these errors, as you
> > have seen.  A few try to take active measures, such as retries with
> > increasing timeouts.  Many drivers just ignore them, which is not a very
> > good idea.
> > 
> > The general feeling among kernel USB developers is that a -EPROTO,
> > -EILSEQ, or -ETIME error should be regarded as fatal, much the same as an
> > unplug event.  The driver should avoid resubmitting URBs and just wait to
> > be unbound from the device.
> 
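
The "treat it as fatal" rule quoted above can be illustrated with a small
user-space model.  This is only a sketch and not kernel code: struct
mock_urb, struct mock_dev, status_is_fatal() and mock_complete() are
hypothetical names made up for the illustration, and the real
usb_submit_urb() call is only mentioned in a comment.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct urb; only the status field is modeled. */
struct mock_urb {
	int status;		/* 0 on success, negative errno on failure */
};

/* Hypothetical per-device driver state. */
struct mock_dev {
	bool stopped;		/* set once a fatal status has been seen */
	int resubmits;		/* number of (simulated) resubmissions */
};

/* True for the statuses the thread suggests treating like an unplug. */
static bool status_is_fatal(int status)
{
	switch (status) {
	case -EPROTO:
	case -EILSEQ:
	case -ETIME:
		return true;
	default:
		return false;
	}
}

/*
 * Model of an interrupt-pipe completion handler.
 * Returns true if the URB was (notionally) resubmitted.
 */
static bool mock_complete(struct mock_dev *dev, const struct mock_urb *urb)
{
	assert(dev != NULL && urb != NULL);

	if (dev->stopped || status_is_fatal(urb->status)) {
		/* Do not resubmit; just wait to be unbound. */
		dev->stopped = true;
		return false;
	}

	/* A real driver would handle the data and call usb_submit_urb() here. */
	dev->resubmits++;
	return true;
}
```

Once a fatal status has been seen the model never resubmits again, which
is what breaks the endless completion/resubmit loop described earlier.
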
> Thanks for your assessment. I agree with the general feeling. I counted
> about a hundred specific USB drivers, so wouldn't it be better to fix the
> problem in some of the host drivers (e.g. urb.c)?
> We could return an error when calling usb_submit_urb() on an erroneous pipe.
> I cannot estimate the side effects, and we would need to check again how
> all drivers deal with the error situation. Maybe there are some special
> drivers that need specialized error handling. In this case these drivers
> could reset the (new?) error flag to allow calling usb_submit_urb() again
> without error. This could work, couldn't it?

That is feasible, although it would be an awkward approach.  As you 
said, the side effects aren't clear.  But it might work.
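
For illustration only, the error-flag idea could look roughly like the
following user-space model.  Nothing here corresponds to an existing
kernel interface: struct mock_ep, mock_core_note_status(), mock_submit()
and mock_clear_error() are hypothetical names, and returning -EPROTO from
the refused submit is an arbitrary choice made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-endpoint state carrying the proposed error flag. */
struct mock_ep {
	bool error_latched;	/* set after a fatal completion status */
	int submitted;		/* number of accepted submissions */
};

/* Called by the mock core when a URB completes with the given status. */
static void mock_core_note_status(struct mock_ep *ep, int status)
{
	assert(ep != NULL);
	if (status == -EPROTO || status == -EILSEQ || status == -ETIME)
		ep->error_latched = true;
}

/* Mock submit path: refuses new URBs while the flag is set. */
static int mock_submit(struct mock_ep *ep)
{
	assert(ep != NULL);
	if (ep->error_latched)
		return -EPROTO;	/* arbitrary error code for this sketch */
	ep->submitted++;
	return 0;
}

/* Opt-out for a driver that does its own recovery and wants to retry. */
static void mock_clear_error(struct mock_ep *ep)
{
	assert(ep != NULL);
	ep->error_latched = false;
}
```

A driver that blindly resubmits would then get an error back from the
submit call instead of looping, while a driver with its own recovery
logic would have to clear the flag explicitly before retrying.
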

> > If you would like to audit drivers and fix them up to behave this way,
> > that would be great.
> 
> Currently not. I cannot pull the USB cable in home office :-), but I will
> keep an eye on it. When I'm more involved in the next USB driver issue,
> then I will test bad cables and maybe get more ideas how we could test and
> fix this rare error.

Will you be able to test patches?

Alan Stern
