linux-kernel - Re: [PATCH] xhci: use iopoll for xhci

Message-ID: <CAGS+omCf5FAoNye59A99ry+j9j6_JsNjLxSq3pxDMbe=jnvY0w@mail.gmail.com>
Date:   Thu, 28 Feb 2019 09:49:39 -0700
From:   Daniel Kurtz <djkurtz@...omium.org>
To:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc:     rrangel@...omium.org, Mathias Nyman <mathias.nyman@...el.com>,
        "open list:USB XHCI DRIVER" <linux-usb@...r.kernel.org>,
        open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] xhci: use iopoll for xhci_handshake

On Thu, Feb 28, 2019 at 12:09 AM Greg Kroah-Hartman
<gregkh@...uxfoundation.org> wrote:
>
> On Wed, Feb 27, 2019 at 03:19:17PM -0700, Daniel Kurtz wrote:
> > In cases such as xhci_abort_cmd_ring(), xhci_handshake() is called with
> > a spin lock held (and local interrupts disabled) with a huge 5-second
> > timeout.  This can translate to 5 million calls to udelay(1).  By its
> > very nature, udelay() is not meant to be precise; it only guarantees a
> > delay of at least 1 microsecond.  Therefore the actual delay of
> > xhci_handshake() can be significantly longer.  If the average udelay(1)
> > takes more than 2.2 us, the total time spent in xhci_handshake(), with
> > interrupts disabled, can exceed 11 seconds (5,000,000 x 2.2 us = 11 s),
> > triggering the kernel's soft lockup detector.
> >
> > To avoid this, let's replace the open-coded I/O polling loop with one
> > from iopoll.h that times the loop with the presumably more reliable
> > ktime infrastructure.
> >
> > Signed-off-by: Daniel Kurtz <djkurtz@...omium.org>
>
> Looks sane to me, nice fixup.
>
> Reviewed-by: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
>
> Is this causing problems on older kernels/devices today such that we
> should backport this?

While debugging a USB issue on a new product, we found that an
xhci_handshake() timeout can lead to a soft lockup.  The
xhci_handshake() timeout itself is a symptom of another underlying
problem that causes some commands to be aborted.  I don't know whether
any such underlying problems exist on older devices, but the potential
is there, so a backport is reasonable.  That said, it may just shift the
symptom of an underlying problem from a soft lockup/oops to some other
symptom, such as USB simply being dead.

-Dan

>
> thanks,
>
> greg k-h