linux-kernel - Re: [RFC PATCH] xhci: do not halt the secondary HCD

Message-ID: <CACPK8XcMoMZBw39a2BSuS_oKs-Qk4TSsGXEUDtb2Qk4p+0fxMw@mail.gmail.com>
Date:   Mon, 19 Sep 2016 17:53:52 +0930
From:   Joel Stanley <joel@....id.au>
To:     Mathias Nyman <mathias.nyman@...ux.intel.com>
Cc:     linux-usb@...r.kernel.org, linux-kernel@...r.kernel.org,
        gregkh@...uxfoundation.org,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>
Subject: Re: [RFC PATCH] xhci: do not halt the secondary HCD

Hi Mathias,

On Mon, Sep 19, 2016 at 4:33 PM, Greg KH <gregkh@...uxfoundation.org> wrote:
> On Mon, Sep 19, 2016 at 04:05:45PM +0930, Joel Stanley wrote:
>> We can't halt the secondary HCD, because it's also the primary HCD,
>> which will cause problems if we have devices attached to the primary
>> HCD, like a keyboard.
>>
>> We've been carrying this in our Linux-as-a-bootloader environment for a little
>> while now. The machines all have the same TI TUSB73x0 part, and when we kexec
>> the devices don't come back until a system power cycle.
>>
>> I'd like some advice on an acceptable way to upstream the fix, so that the xhci
>> device survives kexec.
>
> Any reason you didn't cc: Mathias?

Fat fingers - I missed him when grabbing names from get_maintainer.pl.
Thanks for adding him in.

On Mon, Sep 19, 2016 at 5:11 PM, Mathias Nyman <mathias.nyman@...ux.intel.com> wrote:
> What kernel version is this?

This patch is against 4.4.21. I've tested newer releases but haven't
seen any improvement.

> As Greg said there are fixes in this area in the 4.8 latest rc kernel.
>
> If that doesn't work then we need to figure out what the real issue is.

No dice on 4.8-rc7 (without any patches).

Here's 4.8-rc7 loading:

[    3.699524] xhci_hcd 0021:09:00.0: xHCI Host Controller
[    3.699556] xhci_hcd 0021:09:00.0: new USB bus registered, assigned bus number 1
[    3.699640] xhci_hcd 0021:09:00.0: Using 64-bit DMA iommu bypass
[    3.699697] xhci_hcd 0021:09:00.0: hcc params 0x0270f06d hci version 0x96 quirks 0x00000000
[    3.700286] hub 1-0:1.0: USB hub found
[    3.700299] hub 1-0:1.0: 4 ports detected
[    3.700493] xhci_hcd 0021:09:00.0: xHCI Host Controller
[    3.700522] xhci_hcd 0021:09:00.0: new USB bus registered, assigned bus number 2
[    3.700552] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[    3.700733] hub 2-0:1.0: USB hub found
[    3.700748] hub 2-0:1.0: 4 ports detected

Then we kexec into the second kernel. Here's what the second kernel
prints when trying to bring the controller up:

[    1.588272] xhci_hcd 0021:09:00.0: xHCI Host Controller
[    1.588282] xhci_hcd 0021:09:00.0: new USB bus registered, assigned bus number 1
[    1.619279] xhci_hcd 0021:09:00.0: Host not halted after 16000 microseconds.
[    1.619281] xhci_hcd 0021:09:00.0: can't setup: -110
[    1.619447] xhci_hcd 0021:09:00.0: USB bus 1 deregistered
[    1.619457] xhci_hcd 0021:09:00.0: init 0021:09:00.0 fail, -110
[    1.619571] xhci_hcd: probe of 0021:09:00.0 failed with error -110

Note that the second kernel is a distro one (Ubuntu 4.4.0-36-generic).

> xhci hardware is really just one controller. The split into primary and
> secondary HCD is software only. We always load the primary HCD first
> (USB2) and the secondary second (USB3). We unload them in reverse order,
> and need to stop the xhci (halt the hcd) as a first step.
>
> load primary
> load secondary   (starts the xhci controller)
> ...
> unload secondary (halts the controller)
> unload primary   (free memory)

Thanks for the explanation. I wasn't the author of the first hack we
put in our tree, but I have rewritten it as we rebase on the stable
tree regularly.

So the hack as I sent it doesn't halt anything on the secondary unload,
and lets the primary unload path halt the controller instead. Any theory
as to why this helps?

Cheers,

Joel