lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <15924F7C-3C9D-4EBA-A569-FA2D77DE9FCE@canonical.com>
Date:   Wed, 20 Dec 2017 16:42:46 +0800
From:   Kai Heng Feng <kai.heng.feng@...onical.com>
To:     Brian Norris <briannorris@...omium.org>
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Matthias Kaehlcke <mka@...omium.org>,
        LKML <linux-kernel@...r.kernel.org>, stable@...r.kernel.org,
        Leif Liddy <leif.linux@...il.com>,
        Marcel Holtmann <marcel@...tmann.org>,
        Guenter Roeck <groeck@...omium.org>
Subject: Re: [PATCH 4.13 03/28] Bluetooth: btusb: fix QCA Rome suspend/resume


> On 20 Dec 2017, at 7:11 AM, Brian Norris <briannorris@...omium.org> wrote:
> 
> Hi Kai,
> 
> On Tue, Dec 19, 2017 at 12:28:17PM +0800, Kai Heng Feng wrote:
>>> On 19 Dec 2017, at 2:13 AM, Brian Norris <briannorris@...omium.org> wrote:
>>> On Mon, Dec 18, 2017 at 12:43:48PM +0100, Greg Kroah-Hartman wrote:
>>>> On Fri, Dec 15, 2017 at 07:05:39PM -0800, Matthias Kaehlcke wrote:
>>>>> We identified the above patch as the culprit, in combination with USB
>>>>> autosuspend being enabled for the Bluetooth controller.
>>>>> 
>>>>> We found that the following (recent) upstream patch fixes the issue on
>>>>> most devices (we are still investigating one case on a specific device):
>>> 
>>> A key word to highlight here is "most". I have at least one device that
>>> is consistently having problems even with both patches included; the
>>> only way things work reliably so far is to simply revert the $subject
>>> patch in 4.4.x.
>> 
>> The problem we have is that the QCA Rome Bluetooth stops working
>> after system S3. The reset resume quirk workaround the issue on both
>> mainline and 4.4.x (at least for me).
> 
> Understood. But that's not the case for all systems, and on some, you're
> regressing functionality.

Yes. So reverting the commit is a reasonable path we should take.

> 
> Many systems keep power enabled for system suspend, and so that "fix" is
> not necessary for them. It's also not very precise, because it seems to
> act in many cases where it need not -- for one, it takes effect on *all*
> resume attempts, even resuming from autosuspend. I really doubt your
> system is losing USB power in S0 + USB autosuspend?

No, USB power is on. Not seeing my XHCI gets put to D3cold.

> 
> So, if you really need this patch for some systems, it seems like it
> should be much better targeted. You're currently adding a lot of
> unnecessary overhead and fragility.

You are right. Applying this unconditionally to all QCA Rome may be
overkill.

> 
>>> I'll try to investigate further, since this result is a bit confusing
>>> still. If there's a real problem with the patch (and I suspect there
>>> might be), it probably shouldn't be in mainline either…
>> 
>> Do you have the same issue on mainline?
> 
> Refer to my note [1] :)
> 
> But because you asked, I did the work necessary to boot mainline on this
> system, and in fact, I see the same problem. I think that's enough
> reason to revert your patch in all branches (upstream, and any -stable
> branch that already took it).
> 
> To be clear, here's the kernel logs on 4.15-rc4+, where the 55-second
> mark is around where I try to power on and scan for BT devices (power-on
> and scan both fail):
> 
> [    2.323475] usb 3-1: new full-speed USB device number 2 using ohci-platform
> ...
> [    2.731719] usb 3-1: New USB device found, idVendor=0cf3, idProduct=e300
> ...
> [    2.867870] usb 3-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
> ...(udev doesn't run until here)...
> [   35.733139] usbcore: registered new interface driver btusb
> [   35.740912] Bluetooth: hci0: using rampatch file: qca/rampatch_usb_00000302.bin
> [   35.749216] Bluetooth: hci0: QCA: patch rome 0x302 build 0x3e8, firmware rome 0x302 build 0x111
> ...
> [   35.816104] Bluetooth: hci0: using NVM file: qca/nvm_usb_00000302.bin
> ...
> [   55.073503] usb 3-1: reset full-speed USB device number 2 using ohci-platform
> ...
> [   57.772513] Bluetooth: hci0: urb 00000000ec39086b failed to resubmit (113)
> [   57.780217] Bluetooth: hci0: urb 0000000066228fda failed to resubmit (113)
> 
> Whereas reverting the $subject patch yields no reset and URB failure;
> Bluetooth seems to work fine.

Thanks for your testing on mainline!

> 
>>>>> commit a0085f2510e8976614ad8f766b209448b385492f
>>>>> Author: Sukumar Ghorai <sukumar.ghorai@...el.com>
>>>>> Date:   Wed Aug 16 14:46:55 2017 -0700
>>>>> 
>>>>>   Bluetooth: btusb: driver to enable the usb-wakeup feature
>>> [...]
>>>>> stable branches are currently broken for BTUSB_QCA_ROME with USB
>>>>> autosuspend enabled, since the above patch is not included (I only
>>>>> checked v4.4 and v4.9), so we probably want to integrate it.
>>>> 
>>>> Now queued up, thanks for letting me know.
>>> 
>>> I think you have almost as much information as I do at this point, and
>>> I'll try to reply back here if I figure out anything more, so I'll leave
>>> you to your decisions. But I would personally revert, not backport more
>>> patches.
>>> 
>>> Among the reasons:
>>> (a) I have at least one device that does not work better with both
>>>   patches [1]
>>> (b) So far, I haven't seen any explanation why commit a0085f2510e8
>>>   should be a dependency for $subject patch; the closest I can imagine
>>>   would be that commit a0085f2510e8 could effectively neutralize
>>>   $subject patch -- if the device is marked as a wakeup source, then
>>>   it will never try to RESET on resume -- but that's still a fuzzy
>>>   signal; just because it's marked a wakeup source sometimes doesn't
>>>   mean it always will be. We can disable it in user space too.
>> 
>> Hi Leif,
>> 
>> Can you try if a0085f2510e8 breaks your $subject commit?
>> If it’s a yes, then what Brian suggests is correct.
>> 
>> Also, can you share the output of "usb-device” for your btusb device?
> 
> FWIW, here's mine:
> 
> T:  Bus=03 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#=  2 Spd=12  MxCh= 0
> D:  Ver= 2.01 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
> P:  Vendor=0cf3 ProdID=e300 Rev=00.01
> C:  #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
> I:  If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
> I:  If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
> 
> This is an integrated Wifi+BT QCA6174A-5 M.2 module.

Thanks for the info. I found the same BT on a Killer Wireless, I’ll do
some testing on it.

> 
>>> (c) Isn't it safer to revert than to backport more? You're delving into
>>>   feature-land, not bugfix-land…
>> 
>> Sounds reasonable.
> 
> Shall I post a proper revert for mainline, with Fixes tags and CC
> stable? Or did one of you want to do that?

Let me clean my own mess ;)

> 
>> We’ll need more information from Leif if we need to do ID-specific quirks.
> 
> As I see it, this probably isn't just an ID-specific thing; this patch
> looks bad for its intended purpose, per my comments above (it's
> overkill, for one). I think there are many factors involved here, and it
> needs more study than "I just booted, suspended/resumed once, and it
> worked better."
> 
> Among the things to consider:
> * How do we determine the likelihood that the device gets powered off in
>  S3? The device_may_wakeup() check can be influenced (among other
>  things) by user space policy, so that doesn't really look right to me.
>  (ID-based checks might help a little, since you can probably
>  differentiate chips that are likely used on discrete/removable USB
>  parts, rather than internal chips?)

Actually, I think we can’t know for sure. From my own experience, it’s
pretty ODM specific. Some external chips keep its power under S3.

> * The ordering: right now, you're enabling "reset on resume" at probe
>  time, but QCA ROME devices only load their firmware in btusb_open().
>  This gives a window of time in which you're adding weird reset
>  behavior while the device may be running its ROM firmware, not the
>  patched firmware loaded from /lib/firmware/. That might be one reason
>  things don't work out so well?

You are right, the reset on resume should be used when USB probing,
instead of in the middle of btusb_open().

>  I also suspect that might be one reason that Matthias sometimes saw
>  the problem disappear on his devices, where I didn't. I have my device
>  configured to cold reset the BT device on every reboot, whereas his
>  isn't; it might retain the FW in the BT device's RAM across reboot.
>  So my behavior was more deterministic :)
> * The behavior here is affected by USB autosuspend, not just S3. I
>  mentioned this above, but more specifically, ChromeOS has udev rules
>  like this, to enable autosuspend on various sorts of devices
>  (including bluetooth controllers + BT HID devices):
> 
> <udev rule snippet>
> ATTR{idVendor}=="0cf3", ATTR{idProduct}=="e300", GOTO="autosuspend_enable"
> ...
> # Enable autosuspend
> LABEL="autosuspend_enable"
> TEST=="power/control", ATTR{power/control}="auto", GOTO="autosuspend_end"
> 
> LABEL="autosuspend_end"
> </udev rule snippet>
> 
>  I suspect some desktop systems might not have the same policies, so
>  you're not testing autosuspend in the same way I am?

Actually it’s same from kernel’s perspective. I use TLP, it does the same
thing to the power/control knob.

Kai-Heng

> 
> Brian
> 
>> Kai-Heng
>> 
>>> Brian
>>> 
>>> [1] Before you ask: this particular device is not quite fully supported
>>> on upstream yet, though its sibling devices are. With a bit of effort, I
>>> might be able to test a clean upstream. At the moment, I'm using our
>>> Chrom{e,ium}OS 4.4 kernel, where we regularly merge in -stable updates.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ