lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <3A938253-4CB7-4D24-82B1-6A2E3B794077@holtmann.org>
Date:   Wed, 13 Sep 2017 08:52:15 +0200
From:   Marcel Holtmann <marcel@...tmann.org>
To:     "Luis R. Rodriguez" <mcgrof@...nel.org>
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Gabriel C <nix.or.die@...il.com>, Tejun Heo <tj@...nel.org>,
        "Gustavo F. Padovan" <gustavo@...ovan.org>,
        Sukumar Ghorai <sukumar.ghorai@...el.com>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "bluez mailin list (linux-bluetooth@...r.kernel.org)" 
        <linux-bluetooth@...r.kernel.org>
Subject: Re: btusb "firmware request while host is not available" at resume

Hi Luis,

>>>> If something needs to be fixed, can you make a patch showing that?  Or
>>>> do we also need to revert anything else as well to get back to a "better
>>>> working" state?
>>> 
>>> I took a look at the driver and it seems that btusb_setup_intel_new() is
>>> not called after the driver is initialized, rather its called only when
>>> hci_dev_do_open() is called. Its not clear to me how you can end up calling
>>> this on resume but not prior to this on a running system. Feedback from
>>> someone more familiar with bt would be useful.
>>> 
>>> I'd have the call for firmware on probe, no processing would be needed, just
>>> a load to kick the cache into effect would suffice. This may require a bit
>>> of code shift so its best someone more familiar do this.
>>> 
>>> If it confirmed this information is helping avoid these races we can reconsider
>>> re-instating the warn as a firmware dev debugging aid for developers.
>>> 
>>> If the race this warning complained about is indeed possible the same race is
>>> also possible for other usermode helpers. Its *why* the UMH lock was
>>> implemented, it however was never generalized.
>> 
>> we can not load firmware on probe() in most cases. The btusb.ko driver
>> establishes the HCI transport. It is setup in probe() and then started in
>> hci_dev_do_open() and if there is a vendor setup routine like
>> btsub_setup_intel_new(), then it is executed. Most Bluetooth controllers (not
>> all, but most) are doing firmware loading over the HCI transport with vendor
>> specific commands.
>> 
>> And yes, if the firmware was already loaded, we would skip requesting it at
>> all. Which means after suspend/resume cycle where the power to the controller
>> is cut, then we are missing the firmware from the cache. Since we never
>> loaded it in the first place.
> 
> I checked and prior to commit 81f95076281f ("firmware: add sanity check on
> shutdown/suspend") and commit 06a45a93e7d34aa ("firmware: move umh try locks
> into the umh code") I believe we could end up also failing at a firmware
> request as well. Can you confirm? I see one bug report which confirms this
> since v3.13:
> 
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1356076
> 
> For the sync case we had:
> 
> -               ret = usermodehelper_read_trylock();
> -               if (WARN_ON(ret)) {
> -                       dev_err(device, "firmware: %s will not be loaded\n",
> -                               name);
> -                       goto out;
> -               }
> 
> The warning splat in launchpad bug 1356076 is the above case.
> 
> For the async case we had:
> 
> -               timeout = usermodehelper_read_lock_wait(timeout);
> -               if (!timeout) {
> -                       dev_dbg(device, "firmware: %s loading timed out\n",
> -                               name);
> -                       ret = -EBUSY;
> -                       goto out;
> 
> This would be the more silent case.
> 
> If this is accurate than the error case has just been modified to be
> a bit clearer, and without being implicated by the UMH lock. This was
> the design change after all. Nothing more *broken* should have happened.
> 
> In other words if fixing your issues goes away by reverting commit
> 81f95076281f, to be fair you should also try reverting commit 06a45a93e7d34aa
> which did the actual removal of the UMH lock when we don't use the UMH. *That*
> was previously the failure trigger point.
> 
>> So yes, we would have to redo parts of the vendor specific handling to always
>> request the firmware, even if we do not need it right now. And frankly that
>> is not obvious to anybody.
> 
> I agree. I thought a bit about the above and tried co come to a clever resolution,
> for instance one resolution could be to use the MODULE_FIRMWARE() declarations
> of sorts to then let the firmware API do the pre-fetching for you at init. This
> would just require a 1 line change to drivers and some already have this.
> There are two issues with this though, one is that the firmware cache works with
> devres and devres requires a device already present. Second is firmware names
> can be very dynamic, so one can only know the firmware name later at boot.

I don’t like that idea since we have drivers like btusb.ko that support devices from many manufactures. So the only difference is the firmware loading and setup. After that they behave all the same. So pre-loading all Bluetooth firmwares is pretty crazy.

> Another issues is that most driver developers use the firmware API and without
> knowing *are* creating a firmware cache entry, whereas a few times they don't
> need a firmware cache entry. This is more of performance / optimization thing,
> but something to consider long term as well.
> 
>> We seem to have some patches for doing exactly
>> that, but I have not gotten to review them in detail since they deal with
>> vendor specific complex setup handling.
> 
> Correct me if I'm wrong but some of this work very likely came from the above
> old errors, not the new ones?
> 
>> Also this affects more than just Intel since all hardware where firmware
>> loading is skipped if there is already firmware loaded are affected.
> 
> Right, its a core firmware API issue, but this issue has been present for a
> while, it was however in different form. I'm glad we're able to think ahead
> and address this now though, it would seem there are quite a bit of intrusive
> changes required to use the API right, I'd hope we could help on the firmware
> API front to make these changes smaller.
> 
> First, I'd like to understand a bit more how a device ends up with firmware
> loaded, without having gone through the firmware API.
> 
> Second, while requesting firmware at probe seems not necessary, would it
> at least be reasonable to require a struct device and a list of possible
> firmware that could be used be made available? If so a separate hook could
> be provided to help pre-load these only onto the firmware cache. Ie, we would
> not actually look for the firmware but just create a firmware cache instance
> so that when we suspend we *do* go fetch for these so that they are ready and
> available upon resume.

The devices come with boatloader mode and operational mode and both are independent. And most of them only fallback when fully power cycling the device. So a simple boot linux and reboot linux would get you into the situation that the firmware is already loaded. There are also some UEFI bioses that can load the firmware. So there are multitudes of possibilities where we can end up with a firmware loaded.

Regards

Marcel

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ