lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5543F194.2090307@redhat.com>
Date:	Fri, 01 May 2015 14:35:16 -0700
From:	Laura Abbott <labbott@...hat.com>
To:	Takashi Iwai <tiwai@...e.de>, Marcel Holtmann <marcel@...tmann.org>
CC:	Ming Lei <ming.lei@...onical.com>,
	Laura Abbott <labbott@...oraproject.org>,
	"Gustavo F. Padovan" <gustavo@...ovan.org>,
	Johan Hedberg <johan.hedberg@...il.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	linux-bluetooth@...r.kernel.org
Subject: Re: [RFC][PATCH] firmware: Drop WARN from usermodehelper_read_trylock
 error case

On 05/01/2015 01:22 PM, Laura Abbott wrote:
> On 04/30/2015 07:43 AM, Takashi Iwai wrote:
>> At Thu, 30 Apr 2015 07:19:47 -0700,
>> Marcel Holtmann wrote:
>>>
>>> Hi Ming,
>>>
>>>>>>>> We've received a number of reports of warnings when coming
>>>>>>>> out of suspend with certain bluetooth firmware configurations:
>>>>>>>>
>>>>>>>> WARNING: CPU: 3 PID: 3280 at drivers/base/firmware_class.c:1126
>>>>>>>> _request_firmware+0x558/0x810()
>>>>>>>> Modules linked in: ccm ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
>>>>>>>> xt_conntrack ebtable_nat ebtable_broute bridge stp llc
>>>>>>>> ebtable_filter
>>>>>>>> ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
>>>>>>>> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
>>>>>>>> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
>>>>>>>> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw
>>>>>>>> binfmt_misc bnep intel_rapl iosf_mbi arc4 x86_pkg_temp_thermal
>>>>>>>> snd_hda_codec_hdmi coretemp kvm_intel joydev snd_hda_codec_realtek
>>>>>>>> iwldvm snd_hda_codec_generic kvm iTCO_wdt mac80211
>>>>>>>> iTCO_vendor_support
>>>>>>>> snd_hda_intel snd_hda_controller snd_hda_codec crct10dif_pclmul
>>>>>>>> snd_hwdep crc32_pclmul snd_seq crc32c_intel ghash_clmulni_intel
>>>>>>>> uvcvideo
>>>>>>>> snd_seq_device iwlwifi btusb videobuf2_vmalloc snd_pcm
>>>>>>>> videobuf2_core
>>>>>>>> serio_raw bluetooth cfg80211 videobuf2_memops sdhci_pci v4l2_common
>>>>>>>> videodev thinkpad_acpi sdhci i2c_i801 lpc_ich mfd_core wacom
>>>>>>>> mmc_core
>>>>>>>> media snd_timer tpm_tis hid_logitech_hidpp wmi tpm rfkill snd
>>>>>>>> mei_me mei
>>>>>>>> shpchp soundcore nfsd auth_rpcgss nfs_acl lockd grace sunrpc i915
>>>>>>>> i2c_algo_bit drm_kms_helper e1000e drm hid_logitech_dj ptp pps_core
>>>>>>>> video
>>>>>>>> CPU: 3 PID: 3280 Comm: kworker/u17:0 Not tainted
>>>>>>>> 3.19.3-200.fc21.x86_64
>>>>>>>> Hardware name: LENOVO 343522U/343522U, BIOS GCET96WW (2.56 )
>>>>>>>> 10/22/2013
>>>>>>>> Workqueue: hci0 hci_power_on [bluetooth]
>>>>>>>> 0000000000000000 0000000089944328 ffff88040acffb78 ffffffff8176e215
>>>>>>>> 0000000000000000 0000000000000000 ffff88040acffbb8 ffffffff8109bc1a
>>>>>>>> 0000000000000000 ffff88040acffcd0 00000000fffffff5 ffff8804076bac40
>>>>>>>> Call Trace:
>>>>>>>> [<ffffffff8176e215>] dump_stack+0x45/0x57
>>>>>>>> [<ffffffff8109bc1a>] warn_slowpath_common+0x8a/0xc0
>>>>>>>> [<ffffffff8109bd4a>] warn_slowpath_null+0x1a/0x20
>>>>>>>> [<ffffffff814dbe78>] _request_firmware+0x558/0x810
>>>>>>>> [<ffffffff814dc165>] request_firmware+0x35/0x50
>>>>>>>> [<ffffffffa03a7886>] btusb_setup_bcm_patchram+0x86/0x590 [btusb]
>>>>>>>> [<ffffffff814d40e6>] ? rpm_idle+0xd6/0x230
>>>>>>>> [<ffffffffa04d4801>] hci_dev_do_open+0xe1/0xa90 [bluetooth]
>>>>>>>> [<ffffffff810c51dd>] ? ttwu_do_activate.constprop.90+0x5d/0x70
>>>>>>>> [<ffffffffa04d5980>] hci_power_on+0x40/0x200 [bluetooth]
>>>>>>>> [<ffffffff810b487c>] process_one_work+0x14c/0x3f0
>>>>>>>> [<ffffffff810b52f3>] worker_thread+0x53/0x470
>>>>>>>> [<ffffffff810b52a0>] ? rescuer_thread+0x300/0x300
>>>>>>>> [<ffffffff810ba548>] kthread+0xd8/0xf0
>>>>>>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
>>>>>>>> [<ffffffff81774958>] ret_from_fork+0x58/0x90
>>>>>>>> [<ffffffff810ba470>] ? kthread_create_on_node+0x1b0/0x1b0
>>>>>>>>
>>>>>>>> This occurs after every resume.
>>>>>>>>
>>>>>>>> When resuming, the bluetooth driver needs to re-request the
>>>>>>>> firmware. This re-request is happening before usermodehelper
>>>>>>>> is fully enabled. If the firmware load succeeded previously, the
>>>>>>>> caching behavior of the firmware code typically negates the
>>>>>>>> need to call the usermodehelper code again and the request
>>>>>>>> succeeds. If the firmware was never loaded because it isn't
>>>>>>>> actually present in the file system, this results in a call
>>>>>>>> to usermodehelper and a failure warning every resume. Rather
>>>>>>>> than have a WARN clogging up the kernel messages each time,
>>>>>>>> just drop the warn. There is still a dev_err for debugging
>>>>>>>> purposes.
>>>>>>>>
>>>>>>>> Signed-off-by: Laura Abbott <labbott@...oraproject.org>
>>>>>>>> ---
>>>>>>>> This might be papering over a real issue but I'm not
>>>>>>>> familiar enough with any of suspend/resume, bluetooth,
>>>>>>>> or firmware loading to identify an alternate fix.
>>>>>>>> The backtrace is from bcm patchram but the problem
>>>>>>>> isn't limited to that hardware. Intel also does a
>>>>>>>> request firmware and I was able to reproduce the
>>>>>>>> same backtrace on that driver by requesting non-existant
>>>>>>>> firmware file.
>>>>>>>
>>>>>>> so here is the thing with Bluetooth firmware. Some of them
>>>>>>> are  RAM patches to fix the ROM modules. Others are full firmware
>>>>>>> that are required to be downloaded first.
>>>>>>>
>>>>>>> For ROM modules, the RAM patching procedure is optional. So we
>>>>>>> will proceed even if no firmware is available. This means that
>>>>>>> the kernel will never cache it (since it is not there in the
>>>>>>> first place) and also on every resume we have the same issues.
>>>>>>> So optional firmware is something that happens for Bluetooth USB
>>>>>>> dongles a lot.
>>>>>>>
>>>>>>> In the driver we know which firmwares are optional and which are
>>>>>>> required. So we could tell the firmware class this if this would
>>>>>>> make
>>>>>>> things better and result in clearer errors and warnings. Is that
>>>>>>> something we want here?
>>>>>>>
>>>>>>
>>>>>> The response on another reply was
>>>>>>
>>>>>> "Yes, it is a driver problem, and loading firmware from filesystem
>>>>>> isn't safe during resume, and that is the purpose of the warning."
>>>>>>
>>>>>> It isn't clear if this means request_firmware shouldn't be called
>>>>>> on resume at all or if request_firmware shouldn't be called unless
>>>>>> we can guarantee it won't make a call into the file system. I'd
>>>>
>>>> If the firmware is cached before resume, it is ok to call
>>>> request_firmware()
>>>> during resume. Otherwise it will call filesystem and disks, which may
>>>> be a problem because the disk may not be ready for completing the
>>>> request during resume.
>>>>
>>>>>> be okay with adding another api (request_optional_firmware?) to
>>>>>> represent this if the firmware maintainers aren't against the
>>>>>> concept. If the firmware maintainers are against the concept,
>>>>>> it seems like the only solution is to rework the bluetooth drivers
>>>>>> to not request anything on resume.
>>>>
>>>> So do you just want to work around the warning by introducing a new
>>>> API?
>>>>
>>>>>
>>>>> I think request_optional_firmware concept sounds like an useful
>>>>> addition.
>>>>>
>>>>> However the problem here is that the driver does not know that it
>>>>> is called from resume path. It is easy to say that this is a driver
>>>>> problem, but the driver does not know it.
>>>>
>>>>  From USB stack view, one usb driver should know it is in the resume
>>>> path
>>>> because the root entry is the .resume() callback of the USB BT driver.
>>>
>>> have you actually read drivers/bluetooth/btusb.c before making these
>>> statements. I explained how and when request_firmware is actually
>>> called.
>>>
>>>>> If the hci_register_dev is called which in fact triggers
>>>>> hdev->setup to deal with vendor specific firmware path, then it
>>>>> means the driver just went through its probe() phase again. How
>>>>> would the driver differentiate this from any hot plug event. So to
>>>>> say this is a driver problem is just plain stupid. The driver does
>>>>> not know we are ending up in a reset_resume use case or when
>>>>> ACPI/BIOS decides to emulate an USB disconnect.
>>>
>>> The only time request_firmware is called is from hdev->setup(). And
>>> that can only be triggered by hci_register_dev(). Which is only
>>> called from probe() callback of the driver. This has nothing to do
>>> with the resume() callback.
>>
>> FYI, I once fixed a very similar problem in commit
>> 4320f6b1d9db4ca912c5eb6ecb328b2e090e1586
>>     PM / sleep: Fix request_firmware() error at resume
>>
>> But the bug seems still alive.  The stack trace looks pretty similar
>> as the original bug report.  Hmm.
>>
>> My fix above addressed the race between UMH lock changes in
>> thaw_process().  If the UMH lock still fails, it means that the
>> task was triggered even before the UMH protection in thaw_process().
>> That sounds weird, and might be a more serious problem.
>>
>> Added Rafael to Cc, in case I overlooked something obvious.
>>
>
> Thanks for pointing that out. It does look like a similar if not
> the same problem. In this case though we still haven't closed
> the race:
>
> [   35.725301] PM: resume of devices complete after 2370.964 msecs
> [   35.725363] Bluetooth: >>>> Begin testing line 2754 btusb.c
>
> [   35.725875] ------------[ cut here ]------------
> [   35.725879] WARNING: CPU: 7 PID: 2542 at
> drivers/base/firmware_class.c:1118 _request_firmware+0x557/0x810()
> [   35.725896] Modules linked in: ccm fuse cmac xt_CHECKSUM
> ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netbios_ns
> nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
> xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter
> ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
> ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4
> nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bnep
> arc4 iwlmvm snd_hda_codec_realtek snd_hda_codec_generic
> snd_hda_codec_hdmi mac80211 snd_hda_intel snd_hda_controller intel_rapl
> iosf_mbi snd_hda_codec x86_pkg_temp_thermal snd_hda_core iwlwifi
> coretemp kvm cfg80211 btusb btbcm btintel bluetooth snd_hwdep snd_seq
> uvcvideo snd_seq_device
> [   35.725909]  snd_pcm videobuf2_vmalloc videobuf2_core
> videobuf2_memops iTCO_wdt v4l2_common iTCO_vendor_support thinkpad_acpi
> videodev mei_me snd_timer rtsx_pci_ms ie31200_edac tpm_tis pcspkr
> serio_raw joydev memstick media edac_core snd tpm mei rfkill shpchp
> lpc_ich i2c_i801 soundcore wmi nfsd auth_rpcgss nfs_acl lockd grace
> sunrpc binfmt_misc dm_crypt i915 i2c_algo_bit drm_kms_helper drm
> rtsx_pci_sdmmc mmc_core e1000e crct10dif_pclmul crc32_pclmul
> crc32c_intel rtsx_pci ptp ghash_clmulni_intel pps_core mfd_core video
> [   35.725911] CPU: 7 PID: 2542 Comm: kworker/u17:0 Tainted: G        W
>        4.0.0-bt_hackery+ #36
> [   35.725911] Hardware name: LENOVO 20BFS0EC00/20BFS0EC00, BIOS
> GMET62WW (2.10 ) 03/19/2014
> [   35.725932] Workqueue: hci0 hci_power_on [bluetooth]
> [   35.725933]  0000000000000000 00000000923b240b ffff8803fc9d3b48
> ffffffff8178a76a
> [   35.725935]  0000000000000000 0000000000000000 ffff8803fc9d3b88
> ffffffff8109f5fa
> [   35.725936]  0000000000000000 ffffffffa05480a6 00000000fffffff5
> ffff88040934d8a0
> [   35.725936] Call Trace:
> [   35.725939]  [<ffffffff8178a76a>] dump_stack+0x45/0x57
> [   35.725940]  [<ffffffff8109f5fa>] warn_slowpath_common+0x8a/0xc0
> [   35.725943]  [<ffffffff8109f72a>] warn_slowpath_null+0x1a/0x20
> [   35.725944]  [<ffffffff814ed227>] _request_firmware+0x557/0x810
> [   35.725946]  [<ffffffff814e5253>] ? rpm_idle+0x23/0x230
> [   35.725948]  [<ffffffff814ed515>] request_firmware+0x35/0x50
> [   35.725951]  [<ffffffffa05443d8>] btusb_setup_intel+0x68/0x880 [btusb]
> [   35.725952]  [<ffffffff814e5306>] ? rpm_idle+0xd6/0x230
> [   35.725954]  [<ffffffff814e54ca>] ? __pm_runtime_idle+0x6a/0x80
> [   35.725960]  [<ffffffffa04aa3a1>] hci_dev_do_open+0xe1/0x580 [bluetooth]
> [   35.725962]  [<ffffffff810c8afd>] ?
> ttwu_do_activate.constprop.95+0x5d/0x70
> [   35.725967]  [<ffffffffa04ad440>] hci_power_on+0x40/0x200 [bluetooth]
> [   35.725969]  [<ffffffff810b7f9b>] process_one_work+0x14b/0x400
> [   35.725971]  [<ffffffff810b8a33>] worker_thread+0x53/0x490
> [   35.725972]  [<ffffffff810b89e0>] ? rescuer_thread+0x300/0x300
> [   35.725974]  [<ffffffff810bddd8>] kthread+0xd8/0xf0
> [   35.725976]  [<ffffffff810bdd00>] ? kthread_create_on_node+0x1b0/0x1b0
> [   35.725977]  [<ffffffff817911a2>] ret_from_fork+0x42/0x70
> [   35.725979]  [<ffffffff810bdd00>] ? kthread_create_on_node+0x1b0/0x1b0
> [   35.725980] ---[ end trace 27cd183538666cbc ]---
> [   35.725981] bluetooth hci0: firmware: THISISASTEST will not be loaded
> [   35.726100] PM: Finishing wakeup.
> [   35.726101] Restarting tasks ... done.
>
>  From this, it looks like usermodehelper still isn't set until after
> we've already tried to load the firmware. Even weirder, it looks like
> a thread is running before we've thawed the processes? Am I
> misunderstanding something or is the dmesg lying to me?
>

Okay I took another look and I noticed hci_register_dev calls
alloc_workqueue and then queue work to call the power on function.
Is it possible that the alloc_workqueue/queue_work is getting run
before thawing is complete? I don't have anymore time today to dig
into that.

Thanks,
Laura
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ