linux-kernel - Re: hid-related 5.2-rc1 boot hang

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ea7b2dee-15a6-9b52-fbff-558c36cd72df@redhat.com>
Date:   Tue, 4 Jun 2019 12:50:18 +0200
From:   Hans de Goede <hdegoede@...hat.com>
To:     Benjamin Tissoires <benjamin.tissoires@...hat.com>
Cc:     Jiri Kosina <jikos@...nel.org>,
        Dave Hansen <dave.hansen@...el.com>,
        "open list:HID CORE LAYER" <linux-input@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: hid-related 5.2-rc1 boot hang

Hi,

On 04-06-19 12:08, Benjamin Tissoires wrote:
> On Tue, Jun 4, 2019 at 9:51 AM Benjamin Tissoires
> <benjamin.tissoires@...hat.com> wrote:
>>
>> On Mon, Jun 3, 2019 at 4:17 PM Hans de Goede <hdegoede@...hat.com> wrote:
>>>
>>> Hi,
>>>
>>> On 03-06-19 15:55, Benjamin Tissoires wrote:
>>>> On Mon, Jun 3, 2019 at 11:51 AM Hans de Goede <hdegoede@...hat.com> wrote:
>>>>>
>>>>> Hi Again,
>>>>>
>>>>> On 03-06-19 11:11, Hans de Goede wrote:
>>>>> <snip>
>>>>>
>>>>>>> not sure about the rest of logitech issues yet) next week.
>>>>>>
>>>>>> The main problem seems to be the request_module patches. Although I also
>>>>
>>>> Can't we use request_module_nowait() instead, and set a reasonable
>>>> timeout that we detect only once to check if userspace is compatible:
>>>>
>>>> In pseudo-code:
>>>> if (!request_module_checked) {
>>>>     request_module_nowait(name);
>>>>     use_request_module = wait_event_timeout(wq,
>>>>           first_module_loaded, 10 seconds in jiffies);
>>>>     request_module_checked = true;
>>>> } else if (use_request_module) {
>>>>     request_module(name);
>>>> }
>>>
>>> Well looking at the just attached dmesg , the modprobe
>>> when triggered by udev from userspace succeeds in about
>>> 0.5 seconds, so it seems that the modprobe hangs happens
>>> when called from within the kernel rather then from within
>>> userspace.
>>>
>>> What I do not know if is the hang is inside userspace, or
>>> maybe it happens when modprobe calls back into the kernel,
>>> if the hang happens when modprobe calls back into the kernel,
>>> then other modprobes (done from udev) likely will hang too
>>> since I think only 1 modprobe can happen at a time.
>>>
>>> I really wish we knew what distinguished working systems
>>> from non working systems :|
>>>
>>> I cannot find a common denominator; other then the systems
>>> are not running Fedora. So far we've reports from both Ubuntu 16.04
>>> and Tumbleweed, so software version wise these 2 are wide apart.
>>
>> I am trying to reproduce the lock locally, and installed an opensuse
>> Tumbleweed in a VM. When forwarding a Unifying receiver to the VM, I
>> do not see the lock with either my vanilla compiled kernel and the rpm
>> found in http://download.opensuse.org/repositories/Kernel:/HEAD/standard/x86_64/
>>
>> Next step is install Tumbleweed on bare metal, but I do not see how
>> this could introduce a difference (maybe USB2 vs 3).
> 
> Making progress here.
> 
> The difference between Ubuntu/Tumbleweed and Fedora: usbhid is shipped
> as a module while in Fedora usbhid is included in the kernel.
> 
> If I rmmod hid_* and usbhid, then modprobe usbhid, the command hangs
> for 3 minutes.
> If usbhid is already loaded, inserting a receiver is immediate
> regarding the loading of the external modules.
> 
> So my assumption is that when the device gets detected at boot, usbhid
> gets loaded by the kernel event, which in turns attempts to call
> __request_module, but the modprobe can't be fulfilled because it's
> already waiting for the initial usbhid modprobe to finish.
> 
> Still don't know how to solve that, but I thought I should share.

Hmm, we may be hitting the scenario described in the big comment
around line 3500 of kernel/module.c.

But I'm not sure that is what is happening here.

Maybe you can put a WARN_ON(1) in request_module and look at the
backtrace ?  That may help to figure out what is going on; or
alternatively it might help to find some way to detect this and
if it happens skip the request_module...

Regards,

Hans