linux-kernel - Re: [PATCH 1/2] firmware, fix request_firmware

Date:	Tue, 22 Oct 2013 10:35:37 +0800
From:	Ming Lei <ming.lei@...onical.com>
To:	Prarit Bhargava <prarit@...hat.com>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	x86@...nel.org,
	Andreas Herrmann <herrmann.der.user@...glemail.com>,
	tigran@...azian.fsnet.co.uk
Subject: Re: [PATCH 1/2] firmware, fix request_firmware_nowait() freeze with
 no uevent

On Tue, Oct 22, 2013 at 6:24 AM, Prarit Bhargava <prarit@...hat.com> wrote:
>
>
> On 10/21/2013 08:24 AM, Ming Lei wrote:
>> On Mon, Oct 21, 2013 at 5:35 AM, Prarit Bhargava <prarit@...hat.com> wrote:
>>> If request_firmware_nowait() is called with uevent == false, the firmware
>>> completion is never marked complete, resulting in a hang in the process.
>>>
>>> If uevent is false, that means we're not waiting on anything and the
>>> process should just clean up and complete.  While we're at it, add a
>>> dev_dbg() to indicate that the FW has not been found.
>>>
>>> Signed-off-by: Prarit Bhargava <prarit@...hat.com>
>>> Cc: x86@...nel.org
>>> Cc: herrmann.der.user@...glemail.com
>>> Cc: ming.lei@...onical.com
>>> Cc: tigran@...azian.fsnet.co.uk
>>> ---
>>>  drivers/base/firmware_class.c |    6 +++++-
>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
>>> index 10a4467..95778dc 100644
>>> --- a/drivers/base/firmware_class.c
>>> +++ b/drivers/base/firmware_class.c
>>> @@ -335,7 +335,8 @@ static bool fw_get_filesystem_firmware(struct device *device,
>>>                 set_bit(FW_STATUS_DONE, &buf->status);
>>>                 complete_all(&buf->completion);
>>>                 mutex_unlock(&fw_lock);
>>> -       }
>>> +       } else
>>> +               dev_dbg(device, "firmware: %s not found\n", buf->fw_id);
>>>
>>>         return success;
>>>  }
>>> @@ -886,6 +887,9 @@ static int _request_firmware_load(struct firmware_priv *fw_priv, bool uevent,
>>>                         schedule_delayed_work(&fw_priv->timeout_work, timeout);
>>>
>>>                 kobject_uevent(&fw_priv->dev.kobj, KOBJ_ADD);
>>> +       } else {
>>> +               /* if there is no uevent then just cleanup */
>>> +               schedule_delayed_work(&fw_priv->timeout_work, 0);
>>>         }
>>
>> This may not be a good idea and might break current NOHOTPLUG
>> users,
>
> Ming,
>
> The code is broken for all callers of request_firmware_nowait() with NOHOTPLUG
> and CONFIG_FW_LOADER_USER_HELPER=y.  AFAICT with the two existing cases of this
> usage in the kernel, both are broken and both are attempting to do the same
> thing that I'm doing in the x86 microcode ATM.
>
> This is the situation as I understand it and please correct me if I'm wrong
> about the execution path.  If I call request_firmware_nowait() with NOHOTPLUG I
> am essentially saying that there is no uevent associated with this firmware
> load; that is, uevent = 0.  request_firmware_work_func() is called as a scheduled
> task, which results in a call to _request_firmware().  _request_firmware() first
> calls _request_firmware_prepare() which eventually results in a call to
> __allocate_fw_buf() which does an init_completion(&buf->completion).
>
> Returning back up the stack to _request_firmware() we eventually call
> fw_get_filesystem_firmware().  _If the firmware does not exist_ success is false
> and the if (success) block is not executed, and it is important to note that the
> complete_all(&buf->completion) is _not_ called.  fw_get_filesystem_firmware()
> returns an error so that fw_load_from_user_helper() is called from
> _request_firmware().
>
> fw_load_from_user_helper() eventually calls _request_firmware_load() and this is
> where we get into a problem.  _request_firmware_load() does all the file
> creation, etc., and then hits this chunk of code:
>
>         if (uevent) {
>                 dev_set_uevent_suppress(f_dev, false);
>                 dev_dbg(f_dev, "firmware: requesting %s\n", buf->fw_id);
>                 if (timeout != MAX_SCHEDULE_TIMEOUT)
>                         schedule_delayed_work(&fw_priv->timeout_work, timeout);
>
>                 kobject_uevent(&fw_priv->dev.kobj, KOBJ_ADD);
>         }
>
>         wait_for_completion(&buf->completion);
>
> As I previously said, we've been called with NOHOTPLUG, i.e. uevent = 0.  That
> means we skip down to the wait_for_completion(&buf->completion) ... and we wait
> ... forever.
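
A rough user-space sketch of the decision path described in the quoted
text above, written as a pure function so it compiles outside the
kernel. The enum, struct, and function names are hypothetical and are
not kernel identifiers; only the branch structure follows the thread.

```c
#include <stdbool.h>

/* Illustrative outcomes; these names are not kernel identifiers. */
enum fw_outcome {
    FW_LOADED_FROM_FS,   /* file found; the completion is signalled */
    FW_NOT_FOUND,        /* no user helper built in; the request fails */
    FW_WAIT_UEVENT,      /* uevent sent; user space is notified */
    FW_WAIT_UNNOTIFIED,  /* no uevent, no timer: waits until someone
                            feeds the firmware by hand or the request
                            is killed at suspend/shutdown */
    FW_ABORTED_AT_ONCE   /* proposed patch: timeout work runs with delay 0 */
};

struct fw_request {
    bool found_on_fs;  /* would the direct filesystem lookup succeed? */
    bool user_helper;  /* models CONFIG_FW_LOADER_USER_HELPER=y */
    bool uevent;       /* false models a NOHOTPLUG request */
    bool patched;      /* true applies the proposed "else" branch */
};

/* Walk the same branches the thread describes and report the outcome. */
static enum fw_outcome model_request_firmware(const struct fw_request *r)
{
    if (r->found_on_fs)
        return FW_LOADED_FROM_FS;
    if (!r->user_helper)
        return FW_NOT_FOUND;
    if (r->uevent)
        return FW_WAIT_UEVENT;
    return r->patched ? FW_ABORTED_AT_ONCE : FW_WAIT_UNNOTIFIED;
}
```

With found_on_fs = false, user_helper = true and uevent = false, the
model returns FW_WAIT_UNNOTIFIED without the patch and
FW_ABORTED_AT_ONCE with it, which is the behaviour change under
discussion.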

Yes, that is exactly how NOHOTPLUG was designed: the firmware loader
has to wait for user space to handle the request, and no one can
predict when user space will respond because no notification is sent.
For example, 'user space' may be someone typing into a shell once he
is free, :-) so it is difficult to set an explicit timeout for the
handling.

But the requests can be killed before suspend & shutdown, so
it is still OK.
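
As a sketch only, the idea of killing such requests before suspend or
shutdown can be modelled in user space as a pass over the pending
requests that aborts every one issued without a uevent. The struct and
function below are hypothetical names for illustration; the kernel's
actual list handling is not shown in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* One outstanding firmware request (illustrative, not a kernel type). */
struct pending_req {
    const char *name;   /* firmware name, for identification only */
    bool need_uevent;   /* false models a NOHOTPLUG request */
    bool aborted;       /* set once the waiter has been released */
};

/*
 * Abort every pending request that was issued without a uevent.
 * Returns the number of requests aborted by this call.
 */
static size_t abort_requests_without_uevent(struct pending_req *reqs, size_t n)
{
    size_t killed = 0;

    for (size_t i = 0; i < n; i++) {
        if (!reqs[i].need_uevent && !reqs[i].aborted) {
            reqs[i].aborted = true;  /* stands in for completing the wait */
            killed++;
        }
    }
    return killed;
}
```

Requests that did send a uevent are left alone in this model, since
they are expected to be answered by user space or ended by their own
timeout.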

That is why using NOHOTPLUG is discouraged, and I don't suggest
that you use it either, :-)

You need to make sure your approach won't break the microcode
update applications in current and previous distributions.

>
> I can reproduce this by using a Dell PE 1850 & the dell_rbu module by doing the
> following:
>
> insmod dell_rbu.ko
> echo init > /sys/devices/platform/dell_rbu/image_type
> lsmod | grep dell_rbu
>
> (after an hour)
>
> [root@...l-pe1850-04 dell_rbu]# lsmod | grep dell_rbu
> dell_rbu               14315  1
> [root@...l-pe1850-04 dell_rbu]#
>
> ^^^ that use count is left because the thread is waiting with an existing module
> ref count.  For kicks I put a printk in the dell_rbu code, or instrumented the
> _request_firmware() code, and did a reboot.  Since the completions are finished
> on system shutdown, I see the code continue to execute at the end of boot.

Right, so there is no obvious problem from the user's point of view, is there?

>
>> and how can you make sure the user space application can
>> complete the request during the timeout time?
>
> I see that your question really comes down to "are there additional
> synchronizations needed in the two drivers that already call the code this way?"
>  I realize that the answer to that is yes and I'll fix those up in a v2.  It
> should be trivial to make those changes AFAICT.  I've introduced some additional
> synchronization via a completion in the x86 microcode and will likely have to do
> something similar in the other drivers ... although it may be easier to just
> have the firmware code do all the synchronization.  I'll look into it.
>
> Hope this explains things a bit better,

As I said above, setting a timeout may not be OK, and may break
the two current drivers.


Thanks,
--
Ming Lei