linux-kernel - Re: [PATCHv2] firmware: Correct handling of fw_state_wait

Message-ID: <CAJpBn1zg7AX9v93dtMpQyvip9zwUk+aAKU8U6bAaYP7gu-+bdA@mail.gmail.com>
Date:   Tue, 17 Jan 2017 10:04:20 -0800
From:   Jakub Kicinski <jakub.kicinski@...ronome.com>
To:     "Luis R. Rodriguez" <mcgrof@...nel.org>
Cc:     Chris Wilson <chris@...is-wilson.co.uk>,
        linux-kernel-dev@...khoff.com,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Bjorn Andersson <bjorn.andersson@...aro.org>,
        Daniel Wagner <daniel.wagner@...-carit.de>,
        Ming Lei <ming.lei@...onical.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        oss-drivers@...ronome.com
Subject: Re: [PATCHv2] firmware: Correct handling of fw_state_wait_timeout()
 return value

On Tue, Jan 17, 2017 at 9:30 AM, Luis R. Rodriguez <mcgrof@...nel.org> wrote:
> On Tue, Jan 17, 2017 at 08:30:37AM -0800, Jakub Kicinski wrote:
>> On Tue, Jan 17, 2017 at 8:21 AM, Luis R. Rodriguez <mcgrof@...nel.org> wrote:
>> >>>
>> >>>       retval = fw_state_wait_timeout(&buf->fw_st, timeout);
>> >>> -     if (retval < 0) {
>> >>> +     if (retval == -ETIMEDOUT || retval == -ERESTARTSYS) {
>> >>>               mutex_lock(&fw_lock);
>> >>>               fw_load_abort(fw_priv);
>> >>>               mutex_unlock(&fw_lock);
>> >>
>> >> This is a bit messy, two other similar issues were reported before
>> >> and upon review I suggested Patrick Bruenn's fix with a better commit
>> >> log seems best fit. Patrick sent a patch Jan 4, 2017 but never followed up
>> >> despite my feedback on a small change on the commit log message [0]. Can you
>> >> try that and if that fixes it can you adjust the commit log accordingly? Please
>> >> note the preferred solution would be:
>> >>
>> >> diff --git a/drivers/base/firmware_class.c b/drivers/base/firmware_class.c
>> >> index b9ac348e8d33..c530f8b4af01 100644
>> >> --- a/drivers/base/firmware_class.c
>> >> +++ b/drivers/base/firmware_class.c
>> >> @@ -542,6 +542,8 @@ static struct firmware_priv *to_firmware_priv(struct device *dev)
>> >>
>> >>  static void __fw_load_abort(struct firmware_buf *buf)
>> >>  {
>> >> +       if (!buf)
>> >> +               return;
>>
>> Allow me to try to persuade you one last time :)  My patch makes the
>> code more logical and easier to follow.  The code says:
>> in case no wake up happened - finish the wait (otherwise the waking
>> thread finishes it).
>
> Your patch is still wrong: as Patrick's great commit log notes, a NULL
> dereference can also happen on a race with a case of -1 being sent and a
> -ENOENT error, so we'd also have to adjust for when
> __fw_state_wait_common() returns -ENOENT.

Sorry, I don't follow.  _Not_ calling abort on -ENOENT error is
exactly what my patch does.
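
To make the intended behaviour concrete, here is a minimal userspace sketch
of the decision my patch encodes. It assumes the wait returns 0 on success,
-ETIMEDOUT on timeout, -ERESTARTSYS on a signal, and -ENOENT when somebody
else already aborted the load. The helper waiter_must_abort() is a
hypothetical name for illustration, not a function in firmware_class.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* ERESTARTSYS is kernel-internal and missing from userspace <errno.h>.
 * 512 is the kernel's value; it is defined here only for this model. */
#ifndef ERESTARTSYS
#define ERESTARTSYS 512
#endif

/* Hypothetical helper, not a kernel function: decides whether the waiting
 * thread has to call fw_load_abort() itself. */
static bool waiter_must_abort(int retval)
{
	/* Model assumption: the wait returns 0 or a negative errno. */
	assert(retval <= 0);

	/* Timeout or signal: nobody woke us, so we finish the wait. */
	return retval == -ETIMEDOUT || retval == -ERESTARTSYS;
}
```

With this model, waiter_must_abort(-ENOENT) is false: the thread that
aborted the load has already cleaned up, so the waiter must not abort again.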

>> Adding a NULL-check would just paper over the
>> issue and can cause trouble down the line.
>
> We typically bail on errors and use similar code to bail out, and we
> typically do these things. Here it's no different. The *real* issue
> is the fact that we have a waiting timeout which can race against
> a user-imposed error-out on the sysfs interface. There is one catch:
>
> We already lock with the big fw_lock and use this to be able to check
> for the status of the fw, so once aborted we technically should not have
> to abort again. A proper way to address this would then have been to check
> for the status of the fw prior to aborting again, given we also lock on the
> big fw_lock. A problem with this, though, is that the status is part of the
> buf, which is set to NULL after we are done aborting.

Yes, I've seen that too :\  This race seems to have been there prior
to 4.9, though.  I guess we could fix both issues with the NULL-check
although I would prefer if we had both patches.

FWIW I think the NULL-check could be put in the existing conditional:

         * There is a small window in which user can write to 'loading'
         * between loading done and disappearance of 'loading'
         */
-       if (fw_state_is_done(&buf->fw_st))
+       if (!buf || fw_state_is_done(&buf->fw_st))
                return;

        list_del_init(&buf->pending_list);

Note that the comment above seems to describe the very race we're
trying to solve.
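
For illustration, a simplified userspace model of why folding the NULL-check
into that conditional makes a second abort harmless. The struct and function
names (fw_buf_model, fw_priv_model, model_load_abort) are hypothetical
stand-ins, locking is omitted, and the model assumes, as discussed above,
that buf is set to NULL once the abort is done.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum fw_status_model { FW_LOADING, FW_DONE, FW_ABORTED };

/* Hypothetical stand-ins for struct firmware_buf / struct firmware_priv. */
struct fw_buf_model { enum fw_status_model status; };
struct fw_priv_model { struct fw_buf_model *buf; };

static bool model_state_is_done(const struct fw_buf_model *buf)
{
	return buf->status == FW_DONE || buf->status == FW_ABORTED;
}

/* Returns true only if this call actually performed the abort. */
static bool model_load_abort(struct fw_priv_model *fw_priv)
{
	struct fw_buf_model *buf;

	assert(fw_priv != NULL);
	buf = fw_priv->buf;

	/* The combined check: buf already gone, or loading already finished. */
	if (!buf || model_state_is_done(buf))
		return false;

	buf->status = FW_ABORTED;
	fw_priv->buf = NULL;	/* mirrors buf being cleared after the abort */
	return true;
}
```

In this model the first caller aborts and clears buf, and any later caller
(the timed-out waiter or a write to 'loading') returns early instead of
dereferencing NULL.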