[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a31fc55f-1f61-19b6-dfaf-4454380f8f45@linuxfoundation.org>
Date: Fri, 23 Apr 2021 12:44:53 -0600
From: Shuah Khan <skhan@...uxfoundation.org>
To: Luis Chamberlain <mcgrof@...nel.org>,
Anirudh Rayabharam <mail@...rudhrb.com>,
Shuah Khan <shuah@...nel.org>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Junyong Sun <sunjy516@...il.com>,
syzbot+de271708674e2093097b@...kaller.appspotmail.com,
linux-kernel@...r.kernel.org,
Shuah Khan <skhan@...uxfoundation.org>
Subject: Re: [PATCH v3] firmware_loader: fix use-after-free in
firmware_fallback_sysfs
On 4/15/21 8:10 AM, Shuah Khan wrote:
> On 4/14/21 9:26 AM, Shuah Khan wrote:
>> On 4/14/21 6:55 AM, Luis Chamberlain wrote:
>>> Shuah, a question for you toward the end here.
>>>
>>> On Wed, Apr 14, 2021 at 02:24:05PM +0530, Anirudh Rayabharam wrote:
>>>> This use-after-free happens when a fw_priv object has been freed but
>>>> hasn't been removed from the pending list (pending_fw_head). The next
>>>> time fw_load_sysfs_fallback tries to insert into the list, it ends up
>>>> accessing the pending_list member of the previoiusly freed fw_priv.
>>>>
>>>> The root cause here is that all code paths that abort the fw load
>>>> don't delete it from the pending list. For example:
>>>>
>>>> _request_firmware()
>>>> -> fw_abort_batch_reqs()
>>>> -> fw_state_aborted()
>>>>
>>>> To fix this, delete the fw_priv from the list in __fw_set_state() if
>>>> the new state is DONE or ABORTED. This way, all aborts will remove
>>>> the fw_priv from the list. Accordingly, remove calls to list_del_init
>>>> that were being made before calling fw_state_(aborted|done).
>>>>
>>>> Also, in fw_load_sysfs_fallback, don't add the fw_priv to the pending
>>>> list if it is already aborted. Instead, just jump out and return early.
>>>>
>>>> Fixes: bcfbd3523f3c ("firmware: fix a double abort case with
>>>> fw_load_sysfs_fallback")
>>>> Reported-by: syzbot+de271708674e2093097b@...kaller.appspotmail.com
>>>> Tested-by: syzbot+de271708674e2093097b@...kaller.appspotmail.com
>>>> Signed-off-by: Anirudh Rayabharam <mail@...rudhrb.com>
>>>> ---
>>>>
>>>> Changes in v3:
>>>> Modified the patch to incorporate suggestions by Luis Chamberlain in
>>>> order to fix the root cause instead of applying a "band-aid" kind of
>>>> fix.
>>>> https://lore.kernel.org/lkml/20210403013143.GV4332@42.do-not-panic.com/
>>>>
>>>> Changes in v2:
>>>> 1. Fixed 1 error and 1 warning (in the commit message) reported by
>>>> checkpatch.pl. The error was regarding the format for referring to
>>>> another commit "commit <sha> ("oneline")". The warning was for line
>>>> longer than 75 chars.
>>>>
>>>> ---
>>>> drivers/base/firmware_loader/fallback.c | 8 ++++++--
>>>> drivers/base/firmware_loader/firmware.h | 6 +++++-
>>>> drivers/base/firmware_loader/main.c | 2 ++
>>>> 3 files changed, 13 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/base/firmware_loader/fallback.c
>>>> b/drivers/base/firmware_loader/fallback.c
>>>> index 91899d185e31..73581b6998b4 100644
>>>> --- a/drivers/base/firmware_loader/fallback.c
>>>> +++ b/drivers/base/firmware_loader/fallback.c
>>>> @@ -94,7 +94,6 @@ static void __fw_load_abort(struct fw_priv *fw_priv)
>>>> if (fw_sysfs_done(fw_priv))
>>>> return;
>>>> - list_del_init(&fw_priv->pending_list);
>>>> fw_state_aborted(fw_priv);
>>>> }
>>>> @@ -280,7 +279,6 @@ static ssize_t firmware_loading_store(struct
>>>> device *dev,
>>>> * Same logic as fw_load_abort, only the DONE bit
>>>> * is ignored and we set ABORT only on failure.
>>>> */
>>>> - list_del_init(&fw_priv->pending_list);
>>>> if (rc) {
>>>> fw_state_aborted(fw_priv);
>>>> written = rc;
>>>> @@ -513,6 +511,11 @@ static int fw_load_sysfs_fallback(struct
>>>> fw_sysfs *fw_sysfs, long timeout)
>>>> }
>>>> mutex_lock(&fw_lock);
>>>> + if (fw_state_is_aborted(fw_priv)) {
>>>> + mutex_unlock(&fw_lock);
>>>> + retval = -EAGAIN;
>>>> + goto out;
>>>> + }
>>>
>>> Thanks for the quick follow up!
>>>
>>> This would regress commit 76098b36b5db1 ("firmware: send -EINTR on
>>> signal abort on fallback mechanism") which I had mentioned in my follow
>>> up email you posted a link to. It would regress it since the condition
>>> is just being met earlier and you nullify the effort. So essentially
>>> on Android you would make not being able to detect signal handlers
>>> like the SIGCHLD signal sent to init, if init was the same process
>>> dealing with the sysfs fallback firmware upload.
>>>
>>> The way I dealt with this in my patch was I decided to return -EINTR
>>> in the earlier case in the hunk you added, instead of -EAGAIN. In
>>> addition to this, later on fw_load_sysfs_fallback() when
>>> fw_sysfs_wait_timeout() is used that would also deal with checking
>>> for error codes on wait, and only then check if it was a signal
>>> that cancelled things (the check for -ERESTARTSYS). We therefore
>>> only send to userspace -EAGAIN when the wait really did hit the
>>> timeout.
>>>
>>> But also note that my change added a check for
>>> fw_state_is_aborted(fw_priv) inside fw_sysfs_wait_timeout(),
>>> as that was a recently intended goal.
>>>
>>> In either case I documented well *why* we do these error checks
>>> before sending a code to userspace on fw_sysfs_wait_timeout() since
>>> otherwise it would be easy to regress that code, so please also
>>> document that as I did.
>>>
>>> I'll re-iterate again also:
>>>
>>> Shuah's commit 0542ad88fbdd81bb ("firmware loader: Fix
>>> _request_firmware_load() return val for fw load abort") also
>>> wanted to
>>> distinguish the timeout vs -ENOMEM, but for some reason in the
>>> timeout
>>> case -EAGAIN was being sent back to userspace. I am no longer
>>> sure if
>>> that is a good idea, but since we started doing that at some point I
>>> guess we want to keep that behaviour.
>>>
>>> Shuah, can you think of any reason to retain -EAGAIN other than you
>>> introduced it here? If there's no real good reason I think it can
>>> simplify the error handling here. But, we *would* change what we do
>>> to userspace... and for that reason we may have to live with it.
>>>
>>
>> As I recall the reason for this patch was to be able to differentiate
>> between timing out vs no memory case when driver was attempting to
>> load firmware. I wish I added why to the change log.
>>
>> The code seems to have changed a lot since my commit. I will take a look
>> at the closely and let you know if this is still necessary late on
>> today.
>>
>
> Luis,
>
> I did some digging and figured out why I added this timeout logic. Media
> drivers expect request_firmware() timeout or fail.
>
> It turns out I have notes saved on this problem.
>
> When drivers attempt to load firmware while ext_fs is going through
> recovery during boot or resume (after hibernate or suspend),
> request_firmware() calls fail without this timeout handling leaving
> drivers and devices that need firmware loaded in failed state.
>
> I added this timeout handling so drivers can retry loading the firmware.
> Several media drivers retry based on the timeoout return. This helps
> differentiate timeout and other failures.
>
> We still need this logic or at least we can't delete without verifying
> that is indeed not needed.
>
Doesn't look like this manded in your Inbox?
thanks,
-- Shuah
Powered by blists - more mailing lists