linux-kernel - Re: [PATCH v3] firmware_loader: fix use-after-free in firmware_fallback

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1e12bb24-9bb8-dad7-d852-02173ce41b6a@linuxfoundation.org>
Date:   Wed, 14 Apr 2021 09:26:55 -0600
From:   Shuah Khan <skhan@...uxfoundation.org>
To:     Luis Chamberlain <mcgrof@...nel.org>,
        Anirudh Rayabharam <mail@...rudhrb.com>,
        Shuah Khan <shuah@...nel.org>
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Junyong Sun <sunjy516@...il.com>,
        syzbot+de271708674e2093097b@...kaller.appspotmail.com,
        linux-kernel@...r.kernel.org,
        Shuah Khan <skhan@...uxfoundation.org>
Subject: Re: [PATCH v3] firmware_loader: fix use-after-free in
 firmware_fallback_sysfs

On 4/14/21 6:55 AM, Luis Chamberlain wrote:
> Shuah, a question for you toward the end here.
> 
> On Wed, Apr 14, 2021 at 02:24:05PM +0530, Anirudh Rayabharam wrote:
>> This use-after-free happens when a fw_priv object has been freed but
>> hasn't been removed from the pending list (pending_fw_head). The next
>> time fw_load_sysfs_fallback tries to insert into the list, it ends up
>> accessing the pending_list member of the previoiusly freed fw_priv.
>>
>> The root cause here is that all code paths that abort the fw load
>> don't delete it from the pending list. For example:
>>
>> 	_request_firmware()
>> 	  -> fw_abort_batch_reqs()
>> 	      -> fw_state_aborted()
>>
>> To fix this, delete the fw_priv from the list in __fw_set_state() if
>> the new state is DONE or ABORTED. This way, all aborts will remove
>> the fw_priv from the list. Accordingly, remove calls to list_del_init
>> that were being made before calling fw_state_(aborted|done).
>>
>> Also, in fw_load_sysfs_fallback, don't add the fw_priv to the pending
>> list if it is already aborted. Instead, just jump out and return early.
>>
>> Fixes: bcfbd3523f3c ("firmware: fix a double abort case with fw_load_sysfs_fallback")
>> Reported-by: syzbot+de271708674e2093097b@...kaller.appspotmail.com
>> Tested-by: syzbot+de271708674e2093097b@...kaller.appspotmail.com
>> Signed-off-by: Anirudh Rayabharam <mail@...rudhrb.com>
>> ---
>>
>> Changes in v3:
>> Modified the patch to incorporate suggestions by Luis Chamberlain in
>> order to fix the root cause instead of applying a "band-aid" kind of
>> fix.
>> https://lore.kernel.org/lkml/20210403013143.GV4332@42.do-not-panic.com/
>>
>> Changes in v2:
>> 1. Fixed 1 error and 1 warning (in the commit message) reported by
>> checkpatch.pl. The error was regarding the format for referring to
>> another commit "commit <sha> ("oneline")". The warning was for line
>> longer than 75 chars.
>>
>> ---
>>   drivers/base/firmware_loader/fallback.c | 8 ++++++--
>>   drivers/base/firmware_loader/firmware.h | 6 +++++-
>>   drivers/base/firmware_loader/main.c     | 2 ++
>>   3 files changed, 13 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/base/firmware_loader/fallback.c b/drivers/base/firmware_loader/fallback.c
>> index 91899d185e31..73581b6998b4 100644
>> --- a/drivers/base/firmware_loader/fallback.c
>> +++ b/drivers/base/firmware_loader/fallback.c
>> @@ -94,7 +94,6 @@ static void __fw_load_abort(struct fw_priv *fw_priv)
>>   	if (fw_sysfs_done(fw_priv))
>>   		return;
>>   
>> -	list_del_init(&fw_priv->pending_list);
>>   	fw_state_aborted(fw_priv);
>>   }
>>   
>> @@ -280,7 +279,6 @@ static ssize_t firmware_loading_store(struct device *dev,
>>   			 * Same logic as fw_load_abort, only the DONE bit
>>   			 * is ignored and we set ABORT only on failure.
>>   			 */
>> -			list_del_init(&fw_priv->pending_list);
>>   			if (rc) {
>>   				fw_state_aborted(fw_priv);
>>   				written = rc;
>> @@ -513,6 +511,11 @@ static int fw_load_sysfs_fallback(struct fw_sysfs *fw_sysfs, long timeout)
>>   	}
>>   
>>   	mutex_lock(&fw_lock);
>> +	if (fw_state_is_aborted(fw_priv)) {
>> +		mutex_unlock(&fw_lock);
>> +		retval = -EAGAIN;
>> +		goto out;
>> +	}
> 
> Thanks for the quick follow up!
> 
> This would regress commit 76098b36b5db1 ("firmware: send -EINTR on
> signal abort on fallback mechanism") which I had mentioned in my follow
> up email you posted a link to. It would regress it since the condition
> is just being met earlier and you nullify the effort. So essentially
> on Android you would make not being able to detect signal handlers
> like the SIGCHLD signal sent to init, if init was the same process
> dealing with the sysfs fallback firmware upload.
> 
> The way I dealt with this in my patch was I decided to return -EINTR
> in the earlier case in the hunk you added, instead of -EAGAIN. In
> addition to this, later on fw_load_sysfs_fallback() when
> fw_sysfs_wait_timeout() is used that would also deal with checking
> for error codes on wait, and only then check if it was a signal
> that cancelled things (the check for -ERESTARTSYS). We therefore
> only send to userspace -EAGAIN when the wait really did hit the
> timeout.
> 
> But also note that my change added a check for
> fw_state_is_aborted(fw_priv) inside fw_sysfs_wait_timeout(),
> as that was a recently intended goal.
> 
> In either case I documented well *why* we do these error checks
> before sending a code to userspace on fw_sysfs_wait_timeout() since
> otherwise it would be easy to regress that code, so please also
> document that as I did.
> 
> I'll re-iterate again also:
> 
> 	Shuah's commit 0542ad88fbdd81bb ("firmware loader: Fix
> 	_request_firmware_load() return val for fw load abort") also wanted to
> 	distinguish the timeout vs -ENOMEM, but for some reason in the timeout
> 	case -EAGAIN was being sent back to userspace. I am no longer sure if
> 	that is a good idea, but since we started doing that at some point I
> 	guess we want to keep that behaviour.
> 
> Shuah, can you think of any reason to retain -EAGAIN other than you
> introduced it here? If there's no real good reason I think it can
> simplify the error handling here. But, we *would* change what we do
> to userspace... and for that reason we may have to live with it.
> 

As I recall the reason for this patch was to be able to differentiate
between timing out vs no memory case when driver was attempting to
load firmware. I wish I added why to the change log.

The code seems to have changed a lot since my commit. I will take a look
at the closely and let you know if this is still necessary late on
today.

thanks,
-- Shuah