linux-kernel - Re: [PATCH v6 1/2] firmware_loader: use -ETIMEDOUT instead of -EAGAIN in fw_load_sysfs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20210722195924.oezxwv3u3p5k737l@garbanzo>
Date:   Thu, 22 Jul 2021 12:59:24 -0700
From:   Luis Chamberlain <mcgrof@...nel.org>
To:     Anirudh Rayabharam <mail@...rudhrb.com>
Cc:     gregkh@...uxfoundation.org, rafael@...nel.org,
        skhan@...uxfoundation.org, linux-kernel@...r.kernel.org,
        linux-kernel-mentees@...ts.linuxfoundation.org
Subject: Re: [PATCH v6 1/2] firmware_loader: use -ETIMEDOUT instead of
 -EAGAIN in fw_load_sysfs_fallback

On Thu, Jul 22, 2021 at 06:02:28PM +0530, Anirudh Rayabharam wrote:
> The only motivation for using -EAGAIN in commit 0542ad88fbdd81bb
> ("firmware loader: Fix _request_firmware_load() return val for fw load
> abort") was to distinguish the error from -ENOMEM, and so there is no
> real reason in keeping it. Keeping -ETIMEDOU is much telling of what the

Since you'll have to respin, a missing here   ^, also add that the
-ETIMEDOUT is what we'd get when we do time out on the wait, as its
not clear from the conext being changed.

> reason for a failure is, so just use that.
> 
> The rest is just trying to document a bit more of the motivations for the
> error codes, as otherwise we'd lose this information easily.

This is a separate change, and it actually does more than just that.
Moving code around should be done separately. The idea is to
first just remove the -EAGAIN so that the change is *easy* to review.
A remove of a return code *and* a move of code around makes it less
obvious for code review. And part of the comment is wrong now that we
removed -EAGAIN. When breaking patches up please review each change
going into each patch and consider if it makes sense, atomically.

> Suggested-by: Luis Chamberlain <mcgrof@...nel.org>
> Signed-off-by: Anirudh Rayabharam <mail@...rudhrb.com>
> ---
>  drivers/base/firmware_loader/fallback.c | 34 +++++++++++++++++--------
>  1 file changed, 24 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/base/firmware_loader/fallback.c b/drivers/base/firmware_loader/fallback.c
> index 91899d185e31..1db94165feaf 100644
> --- a/drivers/base/firmware_loader/fallback.c
> +++ b/drivers/base/firmware_loader/fallback.c
> @@ -70,7 +70,29 @@ static inline bool fw_sysfs_loading(struct fw_priv *fw_priv)
>  
>  static inline int fw_sysfs_wait_timeout(struct fw_priv *fw_priv,  long timeout)
>  {
> -	return __fw_state_wait_common(fw_priv, timeout);
> +	int ret = __fw_state_wait_common(fw_priv, timeout);
> +
> +	/*
> +	 * A signal could be sent to abort a wait. Consider Android's init
> +	 * gettting a SIGCHLD, which in turn was the same process issuing the
> +	 * sysfs store call for the fallback. In such cases we want to be able
> +	 * to tell apart in userspace when a signal caused a failure on the
> +	 * wait. In such cases we'd get -ERESTARTSYS.
> +	 *
> +	 * Likewise though another race can happen and abort the load earlier.

This comment is about the check for fw_load_abort() so since the move is
not going to happen when you remove -EAGAIN just leave it out. It can be
added once you do the move.

> +	 *
> +	 * In either case the situation is interrupted so we just inform
> +	 * userspace of that and we end things right away.

Be mindful that this is in context of both cases when re-writing the
patches.

> +	 *
> +	 * When we really time out just tell userspace it should try again,
> +	 * perhaps later.

That's the thing, we're getting rid of that -EAGAIN as it made no sense,
the goal was to just distinguish the error from -ENOMEM. That's it.
Since we are removing the -EAGAIN, this comment makes no sense as we
have clarified with Shuah that the goal of her patch was just to
distinguish the error.

So "tell userspace to try again" makes no sense since if a timeout
happened userspace can't really try again as we have aborted the whole
operation to allow firmware to be uploaded.

In fact, please add that to the commit log which removes the -EAGAIN,
something like:

"Using -EAGAIN is also not correct as this return code is typically used
to tell userspace to try something again, in this case re-using the
sysfs loading interface cannot be retried when a timeout happens, so
the return value is also bogus."

> +	 */
> +	if (ret == -ERESTARTSYS || fw_state_is_aborted(fw_priv))
> +		ret = -EINTR;
> +	else if (fw_priv->is_paged_buf && !fw_priv->data)
> +		ret = -ENOMEM;
> +
> +	return ret;
>  }
>  
>  struct fw_sysfs {
> @@ -526,20 +548,12 @@ static int fw_load_sysfs_fallback(struct fw_sysfs *fw_sysfs, long timeout)
>  	}
>  
>  	retval = fw_sysfs_wait_timeout(fw_priv, timeout);
> -	if (retval < 0 && retval != -ENOENT) {
> +	if (retval < 0) {
>  		mutex_lock(&fw_lock);
>  		fw_load_abort(fw_sysfs);
>  		mutex_unlock(&fw_lock);
>  	}
>  
> -	if (fw_state_is_aborted(fw_priv)) {
> -		if (retval == -ERESTARTSYS)
> -			retval = -EINTR;
> -		else
> -			retval = -EAGAIN;

All we want to do is remove this -EAGAIN line in one patch. We
don't want to move code to another place. We do this to make code
easier to review.

We preserve the error code from the wait when a signal did not interrupt
the process (-ERESTARTSYS), and so this can only be -ETIMEDOUT.

> -	} else if (fw_priv->is_paged_buf && !fw_priv->data)
> -		retval = -ENOMEM;
> -

Thanks for keeping up with the series!

  Luis