linux-kernel - Re: [RFC PATCH] mtd: rawnand: use mutex to protect access while in suspend

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20211004104147.579f3b01@collabora.com>
Date:   Mon, 4 Oct 2021 10:41:47 +0200
From:   Boris Brezillon <boris.brezillon@...labora.com>
To:     Sean Nyekjaer <sean@...nix.com>
Cc:     Miquel Raynal <miquel.raynal@...tlin.com>,
        Richard Weinberger <richard@....at>,
        Vignesh Raghavendra <vigneshr@...com>,
        Boris Brezillon <bbrezillon@...nel.org>,
        linux-mtd@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] mtd: rawnand: use mutex to protect access while in
 suspend

On Mon,  4 Oct 2021 08:56:09 +0200
Sean Nyekjaer <sean@...nix.com> wrote:

> This will prevent nand_get_device() from returning -EBUSY.
> It will force mtd_write()/mtd_read() to wait for the nand_resume() to unlock
> access to the mtd device.
> 
> Then we avoid -EBUSY is returned to ubifsi via mtd_write()/mtd_read(),
> that will in turn hard error on every error returened.
> We have seen during ubifs tries to call mtd_write before the mtd device
> is resumed.

I think the problem is here. Why would UBIFS/UBI try to write something
to a device that's not resumed yet (or has been suspended already, if
you hit this in the suspend path).

> 
> Exec_op[0] speed things up, so we see this race when the device is
> resuming. But it's actually "mtd: rawnand: Simplify the locking" that
> allows it to return -EBUSY, before that commit it would have waited for
> the mtd device to resume.

Uh, wait. If nand_resume() was called before any writes/reads this
wouldn't happen. IMHO, the problem is not that we return -EBUSY without
blocking, the problem is that someone issues a write/read before calling
mtd_resume().

> 
> Tested on a iMX6ULL.
> 
> [0]:
> ef347c0cfd61 ("mtd: rawnand: gpmi: Implement exec_op")
> 
> Fixes: 013e6292aaf5 ("mtd: rawnand: Simplify the locking")
> Signed-off-by: Sean Nyekjaer <sean@...nix.com>
> ---
> 
> I did this a RFC as we probably will need to remove the suspended
> variable as it's kinda made obsolute by this change.
> Should we introduce a new mutex? Or maybe a spin_lock?
> 
>  drivers/mtd/nand/raw/nand_base.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
> index 3d6c6e880520..0ea343404cac 100644
> --- a/drivers/mtd/nand/raw/nand_base.c
> +++ b/drivers/mtd/nand/raw/nand_base.c
> @@ -4567,7 +4567,6 @@ static int nand_suspend(struct mtd_info *mtd)
>  		ret = chip->ops.suspend(chip);
>  	if (!ret)
>  		chip->suspended = 1;
> -	mutex_unlock(&chip->lock);

Hm, I'm not sure keeping the lock when you're in a suspended state
is a good idea. It just papers over another bug IMO (see above).

>  
>  	return ret;
>  }
> @@ -4580,7 +4579,6 @@ static void nand_resume(struct mtd_info *mtd)
>  {
>  	struct nand_chip *chip = mtd_to_nand(mtd);
>  
> -	mutex_lock(&chip->lock);
>  	if (chip->suspended) {
>  		if (chip->ops.resume)
>  			chip->ops.resume(chip);