linux-kernel - Re: [PATCH v4 1/3] mtd: nand: increase ready wait timeout and report timeouts

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20150906203726.GA7745@laptop.cereza>
Date:	Sun, 6 Sep 2015 17:37:26 -0300
From:	Ezequiel Garcia <ezequiel@...guardiasur.com.ar>
To:	Alex Smith <alex.smith@...tec.com>
Cc:	linux-mtd@...ts.infradead.org,
	Brian Norris <computersforpeace@...il.com>,
	David Woodhouse <dwmw2@...radead.org>,
	Zubair Lutfullah Kakakhel <Zubair.Kakakhel@...tec.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 1/3] mtd: nand: increase ready wait timeout and report
 timeouts

On 27 Jul 02:50 PM, Alex Smith wrote:
> If nand_wait_ready() times out, this is silently ignored, and its
> caller will then proceed to read from/write to the chip before it is
> ready. This can potentially result in corruption with no indication as
> to why.
> 
> While a 20ms timeout seems like it should be plenty enough, certain
> behaviour can cause it to timeout much earlier than expected. The
> situation which prompted this change was that CPU 0, which is
> responsible for updating jiffies, was holding interrupts disabled
> for a fairly long time while writing to the console during a printk,
> causing several jiffies updates to be delayed. If CPU 1 happens to
> enter the timeout loop in nand_wait_ready() just before CPU 0 re-
> enables interrupts and updates jiffies, CPU 1 will immediately time
> out when the delayed jiffies updates are made. The result of this is
> that nand_wait_ready() actually waits less time than the NAND chip
> would normally take to be ready, and then read_page() proceeds to
> read out bad data from the chip.
> 
> The situation described above may seem unlikely, but in fact it can be
> reproduced almost every boot on the MIPS Creator Ci20.
> 

Not only unlikely but scary :) BTW, can't find SMP patches for Ci20,
are you sure this behavior will apply once SMP is upstreamed?

> Debugging this was made more difficult by the misleading comment above
> nand_wait_ready() stating "The timeout is caught later" - no timeout
> was ever reported, leading me away from the real source of the problem.
> 
> Therefore, this patch increases the timeout to 200ms. This should be
> enough to cover cases where jiffies updates get delayed. Additionally,
> add a pr_warn() when a timeout does occur so that it is easier to
> pinpoint any problems in future caused by the chip not becoming ready.
> 
> Signed-off-by: Alex Smith <alex.smith@...tec.com>
> Cc: Zubair Lutfullah Kakakhel <Zubair.Kakakhel@...tec.com>
> Cc: David Woodhouse <dwmw2@...radead.org>
> Cc: Brian Norris <computersforpeace@...il.com>
> Cc: linux-mtd@...ts.infradead.org
> Cc: linux-kernel@...r.kernel.org
> ---
> v3 -> v4:
>  - New patch to fix issue encountered in external Ci20 3.18 kernel
>    branch which also applies upstream.
> ---
>  drivers/mtd/nand/nand_base.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/mtd/nand/nand_base.c b/drivers/mtd/nand/nand_base.c
> index ceb68ca8277a..a0dab3414f16 100644
> --- a/drivers/mtd/nand/nand_base.c
> +++ b/drivers/mtd/nand/nand_base.c
> @@ -543,23 +543,32 @@ static void panic_nand_wait_ready(struct mtd_info *mtd, unsigned long timeo)
>  	}
>  }
>  
> -/* Wait for the ready pin, after a command. The timeout is caught later. */
> +/**
> + * nand_wait_ready - [GENERIC] Wait for the ready pin after commands.
> + * @mtd: MTD device structure
> + *
> + * Wait for the ready pin after a command, and warn if a timeout occurs.
> + */
>  void nand_wait_ready(struct mtd_info *mtd)
>  {
>  	struct nand_chip *chip = mtd->priv;
> -	unsigned long timeo = jiffies + msecs_to_jiffies(20);
> +	unsigned long timeo = jiffies + msecs_to_jiffies(200);
>  
>  	/* 400ms timeout */
>  	if (in_interrupt() || oops_in_progress)
>  		return panic_nand_wait_ready(mtd, 400);
>  
>  	led_trigger_event(nand_led_trigger, LED_FULL);
> +

Spurious change here.

>  	/* Wait until command is processed or timeout occurs */
>  	do {
>  		if (chip->dev_ready(mtd))
> -			break;
> +			goto out;
>  		touch_softlockup_watchdog();
>  	} while (time_before(jiffies, timeo));
> +
> +	pr_warn("timeout while waiting for chip to become ready\n");
> +out:
>  	led_trigger_event(nand_led_trigger, LED_OFF);
>  }

This change looks reasonable, a timeout value should be large enough
to be confident the operation has _really_ timed out. On non-error
path, this change shouldn't make any difference.

And the warning is probably helpful too, so:

Reviewed-by: Ezequiel Garcia <ezequiel@...guardiasur.com.ar>
-- 
Ezequiel Garcia, VanguardiaSur
www.vanguardiasur.com.ar
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/