linux-kernel - Re: [RESEND RESEND RESEND PATCH v2] mtd: nand_bbt: scan for next free bbt block if writing bbt fails

lists.openwall.net: Open Source and information security mailing list archives

Date:	Fri, 04 Sep 2015 16:20:15 -0500
From:	Xander Huff <xander.huff@...com>
To:	Brian Norris <computersforpeace@...il.com>,
	"Bean Huo 霍斌斌 (beanhuo)" <beanhuo@...ron.com>
CC:	"dwmw2@...radead.org" <dwmw2@...radead.org>,
	"linux-mtd@...ts.infradead.org" <linux-mtd@...ts.infradead.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"jeff.westfahl@...com" <jeff.westfahl@...com>,
	"jaeden.amero@...com" <jaeden.amero@...com>,
	"joshc@...com" <joshc@...com>, Ben Shelton <ben.shelton@...com>,
	Richard Weinberger <richard@....at>,
	"Peter Pan 潘栋 (peterpandong)" 
	<peterpandong@...ron.com>, nathan.sullivan@...com
Subject: Re: [RESEND RESEND RESEND PATCH v2] mtd: nand_bbt: scan for next
 free bbt block if writing bbt fails

On 8/26/2015 7:07 PM, Brian Norris wrote:
> On Wed, Aug 26, 2015 at 03:57:00PM +0000, Bean Huo 霍斌斌 (beanhuo) wrote:
>>> On Tue, Aug 25, 2015 at 12:49:26PM -0500, Xander Huff wrote:
>
>>>> diff --git a/drivers/mtd/nand/nand_bbt.c b/drivers/mtd/nand/nand_bbt.c
>>>> index 63a1a36..09f9e62 100644
>>>> --- a/drivers/mtd/nand/nand_bbt.c
>>>> +++ b/drivers/mtd/nand/nand_bbt.c
>
>>>> @@ -787,13 +788,42 @@ static int write_bbt(struct mtd_info *mtd, uint8_t *buf,
>>>>   		einfo.addr = to;
>>>>   		einfo.len = 1 << this->bbt_erase_shift;
>>>>   		res = nand_erase_nand(mtd, &einfo, 1);
>>>> -		if (res < 0)
>>>> +		if (res == -EIO && einfo.state == MTD_ERASE_FAILED
>>>> +		    && einfo.priv == NAND_ERASE_BLOCK_ERASE_FAILED) {
>>>
>>> Do you actually need that last condition? What's wrong with the first two?
>>>

The intent of the extra condition is to distinguish a genuine block-erase
failure from other erase failures, such as those caused by write protection or
by an already-known bad block. For example, we don't want to mark a
write-protected block as bad simply because we failed to erase it.
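
To make the intent concrete, here is a minimal standalone sketch of the
decision that the three-part condition makes. It assumes the NAND_ERASE_*
reasons from this patch (redeclared locally as an enum), uses a plain bool in
place of einfo.state == MTD_ERASE_FAILED, and the function name
should_mark_block_bad() is hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Local mirror of the NAND_ERASE_* reasons proposed in this patch;
 * these are not an existing MTD API. */
enum nand_erase_reason {
	NAND_ERASE_OK = 0,
	NAND_ERASE_WRITE_PROTECTED = 1,
	NAND_ERASE_BAD_BLOCK = 2,
	NAND_ERASE_BLOCK_ERASE_FAILED = 3,
};

/* 'erase_failed' stands in for einfo.state == MTD_ERASE_FAILED.
 * Write protection and an already-known bad block also fail the erase,
 * but neither means the block has just worn out, so only a genuine
 * block-erase failure should get the block marked bad. */
static bool should_mark_block_bad(int res, bool erase_failed,
				  enum nand_erase_reason reason)
{
	assert(reason >= NAND_ERASE_OK &&
	       reason <= NAND_ERASE_BLOCK_ERASE_FAILED);
	return res == -EIO && erase_failed &&
	       reason == NAND_ERASE_BLOCK_ERASE_FAILED;
}
```

Dropping the third check would make the write-protected and known-bad cases
indistinguishable from real wear, which is exactly what we want to avoid.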

>>>> +			/* This block is bad. Mark it as such and see if
>>>> +			 * there's another block available in the BBT area. */
>>>> +			int block = page >>
>>>> +				(this->bbt_erase_shift - this->page_shift);
>>>> +			pr_info("nand_bbt: failed to erase block %d when writing BBT\n",
>>>> +				block);
>>>> +			bbt_mark_entry(this, block, BBT_BLOCK_WORN);
>>>> +
>>>> +			res = this->block_markbad(mtd, block);
>>>> +			if (res)
>>>> +				pr_warn("nand_bbt: error %d while marking block %d bad\n",
>>>> +					res, block);
>>>> +			goto next;
>>>> +		} else if (res < 0)
>>>>   			goto outerr;
>>
>>
>> As far as I know, we shouldn't mark this block as bad directly. Just as the
>> UBI layer does, we should test this block further to verify whether it is
>> really bad, right?
>
> That's a good point...we might want some kind of separate function for a
> torture test. Might look at UBI's torture_peb() for inspiration.
>

Hmm, I'll look into this. Are there any performance concerns with running a
torture test on every potential bad block we come across?
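
For discussion, a torture test could look roughly like the sketch below. It is
loosely modeled on the erase/write/verify rounds that UBI's torture_peb()
performs with alternating bit patterns, but it runs against a hypothetical
in-memory block: struct sim_block, its stuck-bit fault model, and all function
names here are illustrative assumptions, not MTD or UBI APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE 64	/* tiny block size, for illustration only */

/* Hypothetical in-memory stand-in for one eraseblock. */
struct sim_block {
	uint8_t data[SKETCH_BLOCK_SIZE];
	uint8_t stuck_zero_mask;	/* bits that read 0 even after an erase */
};

static void sim_erase(struct sim_block *b)
{
	memset(b->data, 0xFF, sizeof(b->data));
	for (size_t i = 0; i < sizeof(b->data); i++)
		b->data[i] &= (uint8_t)~b->stuck_zero_mask;
}

static void sim_write(struct sim_block *b, uint8_t pattern)
{
	/* NAND programming can only clear bits (1 -> 0). */
	for (size_t i = 0; i < sizeof(b->data); i++)
		b->data[i] &= pattern;
}

static bool all_bytes_equal(const struct sim_block *b, uint8_t v)
{
	for (size_t i = 0; i < sizeof(b->data); i++)
		if (b->data[i] != v)
			return false;
	return true;
}

/* Returns true if the block survives every erase/write/verify round. */
static bool torture_block(struct sim_block *b)
{
	static const uint8_t patterns[] = { 0xA5, 0x5A, 0x00 };

	assert(b != NULL);
	for (size_t i = 0; i < sizeof(patterns); i++) {
		sim_erase(b);
		if (!all_bytes_equal(b, 0xFF))
			return false;	/* erase did not restore all 1s */
		sim_write(b, patterns[i]);
		if (!all_bytes_equal(b, patterns[i]))
			return false;	/* readback does not match the pattern */
	}
	return true;
}
```

Since the test would only run on a block whose erase or write has already
failed, its cost (here three erase/program/verify rounds) should be paid
rarely, though each round also adds wear to a block that may still be good.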

>>>>
>>>>   		res = scan_write_bbt(mtd, to, len, buf,
>>>>   				td->options & NAND_BBT_NO_OOB ? NULL :
>>>>   				&buf[len]);
>>>> -		if (res < 0)
>>>> +		if (res == -EIO) {
>>>> +			/* This block is bad. Mark it as such and see if
>>>> +			 * there's another block available in the BBT area. */
>>>> +			int block = page >>
>>>> +				(this->bbt_erase_shift - this->page_shift);
>>>> +			pr_info("nand_bbt: failed to erase block %d when writing BBT\n",
>>>> +				block);
>>>> +			bbt_mark_entry(this, block, BBT_BLOCK_WORN);
>>>> +
>>>> +			res = this->block_markbad(mtd, block);
>>>> +			if (res)
>>>> +				pr_warn("nand_bbt: error %d while marking block %d bad\n",
>>>> +					res, block);
>>>> +			goto next;
>>>> +		} else if (res < 0)
>>>>   			goto outerr;
>>>>
>>>>   		pr_info("Bad block table written to 0x%012llx, version 0x%02X\n",
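
The page-to-block conversion and the "goto next" fallback in both hunks can be
modeled in isolation as below. This is a simplified sketch: write_fails[]
stands in for the result of scan_write_bbt() on each candidate block, worn[]
for the BBT_BLOCK_WORN marking, and it ignores how write_bbt() actually picks
its candidate blocks. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Same arithmetic as the patch: both shifts are log2 sizes, so their
 * difference is log2(pages per eraseblock). */
static int page_to_block(int page, int erase_shift, int page_shift)
{
	assert(erase_shift >= page_shift);
	return page >> (erase_shift - page_shift);
}

/* Try each candidate block in turn; a block whose write fails is marked
 * worn and skipped. Returns the block that took the table, or -1 if no
 * usable block remains. */
static int write_bbt_with_fallback(bool worn[], const bool write_fails[],
				   int first, int nr_blocks)
{
	for (int block = first; block < first + nr_blocks; block++) {
		if (worn[block])
			continue;		/* already known bad: skip it */
		if (!write_fails[block])
			return block;		/* table written here */
		worn[block] = true;		/* mark worn, try the next one */
	}
	return -1;
}
```

For example, with 2 KiB pages (page_shift = 11) and 128 KiB eraseblocks
(bbt_erase_shift = 17), each block holds 2^6 = 64 pages, so page 130 maps to
block 130 >> 6 = 2.
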
>>>> diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h
>>>> index 272f429..86e11f6 100644
>>>> --- a/include/linux/mtd/nand.h
>>>> +++ b/include/linux/mtd/nand.h
>>>> @@ -1030,4 +1030,11 @@ struct nand_sdr_timings {
>>>>
>>>>  /* get timing characteristics from ONFI timing mode. */
>>>>  const struct nand_sdr_timings *onfi_async_timing_mode_to_sdr_timings(int mode);
>>>> +
>>>> +/* reasons for erase failures */
>>>> +#define NAND_ERASE_OK			0
>>>> +#define NAND_ERASE_WRITE_PROTECTED	1
>>>> +#define NAND_ERASE_BAD_BLOCK		2
>>>> +#define NAND_ERASE_BLOCK_ERASE_FAILED	3
>>>
>>> Why exactly do you need these statuses? I thought the existing error codes
>>> were sufficient.

This goes along with why we were originally using the 'priv' field. Many
places check for MTD_ERASE_FAILED to see whether an erase failed, but we want
to know specifically which type of erase failure occurred. Do you have any
suggestions on how to accomplish this better? I'm thinking of adding a new
member, such as 'fail_info', to the erase_info struct to pass this information
to nand_bbt.
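
Roughly, the idea would look like the sketch below. struct sketch_erase_info
is a trimmed, hypothetical stand-in for struct erase_info, and 'fail_info' is
only the proposed member (it does not exist upstream). The point is that the
erase path fills in fail_info while 'state' keeps its current meaning for
generic callers, and 'priv' is left alone for whoever owns it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_erase_state { SKETCH_ERASE_DONE, SKETCH_ERASE_FAILED };

/* Hypothetical failure reasons, mirroring the NAND_ERASE_* defines. */
enum sketch_fail_info {
	SKETCH_FAIL_NONE,
	SKETCH_FAIL_WRITE_PROTECTED,
	SKETCH_FAIL_BAD_BLOCK,
	SKETCH_FAIL_BLOCK_ERASE,
};

/* Trimmed stand-in for struct erase_info plus the proposed member. */
struct sketch_erase_info {
	enum sketch_erase_state state;	 /* what most callers already test */
	enum sketch_fail_info fail_info; /* proposed: why the erase failed */
	void *priv;			 /* stays owned by the caller */
};

/* Record a failure without touching priv. */
static int sketch_fail(struct sketch_erase_info *ei, enum sketch_fail_info why)
{
	ei->state = SKETCH_ERASE_FAILED;
	ei->fail_info = why;
	return -EIO;
}

/* Toy erase path covering the cases the patch distinguishes. */
static int sketch_erase(struct sketch_erase_info *ei, bool write_protected,
			bool known_bad, bool hw_erase_ok)
{
	assert(ei != NULL);
	if (write_protected)
		return sketch_fail(ei, SKETCH_FAIL_WRITE_PROTECTED);
	if (known_bad)
		return sketch_fail(ei, SKETCH_FAIL_BAD_BLOCK);
	if (!hw_erase_ok)
		return sketch_fail(ei, SKETCH_FAIL_BLOCK_ERASE);
	ei->state = SKETCH_ERASE_DONE;
	ei->fail_info = SKETCH_FAIL_NONE;
	return 0;
}
```

nand_bbt would then check fail_info == SKETCH_FAIL_BLOCK_ERASE (or its real
equivalent) instead of overloading 'priv'.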


-- 
Xander Huff
Staff Software Engineer
National Instruments