[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9dd7975072cf16dd6ea1947bd4ae830a@agner.ch>
Date: Sat, 01 Aug 2015 01:35:52 +0200
From: Stefan Agner <stefan@...er.ch>
To: Brian Norris <computersforpeace@...il.com>
Cc: dwmw2@...radead.org, sebastian@...akpoint.cc, robh+dt@...nel.org,
pawel.moll@....com, mark.rutland@....com,
ijc+devicetree@...lion.org.uk, galak@...eaurora.org,
shawn.guo@...aro.org, kernel@...gutronix.de,
boris.brezillon@...e-electrons.com, marb@...at.de,
aaron@...tycactus.com, bpringlemeir@...il.com,
linux-mtd@...ts.infradead.org, devicetree@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
albert.aribaud@...ev.fr, Bill Pringlemeir <bpringlemeir@...ps.com>
Subject: Re: [PATCH v9 2/5] mtd: nand: vf610_nfc: add hardware BCH-ECC support
Hi Brian,
On 2015-08-01 01:09, Brian Norris wrote:
<snip>
>> +static inline int vf610_nfc_correct_data(struct mtd_info *mtd, uint8_t *dat)
>> +{
>> + struct vf610_nfc *nfc = mtd_to_nfc(mtd);
>> + u8 ecc_status;
>> + u8 ecc_count;
>> + int flip;
>> +
>> + ecc_status = __raw_readb(nfc->regs + ECC_SRAM_ADDR * 8 + ECC_OFFSET);
>> + ecc_count = ecc_status & ECC_ERR_COUNT;
>> + if (!(ecc_status & ECC_STATUS_MASK))
>> + return ecc_count;
>> +
>> + /*
>> + * On an erased page, bit count should be zero or at least
>> + * less then half of the ECC strength
>> + */
>> + flip = count_written_bits(dat, nfc->chip.ecc.size, ecc_count);
>
> Sorry I didn't notice this earlier, but it appears you are falling into
> the same trap that almost everyone else is -- it is not sufficient to
> check just the page area; you also need to check the OOB. Suppose that
> a MTD user wrote mostly-0xff data to the page, then the page accumulates
> bitflips in the spare area and a few in the page area, such that
> eventually HW ECC can't correct them. If there are few enough zero bits
> in the data area, you will mistakenly think that this is a blank page
> below, and memset() it to 0xff. That would be disastrous!
>
> Fortunately, your code is otherwise quite well structured and looks
> good. A tip below.
>
>> +
>> + if (flip > ecc_count && flip > (nfc->chip.ecc.strength / 2))
>> + return -1;
>> +
>> + /* Erased page. */
>> + memset(dat, 0xff, nfc->chip.ecc.size);
>> + return 0;
>> +}
>> +
>> +static int vf610_nfc_read_page(struct mtd_info *mtd, struct nand_chip *chip,
>> + uint8_t *buf, int oob_required, int page)
>> +{
>> + int eccsize = chip->ecc.size;
>> + int stat;
>> +
>> + vf610_nfc_read_buf(mtd, buf, eccsize);
>> +
>> + if (oob_required)
>> + vf610_nfc_read_buf(mtd, chip->oob_poi, mtd->oobsize);
>
> To fix the bitflips issue above, you'll just want to unconditionally
> read the OOB (it's fine to ignore 'oob_required') and...
>
>> +
>> + stat = vf610_nfc_correct_data(mtd, buf);
>
> ...pass in chip->oob_poi as a third argument.
>
Hm, this probably will have an effect on performance, since we usually
omit the OOB if not requested. I could fetch the OOB from the NAND
controllers SRAM only if necessary (if HW ECC status is not ok...). Does
this sound reasonable?
>> +
>> + if (stat < 0)
>> + mtd->ecc_stats.failed++;
>> + else
>> + mtd->ecc_stats.corrected += stat;
>
> You've got another problem here: ecc.read_page() should be returning
> 'max_bitflips' here. So, since you have a single ECC region, this block
> should probably be:
>
> if (stat < 0) {
> mtd->ecc_stats.failed++;
> return 0;
> } else {
> mtd->ecc_stats.corrected += stat;
> return stat;
> }
>
Ok, will change that.
--
Stefan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists