Message-ID: <20161202085545.16042bcb@bbrezillon>
Date: Fri, 2 Dec 2016 08:55:45 +0100
From: Boris Brezillon <boris.brezillon@...e-electrons.com>
To: Masahiro Yamada <yamada.masahiro@...ionext.com>
Cc: Richard Weinberger <richard@....at>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Marek Vasut <marek.vasut@...il.com>,
linux-mtd@...ts.infradead.org,
Cyrille Pitchen <cyrille.pitchen@...el.com>,
Brian Norris <computersforpeace@...il.com>,
David Woodhouse <dwmw2@...radead.org>
Subject: Re: [PATCH 15/39] mtd: nand: denali: improve readability of
handle_ecc()
On Fri, 2 Dec 2016 13:26:27 +0900
Masahiro Yamada <yamada.masahiro@...ionext.com> wrote:
> Hi Boris,
>
>
> 2016-11-28 0:42 GMT+09:00 Boris Brezillon <boris.brezillon@...e-electrons.com>:
> >> + if (err_byte < ECC_SECTOR_SIZE) {
> >> + struct mtd_info *mtd =
> >> + nand_to_mtd(&denali->nand);
> >> + int offset;
> >> +
> >> + offset = (err_sector * ECC_SECTOR_SIZE + err_byte) *
> >> + denali->devnum + err_device;
> >> + /* correct the ECC error */
> >> + buf[offset] ^= err_correction_value;
> >> + mtd->ecc_stats.corrected++;
> >> + bitflips++;
> >
> > Hm, bitflips is what is set in max_bitflips, and apparently the
> > implementation (which is not yours) is not doing what the core expects.
> >
> > You should first count bitflips per sector with something like that:
> >
> > bitflips[err_sector]++;
> >
> >
> > And then once you've iterated over all errors do:
> >
> > for (i = 0; i < nsectors; i++)
> > max_bitflips = max(bitflips[i], max_bitflips);
>
>
> I see.
>
> For soft ECC fixup, we can calculate bitflips
> for each ECC sector, so I can fix the max_bitflips
> as the core framework expects.
>
> For hard ECC fixup, the register only reports
> the number of corrected bit-flips
> in the whole page (sum from all ECC sectors).
> We cannot calculate max_bitflips, I think.
>
That's unfortunate. If you report the page-wide sum as max_bitflips,
you'll return -EUCLEAN more quickly (which will trigger a UBI eraseblock
move), since the NAND framework bases its 'too many bitflips' detection
on the max_bitflips per ECC chunk and the bitflips threshold (by default
3/4 of the ECC strength).
That doesn't mean it won't work, you'll just wear your NAND more
quickly :-(.
OTOH, doing max_bitflips = nbitflips / nsteps is not good either,
because the bitflips might all be concentrated in the same ECC chunk,
and in this case you really want to return -EUCLEAN.
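
To make the difference between the per-chunk maximum, the page-wide sum and the average concrete, here is a minimal userspace sketch. It is not denali driver code: struct ecc_err, max_bitflips_per_sector() and needs_scrub() are hypothetical names made up for illustration, and needs_scrub() only mimics the "max_bitflips >= threshold" comparison in a simplified form.

```c
#include <stddef.h>

/*
 * Hypothetical error record (not a denali structure): the ECC sector a
 * corrected byte belongs to and the number of bits flipped in that byte.
 */
struct ecc_err {
	unsigned int sector;
	unsigned int nbits;
};

/*
 * Return the largest per-sector bitflip count, i.e. the value the core
 * expects in max_bitflips. counts[] is caller-provided scratch space with
 * nsectors entries. Records with an out-of-range sector are ignored.
 */
static unsigned int max_bitflips_per_sector(const struct ecc_err *errs,
					    size_t nerrs,
					    unsigned int *counts,
					    size_t nsectors)
{
	unsigned int max_bitflips = 0;
	size_t i;

	for (i = 0; i < nsectors; i++)
		counts[i] = 0;

	/* First pass: accumulate bitflips per ECC sector. */
	for (i = 0; i < nerrs; i++)
		if (errs[i].sector < nsectors)
			counts[errs[i].sector] += errs[i].nbits;

	/* Second pass: keep the worst sector. */
	for (i = 0; i < nsectors; i++)
		if (counts[i] > max_bitflips)
			max_bitflips = counts[i];

	return max_bitflips;
}

/* Simplified stand-in for the threshold check (assumption, see above). */
static int needs_scrub(unsigned int max_bitflips, unsigned int threshold)
{
	return max_bitflips >= threshold;
}
```

With an assumed threshold of 6 and four sectors holding 2 bitflips each, the per-sector maximum is 2 (no scrub), while the page-wide sum of 8 would cross the threshold. With all 8 bitflips in a single sector, the maximum is 8 (scrub wanted), while the average 8 / 4 = 2 would hide it.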
>
>
> BTW, I noticed another problem of the current code.
>
> buf[offset] ^= err_correction_value;
> mtd->ecc_stats.corrected++;
> bitflips++;
>
> This code is counting the number of corrected bytes,
> not the number of corrected bits.
>
>
> I think multiple bit-flips within one byte can happen.
Yes.
>
>
> Perhaps, we should add
>
> hweight8(buf[offset] ^ err_correction_value)
>
> to ecc_stats.corrected and bitflips.
>
Looks good.
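
For reference, a minimal userspace sketch of counting corrected bits rather than corrected bytes. popcount8() is a portable stand-in for the kernel's hweight8(), and correct_byte() is a hypothetical helper, not part of the driver. It assumes the correction value is an XOR mask, as the buf[offset] ^= err_correction_value line suggests.

```c
#include <stdint.h>

/* Portable stand-in for hweight8(): number of set bits in a byte. */
static unsigned int popcount8(uint8_t v)
{
	unsigned int n = 0;

	while (v) {
		v &= (uint8_t)(v - 1);	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Apply an XOR correction to *byte and return how many bits changed,
 * computed as the weight of (byte before) ^ (byte after).
 */
static unsigned int correct_byte(uint8_t *byte, uint8_t xor_mask)
{
	uint8_t before = *byte;

	*byte ^= xor_mask;
	return popcount8(before ^ *byte);
}
```

The return value is what would be added to ecc_stats.corrected and to the per-sector bitflip count. Since the correction is a plain XOR, before ^ after is the mask itself, so the count equals the number of set bits in the correction value.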