Message-ID: <6a1b50f4-320f-43d1-50e3-b0a2c3c7fb96@allwinnertech.com>
Date: Fri, 7 Feb 2020 12:13:08 +0800
From: liaoweixiong <liaoweixiong@...winnertech.com>
To: Miquel Raynal <mraynal@...nel.org>
Cc: Kees Cook <keescook@...omium.org>,
Anton Vorontsov <anton@...msg.org>,
Colin Cross <ccross@...roid.com>,
Tony Luck <tony.luck@...el.com>,
Jonathan Corbet <corbet@....net>,
Richard Weinberger <richard@....at>,
Vignesh Raghavendra <vigneshr@...com>,
Mauro Carvalho Chehab <mchehab+samsung@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Rob Herring <robh@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Jonathan Cameron <Jonathan.Cameron@...wei.com>,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-mtd@...ts.infradead.org
Subject: Re: [PATCH v1 11/11] mtd: new support oops logger based on pstore/blk
hi Miquel Raynal,
On 2020/2/6 PM 11:45, Miquel Raynal wrote:
> Hi liao,
>
> liaoweixiong <liaoweixiong@...winnertech.com> wrote on Thu, 6 Feb 2020
> 21:10:47 +0800:
>
>> hi Miquel Raynal,
>>
>> On 2020/1/23 AM 1:41, Miquel Raynal wrote:
>>> Hello,
>>>
>>>
>>>>>>>> +/*
>>>>>>>> + * All zones will be read, as pstore/blk reads the zones one by one during
>>>>>>>> + * recovery.
>>>>>>>> + */
>>>>>>>> +static ssize_t mtdpstore_read(char *buf, size_t size, loff_t off)
>>>>>>>> +{
>>>>>>>> + struct mtdpstore_context *cxt = &oops_cxt;
>>>>>>>> + size_t retlen;
>>>>>>>> + int ret;
>>>>>>>> +
>>>>>>>> + if (mtdpstore_block_isbad(cxt, off))
>>>>>>>> + return -ENEXT;
>>>>>>>> +
>>>>>>>> + pr_debug("try to read off 0x%llx size %zu\n", off, size);
>>>>>>>> + ret = mtd_read(cxt->mtd, off, size, &retlen, (u_char *)buf);
>>>>>>>> + if ((ret < 0 && !mtd_is_bitflip(ret)) || size != retlen) {
>>>>>>>
>>>>>>> IIRC size != retlen does not mean it failed, but that you should
>>>>>>> continue reading after retlen bytes, no?
>>>>>>>
>>>>>> Yes, you are right. I will fix it. Thanks.
>>>>>>
>>>>>>> Also, mtd_is_bitflip() does not mean that you are reading a false
>>>>>>> buffer, but that the data has been corrected as it contained bitflips.
>>>>>>> mtd_is_eccerr(), however, would be meaningful.
>>>>>>>
>>>>>> Sure, I know mtd_is_bitflip() does not mean failure, but I do not think
>>>>>> mtd_is_eccerr() belongs here, since the code checks ret < 0 and NOT
>>>>>> mtd_is_bitflip().
>>>>>
>>>>> Yes, just drop this check, only keep ret < 0.
>>>>>
>>>> Unless I am mistaken, it should not be dropped here. As you said,
>>>> "mtd_is_bitflip() does not mean that you are reading a false buffer,
>>>> but that the data has been corrected as it contained bitflips." So the
>>>> data I get is valid even if mtd_is_bitflip() returns true. It is correct
>>>> data and there is no need to go to the error handling. To me, the code
>>>> should be:
>>>> if (ret < 0 && !mtd_is_bitflip(ret))
>>>> [error handling]
>>>
>>> Please check the implementation of mtd_is_bitflip(). You'll probably
>>> figure out what I am saying.
>>>
>>> https://elixir.bootlin.com/linux/latest/source/include/linux/mtd/mtd.h#L585
>>>
>>
>> How about the following code:
>>
>> for (done = 0, retlen = 0; done < size; done += retlen) {
>> 	ret = mtd_read(..., &retlen, ...);
>> 	if (!ret)
>> 		continue;
>> 	/*
>> 	 * Do nothing on a bitflip or an ECC error: in either case only a
>> 	 * small number of bits flip, and the impact on log data is
>> 	 * small. mtdpstore just hands over what it gets, and the user
>> 	 * can judge whether the data is valid or not.
>> 	 */
>> 	if (mtd_is_bitflip(ret)) {
>> 		dev_warn("bitflip at....");
>> 		continue;
> I don't understand why you check for bitflips. Bitflips have been
> corrected at this stage; you just get the information that there
> have been bitflips, but the data integrity is fine.
>
Neither a bitflip nor an ECC error is a real failure in this
case, so we must check for both of them explicitly.
> I am not against ignoring ECC errors in this case though. I would
> propose:
>
> for (...) {
> 	if (ret < 0) {
> 		complain;
> 		return;
> 	}
>
-117 (-EUCLEAN) means bitflips occurred but were corrected.
-74 (-EBADMSG) means an uncorrectable ECC error.
Both are negative numbers, so if we only keep "ret < 0", we can
never tell bitflips/ECC errors apart from other errors.
> 	if (mtd_is_eccerr())
> 		complain;
> }
>
>> 	} else if (mtd_is_eccerr(ret)) {
>> 		dev_warn("eccerr at....");
>> 		retlen = retlen == 0 ? size : retlen;
>> 		continue;
>> 	} else {
>> 		dev_err("read failure at...");
>> 		/* this zone is broken, try next one */
>> 		return -ENEXT;
>> 	}
>> }
>>
>
>
> Thanks,
> Miquèl
>