linux-kernel - Re: [PATCH 2/2] mmc_block: ensure all sectors that do not have errors are read

Message-ID: <4917EF03.8060308@nokia.com>
Date:	Mon, 10 Nov 2008 10:21:23 +0200
From:	Adrian Hunter <ext-adrian.hunter@...ia.com>
To:	Pierre Ossman <drzeus@...eus.cx>
CC:	LKML <linux-kernel@...r.kernel.org>,
	"Lavinen Jarkko (Nokia-M/Helsinki)" <jarkko.lavinen@...ia.com>,
	"Bityutskiy Artem (Nokia-M/Helsinki)" <Artem.Bityutskiy@...ia.com>
Subject: Re: [PATCH 2/2] mmc_block: ensure all sectors that do not have errors
 are read

Adrian Hunter wrote:
> Pierre Ossman wrote:
>> On Thu, 16 Oct 2008 16:26:57 +0300
>> Adrian Hunter <ext-adrian.hunter@...ia.com> wrote:
>>
>>> If a card encounters an ECC error while reading a sector it will
>>> time out.  Instead of reporting the entire I/O request as having
>>> an error, redo the I/O one sector at a time so that all readable
>>> sectors are provided to the upper layers.
>>>
>>> Signed-off-by: Adrian Hunter <ext-adrian.hunter@...ia.com>
>>> ---
>>
>> We actually had something like this on the table some time ago. It got
>> scrapped because of data integrity problems. This is just for reads
>> though, so I guess it should be safe.
>>
>>> @@ -278,6 +279,9 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
>>>          brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
>>>          brq.data.blocks = req->nr_sectors;
>>>  
>>> +        if (disable_multi && brq.data.blocks > 1)
>>> +            brq.data.blocks = 1;
>>> +
>>
>> A comment here would be nice.
> 
> Ok
> 
>> You also need to adjust the sg list when you change the block count.
>> There was code there that did that previously, but it got removed in
>> 2.6.27-rc1.
> 
> That is not necessary.  It is an optimisation.  In general, optimising an
> error path serves no purpose.
> 
>>> @@ -312,6 +318,13 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
>>>  
>>>          mmc_queue_bounce_post(mq);
>>>  
>>> +        if (multi && rq_data_dir(req) == READ &&
>>> +            brq.data.error == -ETIMEDOUT) {
>>> +            /* Redo read one sector at a time */
>>> +            disable_multi = 1;
>>> +            continue;
>>> +        }
>>> +
>>
>> Some concerns here:
>>
>> 1. "brq.data.blocks > 1" doesn't need to be optimised into its own
>> variable. It just obscures things.
> 
> But you have to assume that no driver changes the 'blocks' variable e.g.
> counts it down.  It is not an optimisation, it is just to improve
> reliability and readability.  What does it obscure?
> 
>> 2. A comment here as well. Explain what this does and why it is safe
>> (so people don't try to extend it to writes)
> 
> ok
> 
>> 3. You should check all errors, not just data.error and ETIMEDOUT.
> 
> No.  Data timeout is a special case.  The other errors are system errors.
> If there is a command error or stop error (which is also a command error)
> it means either there is a bug in the kernel or the controller or card
> has failed to follow the specification.  Under those circumstances
> 
> Data timeout on the other hand just means the data could not be retrieved
> - in the case we have seen because of ECC error.
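
The distinction drawn above between a data timeout and the other errors can be sketched as a small standalone decision function. This is only an illustration of the ordering of checks in the quoted hunks, not kernel code: `struct blk_errors`, `enum blk_action` and `classify()` are hypothetical names, and the three fields merely stand in for `brq.cmd.error`, `brq.data.error` and `brq.stop.error`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for brq.cmd.error, brq.data.error, brq.stop.error. */
struct blk_errors {
    int cmd;   /* error from the read/write command, 0 if none */
    int data;  /* error from the data phase, 0 if none */
    int stop;  /* error from the stop command, 0 if none */
};

enum blk_action {
    BLK_OK,            /* transfer succeeded */
    BLK_RETRY_SINGLE,  /* redo the read one sector at a time */
    BLK_SKIP_SECTOR,   /* report this one sector as -EIO and carry on */
    BLK_FAIL           /* give up on the request (the cmd_err path) */
};

/*
 * Mirrors the order of the checks in the quoted patch:
 * 1. a multi-sector read that hit a data timeout is retried sector by sector;
 * 2. command and stop errors are treated as system errors;
 * 3. a data timeout on a single-sector read fails only that sector;
 * 4. any other data error fails the request.
 */
enum blk_action classify(const struct blk_errors *e, bool is_read, bool multi)
{
    assert(e != NULL);

    if (multi && is_read && e->data == -ETIMEDOUT)
        return BLK_RETRY_SINGLE;
    if (e->cmd || e->stop)
        return BLK_FAIL;
    if (e->data) {
        if (e->data == -ETIMEDOUT && is_read)
            return BLK_SKIP_SECTOR;
        return BLK_FAIL;
    }
    return BLK_OK;
}
```

Writes never reach the retry or skip branches in this sketch, which matches the point made in the thread that the approach is only considered safe for reads.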
> 
>> 4. You should first report the successfully transferred blocks as ok.
> 
> That is another optimisation of the error path i.e. not necessary.  It
> is simpler to just start processing the request again - which the patch
> does.
> 
>>> @@ -360,14 +373,21 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
>>>  #endif
>>>          }
>>>  
>>> -        if (brq.cmd.error || brq.data.error || brq.stop.error)
>>> +        if (brq.cmd.error || brq.stop.error)
>>>              goto cmd_err;
>>
>> Move your code to inside this if clause and you'll solve 3. and 4. in a
>> neat manner.
> 
> Well, I do not agree with 3 and 4.
> 
>> You might also want to print something so that it is
>> visible that the driver retried the transfer.
> 
> There are already two error messages per sector (one from this function
> and one from '__blk_end_request()'), so another message is too much.
> 
>>>  
>>> -        /*
>>> -         * A block was successfully transferred.
>>> -         */
>>> +        if (brq.data.error) {
>>> +            if (brq.data.error == -ETIMEDOUT &&
>>> +                rq_data_dir(req) == READ) {
>>> +                err = -EIO;
>>> +                brq.data.bytes_xfered = brq.data.blksz;
>>> +            } else
>>> +                goto cmd_err;
>>> +        } else
>>> +            err = 0;
>>> +
>>>          spin_lock_irq(&md->lock);
>>> -        ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
>>> +        ret = __blk_end_request(req, err, brq.data.bytes_xfered);
>>>          spin_unlock_irq(&md->lock);
>>>      } while (ret);
>>>  
>>
>> Instead of this big song and dance routine, just have a dedicated piece
>> of code for calling __blk_end_request() for the single sector failure.
> 
> Ok
> 
> Amended patch follows:

What is the status of this patch?
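
For readers following the thread, the behaviour from the original patch description (redo the read one sector at a time so that every readable sector still reaches the upper layers) can be modelled in user space roughly as follows. This is a simplified sketch under stated assumptions, not the patch itself: `struct fake_card`, `fake_transfer()` and `read_request()` are hypothetical, the model has no partial transfers, and the only error it produces is a data timeout.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SECTORS 64

/* Hypothetical card model: bad[i] marks a sector that times out when read. */
struct fake_card {
    bool bad[MAX_SECTORS];
};

/* Simulated transfer of `count` sectors; fails as a whole if any one is bad. */
int fake_transfer(const struct fake_card *card, size_t start, size_t count)
{
    for (size_t i = start; i < start + count; i++)
        if (card->bad[i])
            return -ETIMEDOUT;
    return 0;
}

/*
 * Read `nr` sectors starting at `start`. result[i] receives 0 for a sector
 * that was read and -EIO for one that was not. Returns the number of
 * sectors read successfully.
 */
size_t read_request(const struct fake_card *card, size_t start, size_t nr,
                    int *result)
{
    bool disable_multi = false;
    size_t done = 0, good = 0;

    assert(start + nr <= MAX_SECTORS);

    while (done < nr) {
        size_t blocks = nr - done;

        /* After a timeout, fall back to single-sector transfers. */
        if (disable_multi && blocks > 1)
            blocks = 1;
        bool multi = blocks > 1;

        int err = fake_transfer(card, start + done, blocks);

        if (multi && err == -ETIMEDOUT) {
            /* Redo the read one sector at a time. */
            disable_multi = true;
            continue;
        }

        /* Either the whole chunk succeeded or a single sector failed. */
        for (size_t i = 0; i < blocks; i++)
            result[done + i] = err ? -EIO : 0;
        if (!err)
            good += blocks;
        done += blocks;
    }
    return good;
}
```

With a single bad sector in an eight-sector request, the model reports only that sector as failed and returns the other seven, instead of failing the request as a whole.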
