[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <49390BD7.3040305@nokia.com>
Date: Fri, 05 Dec 2008 13:09:11 +0200
From: Adrian Hunter <ext-adrian.hunter@...ia.com>
To: Pierre Ossman <drzeus@...eus.cx>
CC: LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] mmc_block: ensure all sectors that do not have errors
are read
Pierre Ossman wrote:
> On Wed, 29 Oct 2008 16:26:07 +0200
> Adrian Hunter <ext-adrian.hunter@...ia.com> wrote:
>
>> Pierre Ossman wrote:
>>> You also need to adjust the sg list when you change the block count.
>>> There was code there that did that previously, but it got removed in
>>> 2.6.27-rc1.
>> That is not necessary. It is an optimisation. In general, optimising an
>> error path serves no purpose.
>>
>
> Actually, it is a requirement that some drivers rely on. There is also
> a WARN_ON(), or possibly a BUG_ON() in the request handling path. I
> think it only triggers with MMC_DEBUG though...
OK
>>> Some concerns here:
>>>
>>> 1. "brq.data.blocks > 1" doesn't need to be optimised into its own
>>> variable. It just obscures things.
>> But you have to assume that no driver changes the 'blocks' variable e.g.
>> counts it down.
>
> That would be a no-no. I guess this is not properly documented, but it
> is strongly implied that the request structure should be read-only
> except for fields that contain result data. ;)
>
>> It is not an optimisation, it is just to improve
>> reliability and readability. What does it obscure?
>
> if-clauses look like they're different, but they're not.
Well, the if clauses are different anyway, but no matter.
>>> 3. You should check all errors, not just data.error and ETIMEDOUT.
>> No. Data timeout is a special case. The other errors are system errors.
>> If there is a command error or stop error (which is also a command error)
>> it means either there is a bug in the kernel or the controller or card
>> has failed to follow the specification. Under those circumstances
>
> Controllers do not follow the specs. Welcome to reality. :)
>
> (sad fact: I don't think there is a single controller out there that
> follows the specs to 100%).
>
> Anyway, for this specific case, a controller might not be able to
> differentiate between command and data timeouts. Or it might report
> "data error" for anything unexpected (I seem to recall one of the
> currently supported controllers behaves like this).
>
> Generally it's about being as flexible as we can be. This is a world of
> crappy hardware, so strict spec adherence just leads to a lot of hurt.
> And I have the scars to prove it. :)
I agree with your philosophy, although I was merely letting it go back to
the way it was - which I thought was safer then introducing new behaviour
into an undefined situation.
>>> 4. You should first report the successfully transferred blocks as ok.
>> That is another optimisation of the error path i.e. not necessary. It
>> is simpler to just start processing the request again - which the patch
>> does.
>>
>
> I suppose.
>
>>> You might also want to print something so that it is
>>> visible that the driver retried the transfer.
>> There are already two error messages per sector (one from this function
>> and one from '__blk_end_request()', so another message is too much.
>>
>
> I didn't mean an error for the bad sector, but something like
> "encountered an error during multi-block transfer. retrying...". This
> could save a lot of time and headache during debugging in the future.
>
>> @@ -362,12 +380,19 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
>> #endif
>> }
>>
>> - if (brq.cmd.error || brq.data.error || brq.stop.error)
>> + if (brq.cmd.error || brq.stop.error)
>> + goto cmd_err;
>> +
>> + if (brq.data.error == -ETIMEDOUT && rq_data_dir(req) == READ) {
>> + spin_lock_irq(&md->lock);
>> + ret = __blk_end_request(req, -EIO, brq.data.blksz);
>> + spin_unlock_irq(&md->lock);
>> + continue;
>> + }
>> +
>
> This could use a comment explaining that if we've reached this point,
> we know that it was a single sector that failed.
>
>> + if (brq.cmd.error)
>> goto cmd_err;
>>
>> - /*
>> - * A block was successfully transferred.
>> - */
>> spin_lock_irq(&md->lock);
>> ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
>> spin_unlock_irq(&md->lock);
>
> Shouldn't this comment stay now? :)
OK
>From ceb711896b3c0cbc1c01b8f0c3733cedf3d88843 Mon Sep 17 00:00:00 2001
From: Adrian Hunter <ext-adrian.hunter@...ia.com>
Date: Thu, 16 Oct 2008 13:13:08 +0300
Subject: [PATCH] mmc_block: ensure all sectors that do not have errors are read
If a card encounters an ECC error while reading a sector it will
timeout. Instead of reporting the entire I/O request as having
an error, redo the I/O one sector at a time so that all readable
sectors are provided to the upper layers.
Signed-off-by: Adrian Hunter <ext-adrian.hunter@...ia.com>
---
drivers/mmc/card/block.c | 68 ++++++++++++++++++++++++++++++++++-----------
1 files changed, 51 insertions(+), 17 deletions(-)
diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index 5b46ec9..040b57e 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -231,7 +231,7 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
struct mmc_blk_data *md = mq->data;
struct mmc_card *card = md->queue.card;
struct mmc_blk_request brq;
- int ret = 1;
+ int ret = 1, disable_multi = 0;
mmc_claim_host(card->host);
@@ -253,6 +253,14 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
brq.data.blocks = req->nr_sectors;
+ /*
+ * After a read error, we redo the request one sector at a time
+ * in order to accurately determine which sectors can be read
+ * successfully.
+ */
+ if (disable_multi && brq.data.blocks > 1)
+ brq.data.blocks = 1;
+
if (brq.data.blocks > 1) {
/* SPI multiblock writes terminate using a special
* token, not a STOP_TRANSMISSION request.
@@ -281,6 +289,16 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
brq.data.sg = mq->sg;
brq.data.sg_len = mmc_queue_map_sg(mq);
+ /*
+ * Some drivers expect the sg list to be the same size as the
+ * request, which it won't be if we have fallen back to do
+ * one sector at a time.
+ */
+ if (disable_multi) {
+ brq.data.sg->length = 512;
+ brq.data.sg_len = 1;
+ }
+
mmc_queue_bounce_pre(mq);
mmc_wait_for_req(card->host, &brq.mrq);
@@ -292,8 +310,17 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
* until later as we need to wait for the card to leave
* programming mode even when things go wrong.
*/
- if (brq.cmd.error || brq.data.error || brq.stop.error)
+ if (brq.cmd.error || brq.data.error || brq.stop.error) {
+ if (brq.data.blocks > 1 && rq_data_dir(req) == READ) {
+ /* Redo read one sector at a time */
+ printk(KERN_WARNING "%s: error, retrying using "
+ "single block read\n",
+ req->rq_disk->disk_name);
+ disable_multi = 1;
+ continue;
+ }
status = get_card_status(card, req);
+ }
if (brq.cmd.error) {
printk(KERN_ERR "%s: error %d sending read/write "
@@ -350,8 +377,20 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
#endif
}
- if (brq.cmd.error || brq.data.error || brq.stop.error)
+ if (brq.cmd.error || brq.stop.error || brq.data.error) {
+ if (rq_data_dir(req) == READ) {
+ /*
+ * After an error, we redo I/O one sector at a
+ * time, so we only reach here after trying to
+ * read a single sector.
+ */
+ spin_lock_irq(&md->lock);
+ ret = __blk_end_request(req, -EIO, brq.data.blksz);
+ spin_unlock_irq(&md->lock);
+ continue;
+ }
goto cmd_err;
+ }
/*
* A block was successfully transferred.
@@ -373,25 +412,20 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
* If the card is not SD, we can still ok written sectors
* as reported by the controller (which might be less than
* the real number of written sectors, but never more).
- *
- * For reads we just fail the entire chunk as that should
- * be safe in all cases.
*/
- if (rq_data_dir(req) != READ) {
- if (mmc_card_sd(card)) {
- u32 blocks;
+ if (mmc_card_sd(card)) {
+ u32 blocks;
- blocks = mmc_sd_num_wr_blocks(card);
- if (blocks != (u32)-1) {
- spin_lock_irq(&md->lock);
- ret = __blk_end_request(req, 0, blocks << 9);
- spin_unlock_irq(&md->lock);
- }
- } else {
+ blocks = mmc_sd_num_wr_blocks(card);
+ if (blocks != (u32)-1) {
spin_lock_irq(&md->lock);
- ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
+ ret = __blk_end_request(req, 0, blocks << 9);
spin_unlock_irq(&md->lock);
}
+ } else {
+ spin_lock_irq(&md->lock);
+ ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
+ spin_unlock_irq(&md->lock);
}
mmc_release_host(card->host);
--
1.5.4.3
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists