linux-kernel - Re: [PATCH] mmc_spi: do propper retry managment in the block layer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090518152001.GA29472@console-pimps.org>
Date:	Mon, 18 May 2009 16:20:01 +0100
From:	Matt Fleming <matt@...sole-pimps.org>
To:	Wolfgang Mües <wolfgang.mues@...rswald.de>
Cc:	Pierre Ossman <drzeus@...eus.cx>,
	Andrew Morton <akpm@...ux-foundation.org>,
	David Brownell <dbrownell@...rs.sourceforge.net>,
	Mike Frysinger <vapier.adi@...il.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mmc_spi: do propper retry managment in the block layer
	- rewritten

On Mon, May 18, 2009 at 02:21:38PM +0100, Wolfgang Mües wrote:
> From: Wolfgang Muees <wolfgang.mues@...rswald.de>
> 
> o This patch adds a propper retry managment for reading
>   and writing data blocks for mmc and mmc_spi. Blocks are
>   retransmitted if the cause of failure might be a transmission
>   fault. After a permanent failure, we try to read all other
>   pending sectors, but terminate on write.
>   This patch was tested with induced transmission errors
>   by ESD pulses (and survived).
> 
> Signed-off-by: Wolfgang Muees <wolfgang.mues@...rswald.de>
> 

When you say "proper retry management", what problem are you having
with the current code? Your changelog tells me what you changed but
not why you changed it.

> ---
> diff -uprN 2_6_29_rc7_patch_EILSEQ/drivers/mmc/card/block.c 2_6_29_rc7_patch_retries/drivers/mmc/card/block.c
> --- 2_6_29_rc7_patch_EILSEQ/drivers/mmc/card/block.c	2009-03-09 17:10:55.000000000 +0100
> +++ 2_6_29_rc7_patch_retries/drivers/mmc/card/block.c	2009-05-18 15:00:41.000000000 +0200

It's always a good idea to make your changes against the latest kernel
version to make sure that your patches apply cleanly. They applied OK
this time, but they might not in future.

> @@ -230,12 +230,14 @@ static int mmc_blk_issue_rq(struct mmc_q
>  	struct mmc_card *card = md->queue.card;
>  	struct mmc_blk_request brq;
>  	int ret = 1, disable_multi = 0;
> +	int retries = 3;
> 
>  	mmc_claim_host(card->host);
> 
>  	do {
>  		struct mmc_command cmd;
>  		u32 readcmd, writecmd, status = 0;
> +		int error = 0;
> 
>  		memset(&brq, 0, sizeof(struct mmc_blk_request));
>  		brq.mrq.cmd = &brq.cmd;
> @@ -251,10 +253,8 @@ static int mmc_blk_issue_rq(struct mmc_q
>  		brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
>  		brq.data.blocks = req->nr_sectors;
> 
> -		/*
> -		 * After a read error, we redo the request one sector at a time
> -		 * in order to accurately determine which sectors can be read
> -		 * successfully.
> +		/* After an transmission error, we switch to single block
> +		 * transfer until the problem is solved or we fail.
>  		 */

Why did you change this comment? They seem to say roughly the same
things to me, only the first one is more descriptive. I think you
really need to combine the two,

		/*
		 * After a transmission error, we redo the request one sector
		 * at a time in order to accurately determine which sectors
		 * can be transmitted successfully. We keep retrying until
		 * the transmission succeeds or we exceed our retry count.
		 */

>  		if (disable_multi && brq.data.blocks > 1)
>  			brq.data.blocks = 1;
> @@ -262,10 +262,12 @@ static int mmc_blk_issue_rq(struct mmc_q
>  		if (brq.data.blocks > 1) {
>  			/* SPI multiblock writes terminate using a special
>  			 * token, not a STOP_TRANSMISSION request.
> +			 * Here, this request is set for ALL types of
> +			 * hosts, so that we can use it in the lower
> +			 * layers if the data transfer stage has failed
> +			 * and the card is not able to accept the token.
>  			 */
> -			if (!mmc_host_is_spi(card->host)
> -					|| rq_data_dir(req) == READ)
> -				brq.mrq.stop = &brq.stop;
> +			brq.mrq.stop = &brq.stop;
>  			readcmd = MMC_READ_MULTIPLE_BLOCK;
>  			writecmd = MMC_WRITE_MULTIPLE_BLOCK;
>  		} else {

Hmm.. is it OK to abuse the STOP_TRANSMISSION request like this? Does
SPI multiblock write still terminate with this patch?

> @@ -311,52 +313,71 @@ static int mmc_blk_issue_rq(struct mmc_q
>  		mmc_wait_for_req(card->host, &brq.mrq);
> 
>  		mmc_queue_bounce_post(mq);
> -
> -		/*
> -		 * Check for errors here, but don't jump to cmd_err
> -		 * until later as we need to wait for the card to leave
> -		 * programming mode even when things go wrong.
> -		 */
> -		if (brq.cmd.error || brq.data.error || brq.stop.error) {
> -			if (brq.data.blocks > 1 && rq_data_dir(req) == READ) {
> -				/* Redo read one sector at a time */
> -				printk(KERN_WARNING "%s: retrying using single "
> -				       "block read\n", req->rq_disk->disk_name);
> -				disable_multi = 1;
> -				continue;
> -			}
> -			status = get_card_status(card, req);
> -		}
> -
> +#if 0
> +		printk(KERN_INFO "%s transfer sector %d count %d\n",
> +			(rq_data_dir(req) == READ) ? "Read" : "Write",
> +			(int)req->sector, (int)brq.data.blocks);
> +#endif

This should probably be using pr_debug(). That way you can remove the
#if 0. Or delete it altogether.

[...]

> diff -uprN 2_6_29_rc7_patch_EILSEQ/drivers/mmc/host/mmc_spi.c 2_6_29_rc7_patch_retries/drivers/mmc/host/mmc_spi.c
> --- 2_6_29_rc7_patch_EILSEQ/drivers/mmc/host/mmc_spi.c	2009-05-14 12:49:42.000000000 +0200
> +++ 2_6_29_rc7_patch_retries/drivers/mmc/host/mmc_spi.c	2009-05-14 15:01:15.000000000 +0200
> @@ -1087,7 +1089,15 @@ static void mmc_spi_request(struct mmc_h
>  	status = mmc_spi_command_send(host, mrq, mrq->cmd, mrq->data != NULL);
>  	if (status == 0 && mrq->data) {
>  		mmc_spi_data_do(host, mrq->cmd, mrq->data, mrq->data->blksz);
> -		if (mrq->stop)
> +		/* filter-out the stop command for multiblock writes,
> +		* only if the data stage has no transmission error.
> +		* If the data stage has a transmission error, send the
> +		* STOP command because there is a great chance that the
> +		* SPI stop token was not accepted by the card.
> +		*/
> +		if (mrq->stop &&
> +		  ((mrq->data->flags & MMC_DATA_READ)
> +		 || mrq->data->error))
>  			status = mmc_spi_command_send(host, mrq, mrq->stop, 0);
>  		else
>  			mmc_cs_off(host);

Again, does this still work for SPI? Have you tested it?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/