linux-kernel - RE: [PATCH] RFC: mmc: dw_mmc: Always go to STATE_DATA_BUSY from STATE_DATA

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-id: <001601ce35b9$652c5870$2f850950$%jun@samsung.com>
Date:	Wed, 10 Apr 2013 16:02:41 +0900
From:	Seungwon Jeon <tgih.jun@...sung.com>
To:	'Doug Anderson' <dianders@...omium.org>
Cc:	'Jaehoon Chung' <jh80.chung@...sung.com>,
	'Chris Ball' <cjb@...top.org>,
	'Will Newton' <will.newton@...il.com>,
	'Bing Zhao' <bzhao@...vell.com>,
	'Ashok Nagarajan' <asnagarajan@...omium.org>,
	'Paul Stewart' <pstew@...omium.org>,
	'Olof Johansson' <olof@...om.net>, linux-mmc@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: RE: [PATCH] RFC: mmc: dw_mmc: Always go to STATE_DATA_BUSY from
 STATE_DATA_ERROR

On Tuesday, April 09, 2013, Doug Anderson wrote:
> Seungwon,
> 
> On Mon, Apr 8, 2013 at 5:17 AM, Seungwon Jeon <tgih.jun@...sung.com> wrote:
> > I guess Doug are debugging it with wifi, right?
> 
> Yes, we're debugging it on the Samsung ARM Chromebook on a part that
> has an SDIO WiFi module by Marvell.  Bing Zhao (CCed) has a unit in
> hand that generates lots of CRC errors and has been testing patches
> I've sent him.
> 
> 
> > The problem happens when dw_mci_stop_dma is called in the middle of data transfers.
> > If data error occurs in the end of block, EVENT_XFER_COMPLETE might be set. So, it's fine.
> > Actually, dw_mci_idmac_stop_dma stops the dma working, there is no further interrupt for dma
> completion.
> 
> That sounds right to me.
> 
> 
> > There are two solutions we have applied.
> 
> I'm a little confused.  Have you already applied one or both of the
> solutions you list below, or are you proposing them as alternates to
> the patch I submitted?
Yes, first one already has been applied.
I wanted to introduce our fix. Did you try to test with these fixes?

> 
> > #1. deferring the call of dw_mci_stop_dma until EVENT_XFER_COMPLETE flag is set into pending_events.
> > In this case, dma transfer will be continued with error.
> >
> > @@ -1062,7 +1062,6 @@ static void dw_mci_tasklet_func(unsigned long priv)
> >                 case STATE_SENDING_DATA:
> >                         if (test_and_clear_bit(EVENT_DATA_ERROR,
> >                                                &host->pending_events)) {
> > -                               dw_mci_stop_dma(host);
> >                                 if (data->stop)
> >                                         send_stop_cmd(host, data);
> >                                 state = STATE_DATA_ERROR;
> > @@ -1155,6 +1154,9 @@ static void dw_mci_tasklet_func(unsigned long priv)
> >                                                 &host->pending_events))
> >                                 break;
> >
> > +                       dw_mci_stop_dma(host);
> > +                       set_bit(EVENT_XFER_COMPLETE, &host->completed_events);
> > +
> >                         state = STATE_DATA_BUSY;
> >                         break;
> 
> I can't say that I'm quite familiar enough with the intricate details
> of the driver to know whether this is a good idea or guaranteed to
> work.  Do we really think that we'll still get the end of the transfer
> properly if we've seen an error already?  I worry that we won't.
For example, let's pretend data CRC error occurs during data read.
Peer device doesn't know that error occurrence and data transmission still keeps going.
dma will run as long as host doesn't take the stop or see the end of descriptor.
> 
> 
> > #2. set EVENT_XFER_COMPLETE flag when dw_mci_stop_dma is called regardless using_dma.
> >
> > @@ -299,10 +299,9 @@ static void dw_mci_stop_dma(struct dw_mci *host)
> >         if (host->using_dma) {
> >                 host->dma_ops->stop(host);
> >                 host->dma_ops->cleanup(host);
> > -       } else {
> > -               /* Data transfer was stopped by the interrupt handler */
> > -               set_bit(EVENT_XFER_COMPLETE, &host->pending_events);
> >         }
> > +
> > +       set_bit(EVENT_XFER_COMPLETE, &host->pending_events);
> >  }
> 
> This is fairly similar to my patch but goes further.  I believe my
> patch has this effect but only for the call to dw_mci_stop_dma() in
> STATE_SENDING_DATA in the tasklet.  Your affects all 3 calls to
> dw_mci_stop_dma().
> 
> This seems reasonable but I don't have confidence in my understanding
> of this driver's state machine (especially with regards to the error
> conditions) that I can say which is better.  If you think that this is
> a more correct solution than mine then we can give it some testing.
Yes. As a result, both patches prevent tasklet's hanging.
In that regard, two patches give the similar effect.
But I think your fix are just removing the test_bit to wait for EVENT_XFER_COMPLETE.
'clear_bit(...) part which is added might be of no effect.
It doesn't make sense a bit.

<quotation>
 		case STATE_DATA_ERROR:
-			if (!test_and_clear_bit(EVENT_XFER_COMPLETE,
-						&host->pending_events))
-				break;
-
+			clear_bit(EVENT_XFER_COMPLETE, &host->pending_events);
</quotation>

Thanks,
Seugwon Jeon
> 
> Thanks!
> 
> -Doug
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/