Message-ID: <YzXKXQOw5FtNmy/J@uudg.org>
Date: Thu, 29 Sep 2022 13:39:57 -0300
From: "Luis Claudio R. Goncalves" <lgoncalv@...hat.com>
To: Ulf Hansson <ulf.hansson@...aro.org>
Cc: "dinggao.pan" <dinggao.pan@...izon.ai>,
"bigeasy@...utronix.de" <bigeasy@...utronix.de>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"rostedt@...dmis.org" <rostedt@...dmis.org>,
"ming.yu" <ming.yu@...izon.ai>,
"yunqian.wang" <yunqian.wang@...izon.ai>,
"linux-mmc@...r.kernel.org" <linux-mmc@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-rt-users@...r.kernel.org" <linux-rt-users@...r.kernel.org>
Subject: Re: [PATCH RFC stable 4.14 1/1] mmc: core: fix hung task caused by
race condition on context_info
On Thu, Sep 29, 2022 at 02:41:26PM +0200, Ulf Hansson wrote:
> On Mon, 5 Sept 2022 at 08:22, dinggao.pan <dinggao.pan@...izon.ai> wrote:
> >
> > Hi,
> > After applying the rt patches to our 4.14 kernel and enabling preempt-rt, we hit a hung task during boot, caused by a race condition on the context_info stored in struct mmc_host.
> > From our investigation, context_info should not be changed by threads that have not claimed the host, hence the following fix.
> >
> > Any comments are much appreciated.
> > Dinggao Pan
>
> Hi Dinggao,
>
> Apologies for the delay.
>
> The 4.14 kernel is too old for me to be able to comment. In
> particular, the mmc block layer moved to blk-mq in v4.16, which means
> the path you are investigating doesn't exist any more, sorry.
And the new code has the queue operations protected by a spinlock
(queue_lock), which I believe is necessary to fix the issue reported
here.
Luis
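
To illustrate the point about queue_lock above, here is a minimal userspace
sketch, assuming a pthread mutex as a stand-in for the spinlock. The names
demo_queue, demo_issue, demo_notify, demo_complete and demo_run are
hypothetical and not taken from the kernel. This is not the blk-mq mmc code,
only an illustration of keeping the request count and the wait flags
consistent by touching them under a single lock.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared queue state; not a kernel structure. */
struct demo_queue {
	pthread_mutex_t lock;	/* plays the role of queue_lock */
	int in_flight;		/* loosely analogous to qcnt */
	bool waiting_last_req;
	bool new_req;
};

struct demo_worker {
	struct demo_queue *q;
	int iterations;
};

/* Issue side: count the request and reset the flags in one critical section. */
static void demo_issue(struct demo_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->in_flight++;
	q->waiting_last_req = false;
	q->new_req = false;
	pthread_mutex_unlock(&q->lock);
}

/* Completion side: drop the count and record that we are waiting for work. */
static void demo_complete(struct demo_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->in_flight--;
	q->waiting_last_req = true;
	pthread_mutex_unlock(&q->lock);
}

/* Notifier side: flags a new request only while holding the same lock. */
static bool demo_notify(struct demo_queue *q)
{
	bool flagged = false;

	pthread_mutex_lock(&q->lock);
	if (q->waiting_last_req) {
		q->new_req = true;
		flagged = true;
	}
	pthread_mutex_unlock(&q->lock);
	return flagged;
}

static void *demo_worker_fn(void *arg)
{
	struct demo_worker *w = arg;
	int i;

	for (i = 0; i < w->iterations; i++) {
		demo_issue(w->q);
		demo_notify(w->q);
		demo_complete(w->q);
	}
	return NULL;
}

/* Runs up to 16 workers and returns the final in-flight count. */
static int demo_run(int nthreads, int iterations)
{
	struct demo_queue q = { .in_flight = 0 };
	struct demo_worker w = { &q, iterations };
	pthread_t tids[16];
	int i, started = 0, result;

	if (nthreads > 16)
		nthreads = 16;
	pthread_mutex_init(&q.lock, NULL);
	for (i = 0; i < nthreads; i++)
		if (pthread_create(&tids[started], NULL, demo_worker_fn, &w) == 0)
			started++;
	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	result = q.in_flight;
	pthread_mutex_destroy(&q.lock);
	return result;
}
```

Because every increment and decrement happens inside a critical section,
demo_run(4, 10000) should return 0. Without the lock, the counter and the
flags could be observed in an inconsistent state.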
> Kind regards
> Uffe
>
> >
> > From: "Dinggao Pan" <dinggao.pan@...izon.ai>
> >
> > A race condition happens under the following circumstances:
> > (mmc_thread1)                | (mmc_thread2)
> > mmc_issue_rq(req1)           |
> >  > qcnt++ for req1           |
> > host handling req1           |
> > mmc_queue_thread(req=null)   |
> >  > enter queue thread        |
> >    again, fetches blk req    |
> >    (return null), sets       |
> >    is_waiting_last_req 1     | mmc_request_fn(req1) -> set is_new_req 1
> >                              | and wake_up wait_queue
> > mmc_issue_rq(req2)           | > mmc_thread2 tries to claim host
> >  > **qcnt++ for req2**       |
> > mmc_finalize_req(req2)       |
> >  > should wait for req1      |
> >    done but req2 return      |
> >    MMC_BLK_NEW_REQ           |
> >    due to is_new_req         |
> >    already set to 1          |
> >                              |
> >                              |
> > req1 done                    |
> >  > qcnt-- for req1           |
> > mmc_issue_rq(req3)           |
> >  > qcnt++ for req3           |
> > req2 is not handled, but qcnt has already been incremented (marked
> > by **), so mmc_thread1 never releases the host, causing every mmc
> > thread except thread1 to hang. Fix the race by moving the wake_up
> > ahead of the context_info update.
> >
> > Reviewed-by: Yunqian Wang <yunqian.wang@...izon.ai>
> > Signed-off-by: Dinggao Pan <dinggao.pan@...izon.ai>
> > Signed-off-by: Ming Yu <ming.yu@...izon.ai>
> > ---
> > drivers/mmc/core/queue.c | 7 +++++--
> > 1 file changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> > index 0a4e77a5b..58318c102 100644
> > --- a/drivers/mmc/core/queue.c
> > +++ b/drivers/mmc/core/queue.c
> > @@ -107,6 +107,11 @@ static void mmc_request_fn(struct request_queue *q)
> >  		return;
> >  	}
> >
> > +	if (mq->asleep) {
> > +		wake_up_process(mq->thread);
> > +		return;
> > +	}
> > +
> >  	cntx = &mq->card->host->context_info;
> >
> >  	if (cntx->is_waiting_last_req) {
> > @@ -114,8 +119,6 @@ static void mmc_request_fn(struct request_queue *q)
> >  		wake_up_interruptible(&cntx->wait);
> >  	}
> >
> > -	if (mq->asleep)
> > -		wake_up_process(mq->thread);
> >  }
> >
> >  static struct scatterlist *mmc_alloc_sg(int sg_len, gfp_t gfp)
> > --
> > 2.36.1
>
---end quoted text---