Message-ID: <20190508190000.GA156909@google.com>
Date: Wed, 8 May 2019 13:00:00 -0600
From: Raul Rangel <rrangel@...omium.org>
To: linux-mmc@...r.kernel.org
Cc: djkurtz@...omium.org, hongjiefang <hongjiefang@...micro.com>,
Jennifer Dahm <jennifer.dahm@...com>,
linux-kernel@...r.kernel.org, Shawn Lin <shawn.lin@...k-chips.com>,
Kyle Roeschley <kyle.roeschley@...com>,
Avri Altman <avri.altman@....com>,
Ulf Hansson <ulf.hansson@...aro.org>, rrangel@...omium.org
Subject: Re: [RFC PATCH 1/2] mmc: sdhci: Manually check card status after
reset
On Fri, May 03, 2019 at 09:12:24AM -0600, Raul Rangel wrote:
> On Wed, May 01, 2019 at 11:54:56AM -0600, Raul E Rangel wrote:
> > I am running into a kernel panic. A task gets stuck for more than 120
> > seconds. I keep seeing blkdev_close in the stack trace, so maybe I'm not
> > calling something correctly?
> >
> > Here is the panic: https://privatebin.net/?8ec48c1547d19975#dq/h189w5jmTlbMKKAwZjUr4bhm7Q2AgvGdRqc5BxAc=
> >
> > I sometimes see the following:
> > [ 547.943974] udevd[144]: seq 2350 '/devices/pci0000:00/0000:00:14.7/mmc_host/mmc0/mmc0:0001/block/mmcblk0/mmcblk0p1' is taking a long time
> >
> > I was getting the kernel panic on a 4.14 kernel: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/f3dc032faf4d074f20ada437e2d081a28ac699da/drivers/mmc/host
> > So I'm guessing I'm missing an upstream fix.
> >
>
> I'll keep trying to track down the hung task I was seeing on 4.14. But I
> don't think that's related to these patches. I might just end up
> backporting the blk-mq patches to our 4.14 branch since I suspect that
> fixes it.
I tracked down the hung task in 4.14: it's a resource leak.
mmc_cleanup_queue stops the worker thread. Any requests still in the
queue at that point hold a reference to mmc_blk_data. When
mmc_blk_remove_req calls mmc_blk_put, there are still references to md, so
it never calls blk_cleanup_queue, and the requests stay in the queue
forever.
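
To make the sequence concrete, here is a minimal userspace sketch of the
leak. It is not the kernel code: struct blk_data_model, model_put,
model_cleanup_queue and simulate_remove are hypothetical names, and the
"one reference per queued request" accounting is a simplifying assumption
that only models the behavior described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mmc_blk_data; not the kernel definition. */
struct blk_data_model {
    int usage;            /* reference count */
    int queued;           /* requests still sitting in the queue */
    bool worker_running;  /* models the queue worker thread */
    bool queue_cleaned;   /* set once the final put tears down the queue */
};

/* Models mmc_blk_put(): only the last reference triggers queue cleanup. */
static void model_put(struct blk_data_model *md)
{
    assert(md->usage > 0);
    if (--md->usage == 0)
        md->queue_cleaned = true;  /* stands in for blk_cleanup_queue() */
}

/* Models mmc_cleanup_queue(): stops the worker, leaves queued requests alone. */
static void model_cleanup_queue(struct blk_data_model *md)
{
    md->worker_running = false;
}

/*
 * Models the remove path. Returns true if the queue was torn down,
 * false if the queued requests (and their references) were leaked.
 */
static bool simulate_remove(int queued)
{
    struct blk_data_model md = {
        /* Assumption: one ref for the device plus one per queued request. */
        .usage = 1 + queued,
        .queued = queued,
        .worker_running = true,
        .queue_cleaned = false,
    };

    model_cleanup_queue(&md);  /* nobody will service the queue any more */
    model_put(&md);            /* drop the device's own reference */
    return md.queue_cleaned;
}
```

In this model, simulate_remove(0) reaches the cleanup step, while
simulate_remove(2) does not: the two queued requests keep the count above
zero and nothing is left running to complete them.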
Fortunately Adrian already has a fix for this: https://lore.kernel.org/patchwork/patch/856512/
I think we should cherry-pick 41e3efd07d5a02c80f503e29d755aa1bbb4245de
into v4.14. I've tried it locally and it fixes the kernel panic I was
seeing.
I've also sent out two more patches for v4.14 that need to be applied
with Adrian's patch:
* https://patchwork.kernel.org/patch/10936439/
* https://patchwork.kernel.org/patch/10936441/
As for this patch, are there any comments? I have a test running that is
doing random connect/disconnects, and it's over 6k iterations now.
Thanks,
Raul