[<prev] [next>] [day] [month] [year] [list]
Message-ID: <5354DA4A.8060503@realsil.com.cn>
Date: Mon, 21 Apr 2014 16:43:54 +0800
From: micky <micky_ching@...lsil.com.cn>
To: Peter Wu <peter@...ensteyn.nl>,
<driverdev-devel@...uxdriverproject.org>
CC: <chris@...ntf.net>, <sameo@...ux.intel.com>,
<ulf.hansson@...aro.org>, <gregkh@...uxfoundation.org>,
<linux-mmc@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<wei_wang@...lsil.com.cn>, <rogerable@...ltek.com>,
<devel@...uxdriverproject.org>, <dan.carpenter@...cle.com>,
Borislav Petkov <bp@...en8.de>
Subject: Re: Locking issues in "mmc: rtsx: add support for pre_req and post_req"
Hi Peter,
I'm considering not using spinlock to get it work and will resend the
patch later.
Thank you.
Best Regards.
micky
On 04/19/2014 07:13 AM, Peter Wu wrote:
> So, I reverted c42deffd5b53c9e583d83c7964854ede2f12410d ("mmc: rtsx: add
> support for pre_req and post_req") on top of v3.15-rc1-49-g10ec34f and the
> hang issue went away.
>
> There is something that is possibly problematic. All three tasklets (cmd,
> data, finish) try to spinlock on host->lock. According to the tasklets
> documentation[1], they always run at interrupts (i.e., at any time, possibly
> right after tasklet_schedule if I got this right?).
>
> These tasklets however do get scheduled *under* the host->lock which will
> cause a deadlock. This proposed patch ("mmc: rtsx: fix possible circular
> locking dependency") fixes the issue for sd_isr_done_transfer, but there are
> others:
>
> 1. sd_request_timeout:
> 125 spin_lock_irqsave(&host->lock, flags);
> ...
> 139 out:
> 140 tasklet_schedule(&host->finish_tasklet); // <--
> 141 spin_unlock_irqrestore(&host->lock, flags);
>
> 2. sd_get_rsp (cmd_tasklet!):
> 429 spin_lock_irqsave(&host->lock, flags);
> ...
> 506 tasklet_schedule(&host->finish_tasklet); // <--
> 507 spin_unlock_irqrestore(&host->lock, flags);
>
> 3. sd_finish_multi_rw (data_tasklet!):
> 657 spin_lock_irqsave(&host->lock, flags);
> ...
> 684 tasklet_schedule(&host->finish_tasklet); // <--
> 685 spin_unlock_irqrestore(&host->lock, flags);
>
> 4. sdmmc_request:
> 921 mutex_lock(&pcr->pcr_mutex);
> 922 spin_lock_irqsave(&host->lock, flags);
> ...
> 967 tasklet_schedule(&host->finish_tasklet); // <--
> 968 spin_unlock_irqrestore(&host->lock, flags);
>
> 5. rtsx_pci_sdmmc_drv_remove:
> 1526 spin_lock_irqsave(&host->lock, flags);
> 1527 if (host->mrq) {
> ...
> 1541 tasklet_schedule(&host->finish_tasklet); // <--
> 1542 }
> 1543 spin_unlock_irqrestore(&host->lock, flags);
> 1544
> 1545 del_timer_sync(&host->timer);
> 1546 tasklet_kill(&host->cmd_tasklet);
> 1547 tasklet_kill(&host->data_tasklet);
> 1548 tasklet_kill(&host->finish_tasklet); // <--
>
> pcr_mutex (un)locking:
> - gets locked in sdmmc_request
> - gets unlocked in sd_finish_request (finish_tasklet).
> - gets locked/unlocked in sdmmc_set_ios
> - gets locked/unlocked in sdmmc_get_ro
> - gets locked/unlocked in sdmmc_get_cd
> - gets locked/unlocked in sdmmc_switch_voltage
> - gets locked/unlocked in sdmmc_execute_tuning
>
> finish_tasklet (sd_finish_request()) gets scheduled in:
> - sd_request_timeout (under host->lock; called on timer expiration)
> - error path of sd_send_cmd
> - get_rsp (cmd_tasklet; under host->lock)
> - error path of sd_start_multi_rw (after mod timer)
> - sd_finish_multi_rw (under host->lock)
> - sdmmc_request (under host->lock in error path; without host->lock
> elsewhere)
> - rtsx_pci_sdmmc_drv_remove (under host->lock)
>
> sd_request_timeout (timer) related:
> - deleted in sd_finish_request under host->lock (also assumes pcr_mutex)
> - sd_send_cmd (timer set with 100ms timeout, not called in error path)
> - sd_start_multi_rw (timeout set to 10 HZ, called before error path is
> checked in which finish_tasklet gets scheduled)
> Note: sd_request_timeout claims+releases host->lock.
>
> If I understand it correctly, host->lock is responsible for protecting the
> realtek_pci_sdmmc structure. Why is tasklet_schedule() on the same lock?
>
> Shouldn't the above five tasklet_schedule calls be moved outside the lock?
> Will it be problematic if the same tasklet gets executed multiple times (if
> that is possible?).
>
> Does it really need that much locking? dw_mmc.c also implements pre_req, but
> uses tasklets without needing to lock anything.
>
> Kind regards,
> Peter
>
> [1]: http://www.makelinux.net/ldd3/chp-7-sect-5
>
> On Friday 18 April 2014 16:00:53 Peter Wu wrote:
>> On Wednesday 16 April 2014 09:38:44 micky_ching@...lsil.com.cn wrote:
>>> From: Micky Ching <micky_ching@...lsil.com.cn>
>>>
>>> To avoid dead lock, we need make sure host->lock is always acquire
>>> before pcr->lock. But in irq handler, we acquired pcr->lock in rtsx mfd
>>> driver, and sd_isr_done_transfer() is called during pcr->lock already
>>> acquired. Since in sd_isr_done_transfer() the only work we do is schdule
>>> tasklet, the cmd_tasklet and data_tasklet never conflict, so it is safe
>>> to remove spin_lock() here.
>>>
>>> Signed-off-by: Micky Ching <micky_ching@...lsil.com.cn>
>>> ---
>>>
>>> drivers/mmc/host/rtsx_pci_sdmmc.c | 4 +---
>>> 1 file changed, 1 insertion(+), 3 deletions(-)
>> This patch came from
>> https://lkml.kernel.org/r/534DE1D7.3000308@realsil.com.cn ("Re:
>> rtsx_pci_sdmmc lockdep splat").
>>
>> With v3.15-rc1-49-g10ec34f, I have a hung machine when inserting a SD card.
>> lockdep was not enabled for the kernel, I have not bisected yet.
>> This patch on top of that kernel version does not help (tested by
>> rmmod rtsx_pci_sdmmc and insmod the patched one).
>>
>> Console (as typed over from a picture, sorry for any typos):
>> WARNING: CPU: 1 PID: 0 at kernel/locking/mutex.c:698
>> DEBUG_LOCKS_WARN_ON(in_interrupt())
>> Modules linked in: ...
>> CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.15-rc1-custom-000049-g10ec34f #5
>> Hardware name: Shuttle Inc. XS36V/XS36V, BIOS 1.11 12/18/2012
>> Call trace:
>> <IRQ> dump_stack
>> warn_slowpath_common
>> warn_slowpath_fmt
>> __mutex_unlock_slowpath
>> mutex_unlock
>> sd_finish_request [rtsx_pci_sdmmc]
>> tasklet_action
>> __do_softirq
>> irq_exit
>> smp_apic_timer_interrupt
>> apic_timer_interrupt
>> <EOI> ? cpuidle_enter_state
>> cpuidle_enter
>> cpu_startup_entry
>> start_secondary
>> ---[end trace ...]---
>> (60 seconds later:)
>> INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 1, t=18004
>> jiffies, g=3264, c=3263, q=2) INFO: Stall ended before state dump start
>>
>> I also managed to get this trace about 106 seconds later when switching TTY:
>> INFO: task kworker/... blocked for more than 120 seconds
>> Workqueue: kmmcd mm_rescan [mmc_core]
>> Call trace:
>> ? update_rq_clock.part80
>> ? internal_add_timer
>> schedule
>> schedule_preempt
>> __mutex_lock_slowpath
>> mutex_lock
>> sdmmc_request [rtsx_pci_sdmmc]
>> mmc_start_request [mmc_core]
>> __mmc_start_req [mmc_core]
>> mmc_wait_for_cmd [mmc_core]
>> ? mmc_release_host [mmc_core]
>> mmc_io_rw_direct_host [mmc_core]
>>
>> I'll try to get a lockdep kernel and text logs later, but perhaps you
>> already know the issue?
>>
>> Peter
>>
>>> diff --git a/drivers/mmc/host/rtsx_pci_sdmmc.c
>>> b/drivers/mmc/host/rtsx_pci_sdmmc.c index 453e1d4..40695e0 100644
>>> --- a/drivers/mmc/host/rtsx_pci_sdmmc.c
>>> +++ b/drivers/mmc/host/rtsx_pci_sdmmc.c
>>> @@ -108,12 +108,10 @@ static void sd_isr_done_transfer(struct
>>> platform_device *pdev) {
>>>
>>> struct realtek_pci_sdmmc *host = platform_get_drvdata(pdev);
>>>
>>> - spin_lock(&host->lock);
>>>
>>> if (host->cmd)
>>>
>>> tasklet_schedule(&host->cmd_tasklet);
>>>
>>> - if (host->data)
>>> + else if (host->data)
>>>
>>> tasklet_schedule(&host->data_tasklet);
>>>
>>> - spin_unlock(&host->lock);
>>>
>>> }
>>>
>>> static void sd_request_timeout(unsigned long host_addr)
> .
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists