linux-kernel - Re: [bug] sound: INFO: possible recursive locking detected

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <s5hprix68ok.wl%tiwai@suse.de>
Date:	Thu, 08 Jan 2009 17:53:31 +0100
From:	Takashi Iwai <tiwai@...e.de>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	linux-kernel@...r.kernel.org, Wu Fengguang <wfg@...ux.intel.com>
Subject: Re: [bug] sound: INFO: possible recursive locking detected

At Thu, 08 Jan 2009 17:17:06 +0100,
I wrote:
> 
> At Thu, 08 Jan 2009 17:12:54 +0100,
> Peter Zijlstra wrote:
> > 
> > On Thu, 2009-01-08 at 17:10 +0100, Takashi Iwai wrote:
> > > At Thu, 08 Jan 2009 16:39:31 +0100,
> > > I wrote:
> > > > 
> > > > At Thu, 8 Jan 2009 16:20:37 +0100,
> > > > Ingo Molnar wrote:
> > > > > 
> > > > > FYI, -tip testing found this new lockdep workqueue locking assert 
> > > > > triggered by the Intel HDA sound driver:
> > > (snip)
> > > > > [   30.718689] =============================================
> > > > > [   30.718689] [ INFO: possible recursive locking detected ]
> > > > > [   30.718689] 2.6.28-tip #2
> > > > > [   30.718689] ---------------------------------------------
> > > > > [   30.718689] events/0/7 is trying to acquire lock:
> > > > > [   30.718689]  (events){--..}, at: [<ffffffff80281050>] flush_workqueue+0x0/0x90
> > > > > [   30.718689] 
> > > > > [   30.718689] but task is already holding lock:
> > > > > [   30.718689]  (events){--..}, at: [<ffffffff8028076a>] run_workqueue+0x14a/0x240
> > > > > [   30.718689] 
> > > > > [   30.718689] other info that might help us debug this:
> > > > > [   30.718689] 2 locks held by events/0/7:
> > > > > [   30.718689]  #0:  (events){--..}, at: [<ffffffff8028076a>] run_workqueue+0x14a/0x240
> > > > > [   30.718689]  #1:  (&wfc.work){--..}, at: [<ffffffff8028076a>] run_workqueue+0x14a/0x240
> > > > > [   30.718689] 
> > > > > [   30.718689] stack backtrace:
> > > > > [   30.718689] Pid: 7, comm: events/0 Not tainted 2.6.28-tip #2
> > > > > [   30.718689] Call Trace:
> > > > > [   30.718689]  [<ffffffff80295836>] validate_chain+0xc56/0x1340
> > > > > [   30.718689]  [<ffffffff802974e6>] __lock_acquire+0x376/0x610
> > > > > [   30.718689]  [<ffffffff80297819>] lock_acquire+0x99/0xd0
> > > > > [   30.718689]  [<ffffffff80281050>] ? flush_workqueue+0x0/0x90
> > > > > [   30.718689]  [<ffffffff80281091>] flush_workqueue+0x41/0x90
> > > > > [   30.718689]  [<ffffffff80281050>] ? flush_workqueue+0x0/0x90
> > > > > [   30.718689]  [<ffffffff802810f5>] flush_scheduled_work+0x15/0x20
> > > > > [   30.718689]  [<ffffffff80c59794>] azx_clear_irq_pending+0x54/0x60
> > > > > [   30.718689]  [<ffffffff80c599ec>] azx_free+0x10c/0x160
> > > > > [   30.718689]  [<ffffffff80bae9ad>] ? snd_device_free+0x8d/0x100
> > > > > [   30.718689]  [<ffffffff80c59a52>] azx_dev_free+0x12/0x20
> > > > > [   30.718689]  [<ffffffff80bae9a1>] snd_device_free+0x81/0x100
> > > > > [   30.718689]  [<ffffffff80baea89>] snd_device_free_all+0x69/0xa0
> > > > > [   30.718689]  [<ffffffff80ba8a70>] snd_card_do_free+0x50/0xd0
> > > > > [   30.718689]  [<ffffffff80ba9602>] snd_card_free+0xa2/0xc0
> > > > > [   30.718689]  [<ffffffff8061115f>] ? __delay+0xf/0x20
> > > > > [   30.718689]  [<ffffffff80edbb6a>] azx_probe+0x38a/0x9f0
> > > > > [   30.718689]  [<ffffffff80c5a240>] ? azx_send_cmd+0x0/0x130
> > > > > [   30.718689]  [<ffffffff80c5a370>] ? azx_get_response+0x0/0x210
> > > > > [   30.718689]  [<ffffffff80c59b70>] ? azx_attach_pcm_stream+0x0/0x1b0
> > > > > [   30.718689]  [<ffffffff802804d0>] ? do_work_for_cpu+0x0/0x30
> > > > > [   30.718689]  [<ffffffff80634807>] local_pci_probe+0x17/0x20
> > > > > [   30.718689]  [<ffffffff802804e8>] do_work_for_cpu+0x18/0x30
> > > > > [   30.718689]  [<ffffffff802804d0>] ? do_work_for_cpu+0x0/0x30
> > > > > [   30.718689]  [<ffffffff802807bb>] run_workqueue+0x19b/0x240
> > > > > [   30.718689]  [<ffffffff8028076a>] ? run_workqueue+0x14a/0x240
> > > > > [   30.718689]  [<ffffffff802809cf>] worker_thread+0xaf/0x110
> > > > > [   30.718689]  [<ffffffff80284e20>] ? autoremove_wake_function+0x0/0x40
> > > > > [   30.718689]  [<ffffffff80280920>] ? worker_thread+0x0/0x110
> > > > > [   30.718689]  [<ffffffff802848e3>] kthread+0x53/0x80
> > > > > [   30.718689]  [<ffffffff8022c3aa>] child_rip+0xa/0x20
> > > > > [   30.718689]  [<ffffffff8022bcbe>] ? restore_args+0x0/0x30
> > > > > [   30.718689]  [<ffffffff80284890>] ? kthread+0x0/0x80
> > > > > [   30.718689]  [<ffffffff8022c3a0>] ? child_rip+0x0/0x20
> > > 
> > > The backtrace implies that the probe callback is called in a
> > > workqueue.  Is it right?
> > 
> > Yes, it appears a worklet tries to flush the queue he's on.
> 
> It calls flush_schedule_work().
> So, now this should be avoided during the probe callback?
> 
> If so, a simple workaround would be to create its own workqueue, or
> add a flag to avoid this call at destructor...

The below is a quick fix to convert to use the own workq in
hda_intel.c.  This should fix this particular lockdep problem, at
least.  (And it's better, anyway.)

Please give it a try.


thanks,

Takashi

---
diff --git a/sound/pci/hda/hda_intel.c b/sound/pci/hda/hda_intel.c
index f04de11..7e09f98 100644
--- a/sound/pci/hda/hda_intel.c
+++ b/sound/pci/hda/hda_intel.c
@@ -406,6 +406,7 @@ struct azx {
 	unsigned int last_cmd;	/* last issued command (to sync) */
 
 	/* for pending irqs */
+	struct workqueue_struct *workq;
 	struct work_struct irq_pending_work;
 
 	/* reboot notifier (for mysterious hangup problem at power-down) */
@@ -999,7 +1000,8 @@ static irqreturn_t azx_interrupt(int irq, void *dev_id)
 			} else {
 				/* bogus IRQ, process it later */
 				azx_dev->irq_pending = 1;
-				schedule_work(&chip->irq_pending_work);
+				queue_work(chip->workq,
+					   &chip->irq_pending_work);
 			}
 		}
 	}
@@ -1741,7 +1743,7 @@ static void azx_clear_irq_pending(struct azx *chip)
 	for (i = 0; i < chip->num_streams; i++)
 		chip->azx_dev[i].irq_pending = 0;
 	spin_unlock_irq(&chip->reg_lock);
-	flush_scheduled_work();
+	flush_workqueue(chip->workq);
 }
 
 static struct snd_pcm_ops azx_pcm_ops = {
@@ -2019,6 +2021,8 @@ static int azx_free(struct azx *chip)
 		azx_stop_chip(chip);
 	}
 
+	if (chip->workq)
+		destroy_workqueue(chip->workq);
 	if (chip->irq >= 0)
 		free_irq(chip->irq, (void*)chip);
 	if (chip->msi)
@@ -2131,6 +2135,7 @@ static int __devinit azx_create(struct snd_card *card, struct pci_dev *pci,
 	static struct snd_device_ops ops = {
 		.dev_free = azx_dev_free,
 	};
+	char qname[8];
 
 	*rchip = NULL;
 
@@ -2153,7 +2158,6 @@ static int __devinit azx_create(struct snd_card *card, struct pci_dev *pci,
 	chip->driver_type = driver_type;
 	chip->msi = enable_msi;
 	chip->dev_index = dev;
-	INIT_WORK(&chip->irq_pending_work, azx_irq_pending_work);
 
 	chip->position_fix = check_position_fix(chip, position_fix[dev]);
 	check_probe_mask(chip, dev);
@@ -2200,6 +2204,15 @@ static int __devinit azx_create(struct snd_card *card, struct pci_dev *pci,
 		if (pci_enable_msi(pci) < 0)
 			chip->msi = 0;
 
+	snprintf(qname, sizeof(qname), "hda%d", chip->card->number);
+	chip->workq = create_workqueue(qname);
+	if (!chip->workq) {
+		snd_printk(KERN_ERR SFX "cannot create workqueue %s\n", qname);
+		err = -ENOMEM;
+		goto errout;
+	}
+	INIT_WORK(&chip->irq_pending_work, azx_irq_pending_work);
+
 	if (azx_acquire_irq(chip, 0) < 0) {
 		err = -EBUSY;
 		goto errout;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/