Message-Id: <200911112100.16561.rjw@sisk.pl>
Date:	Wed, 11 Nov 2009 21:00:16 +0100
From:	"Rafael J. Wysocki" <rjw@...k.pl>
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Mike Galbraith <efault@....de>, Ingo Molnar <mingo@...e.hu>,
	LKML <linux-kernel@...r.kernel.org>,
	pm list <linux-pm@...ts.linux-foundation.org>,
	Greg KH <gregkh@...e.de>,
	Jesse Barnes <jbarnes@...tuousgeek.org>,
	Tejun Heo <tj@...nel.org>,
	Marcel Holtmann <marcel@...tmann.org>,
	linux-bluetooth@...r.kernel.org
Subject: Re: GPF in run_workqueue()/list_del_init(cwq->worklist.next) on resume (was: Re: Help needed: Resume problems in 2.6.32-rc, perhaps related to preempt_count leakage in keventd)

On Wednesday 11 November 2009, Oleg Nesterov wrote:
> On 11/10, Linus Torvalds wrote:
> >
> > > In the meantime I got another trace, this time with a slab corruption involved.
> > > Note that it crashed in exactly the same place as previously.
> >
> > I'm leaving your crash log appended for the new cc's, and I would not be
> > at all surprised to hear that the slab corruption is related. The whole
> > 6b6b6b6b pattern does imply a use-after-free on the workqueue,
> 
> Yes, RCX = 6b6b6b6b6b6b6b6b, and according to decodecode the faulting
> instruction is "mov %rdx,0x8(%rcx)". Looks like the pending work was
> freed.
> 
> Rafael, could you reproduce the problem with the debugging patch below?
> It tries to detect the case when the pending work was corrupted and
> prints its work->func (saved in the previous item). It should work
> if the work_struct was freed and poisoned, or if it was re-initialized.
> See ck_work().

I applied the patch and this is the result of 'dmesg | grep ERR' after 10-or-so
consecutive suspend-resume and hibernate-resume cycles:

[  129.008689] ERR!! btusb_waker+0x0/0x27 [btusb]
[  166.477373] ERR!! btusb_waker+0x0/0x27 [btusb]
[  203.983665] ERR!! btusb_waker+0x0/0x27 [btusb]
[  241.636547] ERR!! btusb_waker+0x0/0x27 [btusb]

which kind of confirms my previous observation that the problem was not
reproducible without Bluetooth.

So, it looks like the bug is in btusb_destruct(), which should call
cancel_work_sync() on data->waker before freeing 'data'.  I guess it should
do the same for data->work.
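
For illustration, the failure mode and the fix can be modeled in userspace as
below. This is only a sketch under stated assumptions: every toy_* name is
hypothetical and is not the kernel workqueue API, the model is single-threaded,
and toy_cancel_work() only unlinks a pending item, whereas the real
cancel_work_sync() also waits for a callback that is already running. The
0x6b fill mimics the slab free poison seen in the crash log.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for work_struct; hypothetical, not the kernel type. */
struct toy_work {
	struct toy_work *next;
	void (*func)(struct toy_work *);
	int pending;
};

static struct toy_work *queue_head;	/* pending list, like cwq->worklist */
static int works_run;			/* number of callbacks executed */

static void toy_queue_work(struct toy_work *w)
{
	if (w->pending)
		return;			/* already queued */
	w->pending = 1;
	w->next = queue_head;
	queue_head = w;
}

/* Unlink w if it is still pending; returns 1 if it was removed. */
static int toy_cancel_work(struct toy_work *w)
{
	struct toy_work **pp;

	for (pp = &queue_head; *pp; pp = &(*pp)->next) {
		if (*pp == w) {
			*pp = w->next;
			w->next = NULL;
			w->pending = 0;
			return 1;
		}
	}
	return 0;
}

/* Run everything still on the list, like run_workqueue(). */
static void toy_run_queue(void)
{
	while (queue_head) {
		struct toy_work *w = queue_head;

		queue_head = w->next;
		w->pending = 0;
		w->func(w);
		works_run++;
	}
}

/* Toy stand-in for the driver's private data with two embedded works. */
struct toy_data {
	struct toy_work work;
	struct toy_work waker;
};

static void toy_noop(struct toy_work *w)
{
	(void)w;
}

static struct toy_data *toy_data_alloc(void)
{
	struct toy_data *data = calloc(1, sizeof(*data));

	if (data) {
		data->work.func = toy_noop;
		data->waker.func = toy_noop;
	}
	return data;
}

/* Destructor with the fix: cancel both works before freeing. */
static void toy_destruct(struct toy_data *data)
{
	toy_cancel_work(&data->work);
	toy_cancel_work(&data->waker);

	memset(data, 0x6b, sizeof(*data));	/* mimic slab poisoning */
	free(data);
}

/* Queue the waker, destroy the data, then drain the queue. */
static int toy_demo(void)
{
	struct toy_data *data = toy_data_alloc();

	if (!data)
		return -1;
	toy_queue_work(&data->waker);
	toy_destruct(data);
	toy_run_queue();	/* finds nothing: the waker was cancelled */
	return works_run;
}
```

Without the two toy_cancel_work() calls in toy_destruct(), toy_run_queue()
would dereference the freed, poisoned item, which corresponds to the
use-after-free on the worklist discussed above.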

I'm going to test the appended patch, then.

Thanks,
Rafael

---
 drivers/bluetooth/btusb.c |    3 +++
 1 file changed, 3 insertions(+)

Index: linux-2.6/drivers/bluetooth/btusb.c
===================================================================
--- linux-2.6.orig/drivers/bluetooth/btusb.c
+++ linux-2.6/drivers/bluetooth/btusb.c
@@ -738,6 +738,9 @@ static void btusb_destruct(struct hci_de
 
 	BT_DBG("%s", hdev->name);
 
+	cancel_work_sync(&data->work);
+	cancel_work_sync(&data->waker);
+
 	kfree(data);
 }
 