linux-kernel - Re: 2.6.29-rc1: BUG: unable to handle kernel paging request at 000077fed87bb6bf

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-id: <20090119212329.GS28831@clouds>
Date:	Mon, 19 Jan 2009 16:23:29 -0500
From:	Jody McIntyre <scjody@....com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: 2.6.29-rc1: BUG: unable to handle kernel paging request at
	000077fed87bb6bf

On Wed, Jan 14, 2009 at 09:51:20PM -0800, Andrew Morton wrote:
> It needs to be bisected :(
> 
> At a guess I'd say that some piece of code somewhere did
> schedule_delayed_work(work) and then scribbled on *work before the
> timer got to run.  It could be anywhere at all in your kernel.

I tried your patch on Friday and couldn't find the bug with it.  I was
going to try more debugging today but happily, a commit made over the
weekend has fixed the issue.

Thanks,
Jody

> Thomas's debugobjects code could perhaps be used to find the culprit,
> but alas the workqueue code hasn't been taught to use that framework.
> 
> We could do something like this:
> 
> --- a/kernel/workqueue.c~a
> +++ a/kernel/workqueue.c
> @@ -196,9 +196,14 @@ EXPORT_SYMBOL_GPL(queue_work_on);
>  
>  static void delayed_work_timer_fn(unsigned long __data)
>  {
> -	struct delayed_work *dwork = (struct delayed_work *)__data;
> -	struct cpu_workqueue_struct *cwq = get_wq_data(&dwork->work);
> -	struct workqueue_struct *wq = cwq->wq;
> +	struct delayed_work *dwork;
> +	struct cpu_workqueue_struct *cwq;
> +	struct workqueue_struct *wq;
> +
> +	dwork = (struct delayed_work *)__data;
> +	printk("delayed_work_timer_fn: handling %p\n", dwork);
> +	cwq = get_wq_data(&dwork->work);
> +	wq = cwq->wq;
>  
>  	__queue_work(wq_per_cpu(wq, smp_processor_id()), &dwork->work);
>  }
> @@ -214,6 +219,8 @@ static void delayed_work_timer_fn(unsign
>  int queue_delayed_work(struct workqueue_struct *wq,
>  			struct delayed_work *dwork, unsigned long delay)
>  {
> +	printk("queue_delayed_work: queueing %p\n", dwork);
> +	dump_stack();
>  	if (delay == 0)
>  		return queue_work(wq, &dwork->work);
>  
> _
> 
> It will print an object address and a call trace in
> queue_delayed_work() and will print an object address in
> delayed_work_timer_fn().
> 
> Hopefully when it oopses you'll see the bad object address which was
> obtained by delayed_work_timer_fn().  Then, you can go back through the
> trace and determine which callsite passed that object address to
> queue_delayed_work().  Then we will have our culprit.
> 
> It'll possibly generate a lot of output, so logged netconsole might be
> needed.
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/