[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1281910924.2811.0.camel@pasglop>
Date: Mon, 16 Aug 2010 08:22:04 +1000
From: Benjamin Herrenschmidt <benh@...nel.crashing.org>
To: Robert Richter <robert.richter@....com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Carl E. Love" <cel@...ibm.com>,
Michael Ellerman <michaele@....ibm.com>,
oprofile-list <oprofile-list@...ts.sourceforge.net>
Subject: Re: [PATCH] oprofile: fix crash when accessing freed task structs
On Fri, 2010-08-13 at 17:39 +0200, Robert Richter wrote:
> On 02.08.10 21:39:33, Benjamin Herrenschmidt wrote:
>
> > I can't tell that much about the workload, I don't have access to it
> > either, let's say that from my point of view it's a "customer" binary
> > blob.
> >
> > I can re-trigger it though.
>
> Benjamin,
>
> can you try the patch below?
Thanks. I'll see if the folks who have a repro-case can give it a spin
for me.
Cheers,
Ben.
> Thanks,
>
> -Robert
>
> >From 4435322debc38097e9e863e14597ab3f78814d14 Mon Sep 17 00:00:00 2001
> From: Robert Richter <robert.richter@....com>
> Date: Fri, 13 Aug 2010 16:29:04 +0200
> Subject: [PATCH] oprofile: fix crash when accessing freed task structs
>
> This patch fixes a crash during shutdown reported below. The crash is
> caused be accessing already freed task structs. The fix changes the
> order for registering and unregistering notifier callbacks.
>
> All notifiers must be initialized before buffers start working. To
> stop buffer synchronization we cancel all workqueues, unregister the
> notifier callback and then flush all buffers. After all of this we
> finally can free all tasks listed.
>
> This should avoid accessing freed tasks.
>
> On 22.07.10 01:14:40, Benjamin Herrenschmidt wrote:
>
> > So the initial observation is a spinlock bad magic followed by a crash
> > in the spinlock debug code:
> >
> > [ 1541.586531] BUG: spinlock bad magic on CPU#5, events/5/136
> > [ 1541.597564] Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6d03
> >
> > Backtrace looks like:
> >
> > spin_bug+0x74/0xd4
> > ._raw_spin_lock+0x48/0x184
> > ._spin_lock+0x10/0x24
> > .get_task_mm+0x28/0x8c
> > .sync_buffer+0x1b4/0x598
> > .wq_sync_buffer+0xa0/0xdc
> > .worker_thread+0x1d8/0x2a8
> > .kthread+0xa8/0xb4
> > .kernel_thread+0x54/0x70
> >
> > So we are accessing a freed task struct in the work queue when
> > processing the samples.
>
> Reported-by: Benjamin Herrenschmidt <benh@...nel.crashing.org>
> Signed-off-by: Robert Richter <robert.richter@....com>
> ---
> drivers/oprofile/buffer_sync.c | 27 ++++++++++++++-------------
> drivers/oprofile/cpu_buffer.c | 2 --
> 2 files changed, 14 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/oprofile/buffer_sync.c b/drivers/oprofile/buffer_sync.c
> index a9352b2..b7e755f 100644
> --- a/drivers/oprofile/buffer_sync.c
> +++ b/drivers/oprofile/buffer_sync.c
> @@ -141,16 +141,6 @@ static struct notifier_block module_load_nb = {
> .notifier_call = module_load_notify,
> };
>
> -
> -static void end_sync(void)
> -{
> - end_cpu_work();
> - /* make sure we don't leak task structs */
> - process_task_mortuary();
> - process_task_mortuary();
> -}
> -
> -
> int sync_start(void)
> {
> int err;
> @@ -158,7 +148,7 @@ int sync_start(void)
> if (!zalloc_cpumask_var(&marked_cpus, GFP_KERNEL))
> return -ENOMEM;
>
> - start_cpu_work();
> + mutex_lock(&buffer_mutex);
>
> err = task_handoff_register(&task_free_nb);
> if (err)
> @@ -173,7 +163,10 @@ int sync_start(void)
> if (err)
> goto out4;
>
> + start_cpu_work();
> +
> out:
> + mutex_unlock(&buffer_mutex);
> return err;
> out4:
> profile_event_unregister(PROFILE_MUNMAP, &munmap_nb);
> @@ -182,7 +175,6 @@ out3:
> out2:
> task_handoff_unregister(&task_free_nb);
> out1:
> - end_sync();
> free_cpumask_var(marked_cpus);
> goto out;
> }
> @@ -190,11 +182,20 @@ out1:
>
> void sync_stop(void)
> {
> + /* flush buffers */
> + mutex_lock(&buffer_mutex);
> + end_cpu_work();
> unregister_module_notifier(&module_load_nb);
> profile_event_unregister(PROFILE_MUNMAP, &munmap_nb);
> profile_event_unregister(PROFILE_TASK_EXIT, &task_exit_nb);
> task_handoff_unregister(&task_free_nb);
> - end_sync();
> + mutex_unlock(&buffer_mutex);
> + flush_scheduled_work();
> +
> + /* make sure we don't leak task structs */
> + process_task_mortuary();
> + process_task_mortuary();
> +
> free_cpumask_var(marked_cpus);
> }
>
> diff --git a/drivers/oprofile/cpu_buffer.c b/drivers/oprofile/cpu_buffer.c
> index 219f79e..f179ac2 100644
> --- a/drivers/oprofile/cpu_buffer.c
> +++ b/drivers/oprofile/cpu_buffer.c
> @@ -120,8 +120,6 @@ void end_cpu_work(void)
>
> cancel_delayed_work(&b->work);
> }
> -
> - flush_scheduled_work();
> }
>
> /*
> --
> 1.7.1.1
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists