Message-ID: <20100728122111.GO26154@erda.amd.com>
Date: Wed, 28 Jul 2010 14:21:11 +0200
From: Robert Richter <robert.richter@....com>
To: Benjamin Herrenschmidt <benh@...nel.crashing.org>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Carl E. Love" <cel@...ibm.com>,
Michael Ellerman <michaele@....ibm.com>
Subject: Re: Possible Oprofile crash/race when stopping
On 22.07.10 01:14:40, Benjamin Herrenschmidt wrote:
> Hi folks !
>
> We've hit a strange crash internally, that we -think- we have tracked
> down to an oprofile bug. It's hard to hit, so I can't guarantee yet that
> we have fully smashed it but I'd like to share our findings in case you
> guys have a better idea.
>
> So the initial observation is a spinlock bad magic followed by a crash
> in the spinlock debug code:
Benjamin,
thanks for reporting this. I have been trying to reproduce this with
various loads and scenarios, but without success so far. Can you give
me a hint about the load you have (number of running processes, CPU
load, whether you stop oprofile while many processes are still
running)? Are you able to trigger it regularly?
> I think the right sequence however requires breaking up end_sync. Ie, we
> need to do in that order:
>
> - cancel the workqueues
> - unregister the notifier
> - process the mortuary
>
> What do you think ?
This could potentially fix it; I will have to look deeper into the
code. I'll try to do this next week.
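To make the ordering argument concrete, here is a minimal Python toy model. It is not oprofile code. The class, the step names and the assumption that both the deferred work and the task-free notifier can add entries to the mortuary while they are active are all hypothetical. It checks that with the order proposed above (cancel work, unregister notifier, process mortuary), a racing event fired at any point leaves nothing stranded. A "drain first" order is included only as a contrast case.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class TeardownModel:
    """Toy model of the profiler's stop path (hypothetical; not kernel code)."""
    work_pending: bool = True         # deferred sync work still queued
    notifier_registered: bool = True  # task-free notifier still hooked in
    mortuary: List[str] = field(default_factory=list)  # dead tasks awaiting release
    released: List[str] = field(default_factory=list)

    def racing_event(self) -> None:
        # Assumption: while still active, both the pending work and the
        # notifier can put entries on the mortuary.
        if self.work_pending:
            self.mortuary.append("from-work")
        if self.notifier_registered:
            self.mortuary.append("from-notifier")

    def cancel_work(self) -> None:
        self.work_pending = False

    def unregister_notifier(self) -> None:
        self.notifier_registered = False

    def process_mortuary(self) -> None:
        # Release everything queued so far and empty the mortuary.
        self.released.extend(self.mortuary)
        self.mortuary.clear()


def stranded(order: Sequence[str], event_at: int) -> List[str]:
    """Run the stop steps in `order`, firing one racing event just before
    step `event_at` (or after the last step if event_at == len(order)).
    Returns whatever is still on the mortuary once teardown is done."""
    m = TeardownModel()
    steps: Dict[str, Callable[[], None]] = {
        "cancel": m.cancel_work,
        "unregister": m.unregister_notifier,
        "mortuary": m.process_mortuary,
    }
    for i, name in enumerate(order):
        if i == event_at:
            m.racing_event()
        steps[name]()
    if event_at == len(order):
        m.racing_event()
    return list(m.mortuary)


PROPOSED = ("cancel", "unregister", "mortuary")
DRAIN_FIRST = ("mortuary", "cancel", "unregister")  # contrast case only

if __name__ == "__main__":
    # Try every possible position of the racing event for both orders.
    for order in (PROPOSED, DRAIN_FIRST):
        for at in range(len(order) + 1):
            print(order, at, stranded(order, at))
```

In the toy model, the proposed order is safe because by the time the mortuary is drained for the last time, both producers are already gone. With the drain-first order, an event that lands right after the drain leaves entries that nobody will ever release.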
Thanks,
-Robert
--
Advanced Micro Devices, Inc.
Operating System Research Center