Message-ID: <20140219202443.GK6835@laptop.programming.kicks-ass.net>
Date: Wed, 19 Feb 2014 21:24:43 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Stephane Eranian <eranian@...gle.com>
Cc: Will Deacon <will.deacon@....com>,
Drew Richardson <drew.richardson@....com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Arnaldo <acme@...hat.com>, Pawel Moll <Pawel.Moll@....com>,
Wade Cherry <Wade.Cherry@....com>
Subject: Re: Perf Oops on 3.14-rc2
On Wed, Feb 19, 2014 at 08:59:08PM +0100, Stephane Eranian wrote:
> On Wed, Feb 19, 2014 at 7:36 PM, Peter Zijlstra <peterz@...radead.org> wrote:
> > On Wed, Feb 19, 2014 at 07:03:13PM +0100, Stephane Eranian wrote:
> >> I am trying to understand the context here.
> >> Are you saying, we may call an offline CPU?
> >
> > Yes, that is what's happening.
> >
> >> I saw that sometimes you retry, sometimes you don't.
> >
> > I tried to do exactly what we do for the task case which is far more
> > likely to fail. Could be I messed up.
> >
> I am not sure why you need to retry. If the CPU is offline, it is offline.
> Or are you saying, you get an error, but you don't know the exact
> reason, thus you keep trying? But how do you get out of this if
> the CPU stays offline?
Ah, so take perf_remove_from_context() as before the patch; if the
cpu_function_call() fails because the CPU is offline, it doesn't call
list_del_event().
Now the offline function is supposed to take them off the list, but it
doesn't actually do so when they're grouped.
This leaves a free()d event on the offline cpu's context list.
After that things quickly go downhill.
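To make the failure mode above concrete, here is a minimal userspace sketch. Everything in it (the toy_* names, the `grouped` and `freed` flags, the singly linked list) is a hypothetical stand-in for illustration, not the kernel's actual structures or code paths. It only models the two behaviours described: the offline path skipping grouped events, and the pre-patch removal skipping the unlink when the cross-call cannot reach the CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not kernel structures. */
struct toy_event {
	struct toy_event *next;	/* link on the per-CPU context list */
	bool grouped;		/* event is part of a group */
	bool freed;		/* marks a "freed" event; no real free() here */
};

struct toy_ctx {
	struct toy_event *head;
	bool cpu_online;
};

/* Unlink ev from the context list if it is present. */
static void toy_list_del(struct toy_ctx *ctx, struct toy_event *ev)
{
	struct toy_event **pp = &ctx->head;

	while (*pp) {
		if (*pp == ev) {
			*pp = ev->next;
			ev->next = NULL;
			return;
		}
		pp = &(*pp)->next;
	}
}

/* Offline path as described: ungrouped events go, grouped ones stay. */
static void toy_cpu_offline(struct toy_ctx *ctx)
{
	struct toy_event *ev = ctx->head, *next;

	ctx->cpu_online = false;
	for (; ev; ev = next) {
		next = ev->next;
		if (!ev->grouped)
			toy_list_del(ctx, ev);
	}
}

/* Pre-patch removal: the unlink only happens if the CPU can be reached. */
static void toy_remove_prepatch(struct toy_ctx *ctx, struct toy_event *ev)
{
	if (ctx->cpu_online)
		toy_list_del(ctx, ev);
	/* else: the cross-call failed and nobody unlinked the event */
	ev->freed = true;	/* models the caller freeing the event next */
}

/* Count events that are marked freed but still reachable from the list. */
static int toy_count_dangling(const struct toy_ctx *ctx)
{
	const struct toy_event *ev;
	int n = 0;

	for (ev = ctx->head; ev; ev = ev->next)
		if (ev->freed)
			n++;
	return n;
}
```

With one ungrouped and one grouped event on the list, taking the CPU offline and then removing the grouped event leaves toy_count_dangling() at 1: the list still points at an event the caller considers gone.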
But before I got there I was led down a few too many rabbit holes trying
to figure out wtf happened.
We could probably fix it differently though. But by the time I more or
less understood things I was too tired to make something pretty.
Anyway; if you want to do something when cpu_function_call() fails, you
also have to check whether the CPU came back up since you tried; at which
point you've got the same pattern as we have for task_function_call().
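A rough sketch of that try / recheck / retry shape, again as a single-threaded userspace model. The sim_* names and the `flip_online_after` hook are invented here to simulate a CPU coming back up between the failed call and the recheck; the real code would need to do the recheck under the appropriate lock, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU's state; not a kernel structure. */
struct sim_cpu {
	bool online;
	bool event_on_list;
	int flip_online_after;	/* test hook: come online after N failed calls, 0 = never */
	int failed_calls;
};

/* Stand-in for a cross-CPU call: 0 on success, -1 if the CPU is offline. */
static int sim_cross_call(struct sim_cpu *cpu)
{
	if (!cpu->online) {
		cpu->failed_calls++;
		if (cpu->flip_online_after > 0 &&
		    cpu->failed_calls >= cpu->flip_online_after)
			cpu->online = true;	/* simulates hotplug racing with us */
		return -1;
	}
	cpu->event_on_list = false;	/* removal performed "on the target CPU" */
	return 0;
}

/* Remove the event; returns how many cross-call attempts were made. */
static int sim_remove(struct sim_cpu *cpu)
{
	int attempts = 0;

retry:
	attempts++;
	if (sim_cross_call(cpu) == 0)
		return attempts;

	/* The call failed. Did the CPU come back up since we tried? */
	if (cpu->online)
		goto retry;

	/* Still offline: nothing can be running there, so unlink locally. */
	cpu->event_on_list = false;
	return attempts;
}
```

In this model the event always ends up off the list: directly when the CPU stays offline, or through a second cross-call when it came back up in between.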