[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.2.00.0904052301481.747@localhost.localdomain>
Date: Sun, 5 Apr 2009 23:02:58 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: Jaswinder Singh Rajput <jaswinder@...nel.org>
cc: Linus Torvalds <torvalds@...ux-foundation.org>,
John Levon <levon@...ementarian.org>,
Ingo Molnar <mingo@...e.hu>, x86 maintainers <x86@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Robert Richter <robert.richter@....com>
Subject: Re: [PATCH] x86: fix BUG: using smp_processor_id() in preemptible
[00000000] code: oprofiled/3319
On Mon, 6 Apr 2009, Jaswinder Singh Rajput wrote:
> On Sun, 2009-04-05 at 12:51 -0700, Linus Torvalds wrote:
> >
> > On Mon, 6 Apr 2009, Jaswinder Singh Rajput wrote:
> > >
> > > Fixed this bug on P4 HT machine:
> >
> > I don't think this really is correct.
> >
> > Anything that does that "get_stagger()" thing should already either be
> > tied to a specific CPU (because it's going to actually touch the variables
> > for "that CPU"), or it should likely set up the data structures for _all_
> > cases.
Most of the functions are called preempt off anyway. The only one
which needs care AFAICT is p4_fill_in_addresses, but the problem space
is outside of the p4 specific code. It's the call site which is
nmi_setup():
model->fill_in_addresses(&per_cpu(cpu_msrs, 0));
...
on_each_cpu(nmi_cpu_setup, NULL, 1);
Need to have a closer look which parts of that section need to be
preempt off.
> > Just as an example, look at something like "p4_setup_ctrs()": it will use
> > that stagger to determine what MSR to read. But if we are preemptable, the
> > CPU we actually do the MSR read on may be a _different_ CPU than the CPU
> > that we did the "get_stagger()" on!
And that code has nested calls to get_stagger() via pmc_setup_one_p4_counter().
That is odd as well. Why evaluate that twice ? So there are more
things to look at before we just fix^Whide some debug warning.
> > Of course, this is all ancient code, so whatever. But I really think this
> > patch is actively bad - it just hides the issue rather than fixing
> > anything.
And the proposed follow up patch is even worse as it just adds
mechanical band aids all over the place without analysing where the
root cause of the problem is.
Thanks,
tglx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists