[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20071115053751.GB26338@Krystal>
Date: Thu, 15 Nov 2007 00:37:51 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To: Rusty Russell <rusty@...tcorp.com.au>
Cc: akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
Andi Kleen <ak@....de>, "H. Peter Anvin" <hpa@...or.com>,
Chuck Ebbert <cebbert@...hat.com>,
Christoph Hellwig <hch@...radead.org>,
Jeremy Fitzhardinge <jeremy@...p.org>,
Ingo Molnar <mingo@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [patch 5/8] Immediate Values - x86 Optimization
* Rusty Russell (rusty@...tcorp.com.au) wrote:
> On Thursday 15 November 2007 15:06:10 Mathieu Desnoyers wrote:
> > * Rusty Russell (rusty@...tcorp.com.au) wrote:
> > > A stop_machine (or lightweight variant using IPI) would be sufficient and
> > > vastly simpler. Trying to patch NMI handlers while they're running is
> > > already crazy.
> >
> > I wouldn't mind if it was limited to the code within do_nmi(), but then
> > we would have to accept potential GPF if
> >
> > A - the NMI or MCE code calls any external kernel code (printk,
> > notify_die, spin_lock/unlock, die_nmi, lapic_wd_event (perfctr code,
> > calls printk too for debugging)...
>
> Sure, but as I pointed out previously, such calls are already best effort.
> You can do very little safely from do_nmi(), and calling printk isn't one of
> them,
yes and no.. do_nmi uses the "bust spinlocks" exactly for this. So this
is ok by design. Other than this, we can end up mixing up the console
data output with different sources of characters, but I doubt something
really bad can happen (like a deadlock).
> nor is grabbing a spinlock (well, actually you could as long as it's
> *only* used by NMI handlers. See any of those?).
Yup, see arch/x86/kernel/nmi_64.c : nmi_watchdog_tick()
It defines a spinlock to "Serialise the printks". I guess it's good to
protect against other nmi watchdogs running on other CPUs concurrently,
I guess.
>
> > Therefore, if one decides to use the immediate values to
> > leave dormant spinlock instrumentation in the kernel, I wouldn't want it
> > to have undesirable side-effects (GPF) when the instrumentation is
> > being enabled, as rare as it could be.
>
> It's overengineered, since it's less likely than deadlock already.
>
So should we put a warning telling "enabling tracing or profiling on a
production system that also uses NMI watchdog could potentially cause a
crash" ? The rarer a bug is, the more difficult it is to debug. It does
not make the bug hurt less when it happens.
The normal thing to do when a potential deadlock is detected is to fix
it, not to leave it there under the premise that it doesn't matter since
it happens rarely. In our case, where we know there is a potential race,
I don't see any reason not to make sure it never happens. What's the
cost of it ?
arch/x86/kernel/immediate.o : 2.4K
let's compare..
kernel/stop_machine.o : 3.9K
so I think that code size is not an issue there, especially since the
immediate values are not meant to be deployed on embedded systems.
> > > I'd keep this version up your sleeve for they day when it's needed.
> >
> > If we choose to go this way, stop_machine would have to do a sync_core()
> > on every CPU before it reactivates interrupts for this to respect
> > Intel's errata.
>
> Yes, I don't think stop_machine is actually what you want anyway, since you
> are happy to run in interrupt context. An IPI-based scheme is probably
> better, and also has the side effect of iret doing the sync you need, IIUC.
>
Yup, looping in IPIs with interrupts disabled should do the job there.
It's just awful for interrupt latency on large SMP systems :( Being
currently bad at it is not a reason to make it worse. If we have a CPU
that is within a high latency irq disable region when we send the IPI,
we can easily end up waiting for this critical section to end with
interrupts disabled on all CPUs. The fact that we would wait for the
longest interrupt disable region with IRQs disabled implies that we
increase the maximum latency of the system, by design. I'm not sure I
would like to be the new longest record-beating IRQ off region.
Mathieu
> Hope that clarifies,
> Rusty.
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists