[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080730140430.GB18210@Krystal>
Date: Wed, 30 Jul 2008 10:04:30 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Steven Rostedt <rostedt@...dmis.org>, Ingo Molnar <mingo@...e.hu>,
Thomas Gleixner <tglx@...utronix.de>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, Linus Torvalds <torvalds@...l.org>,
Arjan van de Ven <arjan@...radead.org>
Subject: Re: [PATCH 0/5] ftrace: do not trace NMI handlers
* Peter Zijlstra (peterz@...radead.org) wrote:
> On Tue, 2008-07-29 at 21:29 -0400, Steven Rostedt wrote:
> > The dynamic ftrace code modifies code text at run time. Arjan informed
> > me that there is no safe way to modify code text on an SMP system when
> > the other CPUs might execute that same code. The reason has to do with
> > pipeline caches and CPUs might do funny things if the code being pushed
> > in the pipeline also happens to be modified at that same time. (Arjan
> > correct me if I'm wrong here).
>
> How does the immediate value stuff get around this issue?
>
By inserting a temporary breakpoint in the code about to be replaced and
by issuing an IPI to every CPU, issuing a cpuid instruction to
synchronize the cores, before reverting the breakpoint to the first byte
of the new instruction. Therefore we are sure that any CPU which could
have seen the old instruction will flush its pipeline cache (cpuid
instruction synchronizes the core) before seeing the new instruction.
Meanwhile, the breakpoint handler executes the original instruction,
dealing with non-relocatable instructions if necessary by emulating them
(e.g. by playing with the return IP). We have some constraints to
follow: every instruction being replaced must stay a valid instruction
to the CPU, because there could be preemption in the middle of the
modified area. Immediate values does not change the size of the
instructions, so we don't run in this problem, but changing
{ 0x90, 0x90, 0x90, 0x90, 0x90 } (nops)
into
{ 0xe9, 0xXX, 0xXX, 0xXX, 0xXX } (5-bytes jump)
would be problematic because a preempted IP could point right in the
middle of those nops. As a general rule, never try to combine smaller
instructions into a bigger one, except in the case of adding a
lock-prefix to an instruction : this case insures that the non-lock
prefixed instruction is still valid after the change has been done. We
could however run into a nasty non-synchronized atomic instruction use
in SMP mode if a thread happens to be scheduled out right after the lock
prefix. Hopefully the alternative code uses the refrigerator... (hrm, it
doesn't).
All that is documentented thoroughly in arch/x86/kernel/immediate.c
comments.
Mathieu
> > We use kstop_machine to put the system into a UP like mode. This prevents
> > other CPUs from executing code while we modify it. Under stress testing
> > Ingo discovered that NMIs can cause the system to crash. This was due
> > to NMIs calling code that is being modified. Some boxes are more prone to
> > failure than others.
> >
> > This series of patches performs two tasks:
> >
> > 1) Add notrace to functions called by NMI, or simply remove the tracing
> > completely from files that are primarily used by NMI.
> >
> > 2) Add a warning when code that will be modified is called by an NMI.
> > This also disables ftraced when it is detected, to prevent the
> > race with the NMI and code modification from happneing.
> >
> > The warning looks something like this:
> >
> > --------------- cut here ---------------
> > WARNING: ftraced code called from NMI context lapic_wd_event+0xd/0x65
> > Please report this to the ftrace maintainer.
> > Disabling ftraced. Boot with ftrace_keep_on_nmi to not disable.
> > Pid: 0, comm: swapper Not tainted 2.6.26-tip #96
> >
> > Call Trace:
> > <NMI> [<ffffffff8021c6d0>] ? lapic_wd_event+0xd/0x65
> > [<ffffffff8027b9c1>] ftrace_record_ip+0xa3/0x357
> > [<ffffffff8020c0f4>] mcount_call+0x5/0x31
> > [<ffffffff8021c6d5>] ? lapic_wd_event+0x12/0x65
> > [<ffffffff804b90d4>] nmi_watchdog_tick+0x21b/0x230
> > [<ffffffff804b8487>] default_do_nmi+0x73/0x1e0
> > [<ffffffff804b8a04>] do_nmi+0x64/0x91
> > [<ffffffff804b80bf>] nmi+0x7f/0x80
> > [<ffffffff80212c14>] ? default_idle+0x35/0x4f
> > <<EOE>> [<ffffffff8020ae42>] cpu_idle+0x8a/0xc9
> > [<ffffffff804b15a6>] start_secondary+0x172/0x177
> >
> > --------------- end cut here ---------------
> >
> >
> > This appears once when it is caught. We are hoping that this will not
> > appear often, and are running code to catch it as it does.
> >
> > -- Steve
> >
>
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists