[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.1.10.0811131349090.31899@gandalf.stny.rr.com>
Date: Thu, 13 Nov 2008 14:03:56 -0500 (EST)
From: Steven Rostedt <rostedt@...dmis.org>
To: Andi Kleen <andi@...stfloor.org>
cc: Frédéric Weisbecker <fweisbec@...il.com>,
Ingo Molnar <mingo@...e.hu>,
Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] tracing/function-return-tracer: Make the function
return tracer lockless
On Thu, 13 Nov 2008, Andi Kleen wrote:
>>>
>>> I did not refer to CPU caches, but the compiler's register allocation
>>> [ok if you want the registers are the "level 0 cache"]. A memory barrier
>>> all messes it up. That is why it is better to only clobber specific
>>> memory regions, which is what local_* does.
>>>
>> Now tell me again how local_* is more efficient than barrier?
>
> Look at the generated assembler, not the source. The M386 check is two instructions
> with the i386 stuff just moved away somewhere out of line that is
> about never taken. The rest is short and straight forward in asm too.
> And a lot of kernels are not compiled with 386 support anyways.
>
OK, lets look at just the asm.
barrier:
__asm__ __volatile__("": : :"memory")
local_add_return:
asm volatile(_ASM_XADD "%0, %1;"
: "+r" (i), "+m" (l->a.counter)
: : "memory");
Your argument was that barrier clobbers memory, but it looks like the
local_add_return does the same.
Lets go look at the code that we are talking about:
index = ++ti->curr_ret_stack;
ti->ret_stack[index].ret = ret;
ti->ret_stack[index].func = func;
ti->ret_stack[index].calltime = time;
Which as you stated was incorrect, and it is. But you suggested that we
should switch to using local_add_return which would cause us to change a
lot of code to handle the type change. When this is ported to other archs,
they will copy that too, and then be using a lesser efficient algorithm.
All that is needed is to change the pointer to always point to the next
item, and then do:
index = ti->curr_ret_stack++;
barrier();
ti->ret_stack[index].ret = ret;
ti->ret_stack[index].func = func;
ti->ret_stack[index].calltime = time;
I don't see any major gain in switching to local_add_return.
-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists