linux-kernel - Re: [PATCH 1/2] tracing/function-return-tracer: Make the function return tracer lockless


Message-ID: <alpine.DEB.1.10.0811131349090.31899@gandalf.stny.rr.com>
Date:	Thu, 13 Nov 2008 14:03:56 -0500 (EST)
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Andi Kleen <andi@...stfloor.org>
cc:	Frédéric Weisbecker <fweisbec@...il.com>,
	Ingo Molnar <mingo@...e.hu>,
	Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] tracing/function-return-tracer: Make the function
 return tracer lockless


On Thu, 13 Nov 2008, Andi Kleen wrote:
>>> 
>>> I did not refer to CPU caches, but the compiler's register allocation
>>> [ok if you want the registers are the "level 0 cache"]. A memory barrier
>>> all messes it up. That is why it is better to only clobber specific
>>> memory regions, which is what local_* does.
>>> 

>> Now tell me again how local_* is more efficient than barrier?
> 
> Look at the generated assembler, not the source. The M386 check is two instructions
> with the i386 stuff just moved away somewhere out of line that is
> about never taken. The rest is short and straight forward in asm too.
> And a lot of kernels are not compiled with 386 support anyways.
> 

OK, let's look at just the asm.

barrier:
	__asm__ __volatile__("": : :"memory")

local_add_return:
	asm volatile(_ASM_XADD "%0, %1;"
		     : "+r" (i), "+m" (l->a.counter)
		     : : "memory");


Your argument was that barrier clobbers memory, but it looks like
local_add_return does the same.

Let's go look at the code that we are talking about:

 	index = ++ti->curr_ret_stack;
 	ti->ret_stack[index].ret = ret;
 	ti->ret_stack[index].func = func;
 	ti->ret_stack[index].calltime = time;

Which, as you stated, was incorrect, and it is. But you suggested that we
should switch to using local_add_return, which would force us to change a
lot of code to handle the type change. When this is ported to other archs,
they will copy that too, and then be using a less efficient algorithm.

All that is needed is to change the pointer to always point to the next 
item, and then do:

	index = ti->curr_ret_stack++;
	barrier();
	ti->ret_stack[index].ret = ret;
	ti->ret_stack[index].func = func;
	ti->ret_stack[index].calltime = time;

I don't see any major gain in switching to local_add_return.
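
For illustration, here is a minimal userspace sketch of the ordering
described above: reserve the slot first, then a compiler barrier, then
fill it in. The names struct task_sketch, RET_STACK_DEPTH and
push_return_trace() are hypothetical and not taken from the patch, and the
depth check is an added assumption. barrier() is defined the same way as
the asm quoted earlier (GCC/Clang inline asm). As I read the thread, the
intent is that a nested invocation arriving after the increment sees the
slot as already taken; the sketch only shows the ordering and does not
try to demonstrate the interrupt case.

```c
#include <assert.h>

/* Compiler-only barrier, same form as the asm quoted above. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical depth, chosen only for this sketch. */
#define RET_STACK_DEPTH 16

struct ret_entry {
	unsigned long ret;
	unsigned long func;
	unsigned long long calltime;
};

/* Stand-in for the relevant thread_info fields (assumption). */
struct task_sketch {
	int curr_ret_stack;	/* always indexes the NEXT free slot */
	struct ret_entry ret_stack[RET_STACK_DEPTH];
};

/* Returns the index that was filled, or -1 if the stack is full. */
static int push_return_trace(struct task_sketch *ti, unsigned long ret,
			     unsigned long func, unsigned long long time)
{
	int index;

	if (ti->curr_ret_stack >= RET_STACK_DEPTH)
		return -1;

	/* Reserve the slot before touching its contents. */
	index = ti->curr_ret_stack++;
	/* Keep the compiler from moving the stores above the increment. */
	barrier();
	assert(index >= 0 && index < RET_STACK_DEPTH);

	ti->ret_stack[index].ret = ret;
	ti->ret_stack[index].func = func;
	ti->ret_stack[index].calltime = time;
	return index;
}
```

Starting from a zeroed struct task_sketch, the first call fills slot 0 and
leaves curr_ret_stack at 1. The index stays a plain int, so no type change
is needed in the surrounding code.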

-- Steve