linux-kernel - Re: Unified tracing buffer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080923032707.GJ24937@Krystal>
Date:	Mon, 22 Sep 2008 23:27:07 -0400
From:	Mathieu Desnoyers <compudj@...stal.dyndns.org>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	"Frank Ch. Eigler" <fche@...hat.com>,
	Martin Bligh <mbligh@...gle.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>, od@...ell.com
Subject: Re: Unified tracing buffer

* Steven Rostedt (rostedt@...dmis.org) wrote:
> 
> On Mon, 22 Sep 2008, Mathieu Desnoyers wrote:
> > * Steven Rostedt (rostedt@...dmis.org) wrote:
[...]
> > > and then to turn on function tracing, I need to hook into this marker. I'd 
> > > rather just push the data right into the buffer here without having to 
> > > make another function call to hook into this.
> > > 
> > 
> > The scheme you propose here is based on a few inherent assumptions :
> > 
> > - You assume ring_buffer_reserve() and ring_buffer_commit() are static
> >   inline and thus does not turn into function calls.
> > - You assume these are small enough so they can be inlined without
> >   causing L1 insn cache trashing when tracing is activated.
> > - You therefore assume they use a locking scheme that lets them be
> >   really really compact (e.g. interrupt disable and spin lock).
> > - You assume that the performance impact of doing a function call is
> >   bigger than the impact of locking, which is false by at least a factor
> >   10.
> 
> I don't assume anything. I will have the requirement that reserve and 
> commit must be paired, and for the first version, hold locks.
> 

By saying you don't want to do any function call, the only technical
reason I see for you wanting that is performance, and thus you would
assume the above. If not, why don't you want to make another function
call ? This all I mean by "assumption" here.

> Maybe I should rename it to: ring_buffer_lock_reserve and 
> ring_buffer_unlock_commit. To show this.
> 
> [...]
> 
> > kernel/trace/ftrace.c :
> > 
> > /*
> >  * the following macro would only do the "declaration" part of the
> >  * markers, without doing all the function call stuff.
> >  */
> > DECLARE_MARKER(function_entry,
> >   "pid %d pc %d flags %lu func 0x%lX parent 0x%lX");
> > 
> > void ftrace_mcount(unsigned long ip, unsigned long parent_ip)
> > {
> >   size_t ev_size = 0;
> >   char *buffer;
> > 
> >   /*
> >    * We assume event payload aligned on sizeof(void *).
> >    * Event size calculated statically.
> >    */
> >   ev_size += sizeof(int);
> >   ev_size += var_align(ev_size, sizeof(int));
> >   ev_size += sizeof(int);
> >   ev_size += var_align(ev_size, sizeof(unsigned long));
> >   ev_size += sizeof(unsigned long);
> >   ev_size += var_align(ev_size, sizeof(unsigned long));
> >   ev_size += sizeof(unsigned long);
> >   ev_size += var_align(ev_size, sizeof(unsigned long));
> >   ev_size += sizeof(unsigned long);
> > 
> >   /*
> >    * Now reserve space and copy data.
> >    */
> >   buffer = ring_buffer_reserve(func_event_id, ev_size);
> >   /* Write pid */
> >   *(int *)buffer = current->pid;
> >   buffer += sizeof(int);
> > 
> >   /* Write pc */
> >   buffer += var_align(buffer, sizeof(int));
> >   *(int *)buffer = preempt_count();
> >   buffer += sizeof(int);
> > 
> >   /* Write flags */
> >   buffer += var_align(buffer, sizeof(unsigned long));
> >   *(unsigned long *)buffer = local_irq_flags();
> >   buffer += sizeof(unsigned long);
> > 
> >   /* Write func */
> >   buffer += var_align(buffer, sizeof(unsigned long));
> >   *(unsigned long *)buffer = func;
> >   buffer += sizeof(unsigned long);
> > 
> >   /* Write parent */
> >   buffer += var_align(buffer, sizeof(unsigned long));
> >   *(unsigned long *)buffer = parent;
> >   buffer += sizeof(unsigned long);
> > 
> >   ring_buffer_commit(buffer, ev_size);
> > }
> > 
> > 
> > Would that be suitable for you ?
> 
> YUCK YUCK YUCK!!!!
> 
> Mathieu,
> 
> Do I have to bring up the argument of simplicity again? I will never use
> such an API.  Mine was very simple, I have to spend 10 minutes trying to
> figure out what the above is. I only spent 5 so I'm still at a lost.
> 

I was actually waiting for you to propose an alternative, but I fear you
already did without me noticing :)

How do you deal with exporting data across kernel/user boundary in your
proposal exactly ? How does this work on architecture with 64-bits
kernel and 32-bits userland... ? A simple C structure copy might be
simple to _code_, but hellish to export to userspace and lead to hard to
debug binary incompatibilities (different gcc flags, 32/64 bits
user/kernel). And this is without telling about the non-portability of
the exported data.

If gcc/icc-knowledgeful people can reassure me by certifying it won't
generate a mess, fine, but until then, I stay very doubtful about
solutions involving to imply binary compability between kernel and
userland.

And common.. 10 minutes to understand the above code. Your _are_ kidding
me right ? Would that help if I create a small 4 lineish wrapper around
the buffer write ?

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/