netdev - Re: Receive side performance issue with multi-10-GigE and NUMA


Message-ID: <20090826234650.GE6759@nowhere>
Date:	Thu, 27 Aug 2009 01:46:53 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Neil Horman <nhorman@...driver.com>
Cc:	Ingo Molnar <mingo@...e.hu>, David Miller <davem@...emloft.net>,
	rostedt@...dmis.org, billfink@...dspring.com,
	netdev@...r.kernel.org, brice@...i.com, gallatin@...i.com
Subject: Re: Receive side performance issue with multi-10-GigE and NUMA

On Wed, Aug 26, 2009 at 04:23:44PM -0400, Neil Horman wrote:
> On Wed, Aug 26, 2009 at 09:48:35PM +0200, Ingo Molnar wrote:
> > 
> > * David Miller <davem@...emloft.net> wrote:
> > 
> > > From: Ingo Molnar <mingo@...e.hu>
> > > Date: Wed, 26 Aug 2009 21:08:30 +0200
> > > 
> > > > Sigh, no. Please re-read the past discussions about this. 
> > > > trace_skb_sources.c is a hack and should be converted to generic 
> > > > tracepoints. Is there anything in it that cannot be expressed in 
> > > > terms of TRACE_EVENT()?
> > > 
> > > Neil explained why he needed to implement it this way in his reply 
> > > to Steven Rostedt.  I attach it here for your convenience.
> > 
> > thanks. The argument is invalid:
> > 
> Just because you assert that doesn't make it so, Ingo.
> 
> > > > BTW, why not just do this as events? Or was this just a easy way 
> > > > to communicate with the user space tools?
> > > 
> > > Thats exactly why I did it.  the idea is for me to now write a 
> > > user space tool that lets me analyze the events and ajust process 
> > > scheduling to optimize the rx path. Neil
> > 
> > All tooling (in fact _more_ tooling) can be done based on generic, 
> > TRACE_EVENT() based tracepoints. Generic tracepoints are far more 
> > available, have a generalized format with format parsers and user 
> > tooling implemented, etc. etc.
> > 
> Then why allow for ftrace modules at all?

Well, the old way to implement a tracer was what you did: create
a whole ftrace plugin (i.e. a tracer).

But it's a bit of a burden to implement a tracer: you have to deal
with the ring buffer directly, using code that is pretty much the same
from one trivial tracer to another; you have to deal with output
formatting; and you have to define your fields, their types and their
formats explicitly and separately if you want filters to be supported.

Oh, and you also need to handle your tracepoints by hand and check their
registration results. You also need to implement your own stop and start
callbacks that deactivate and reactivate your tracepoints.

So that's a lot of repetitive and error-prone work.
Also, kernel/trace hosts a lot of such error-prone code, and it becomes a
maintenance burden not only for you but also for us.

The goal of TRACE_EVENT is to reduce the impact of everything I explained
above. You only need to care about what is strictly necessary for your traces:

- field name
- field type
- field formats

And that's pretty much all. All the burden of copying into the ring buffer,
filtering, tracepoints, formats and output is handled in the background.
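
To make the "describe the fields once, get the rest generated" idea
concrete, here is a minimal userspace sketch. It is an analogy built on
plain C X-macros, not the kernel's TRACE_EVENT() macro, and every name in
it (SKB_COPY_FIELDS, skb_copy_entry, append, skb_copy_describe,
skb_copy_print) is hypothetical. From a single list of (type, name, format)
triples it derives the record layout, a self-describing field listing and
the output formatting:

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical event: one entry per field -> (C type, name, printf format). */
#define SKB_COPY_FIELDS(F)        \
	F(int,          pid,  "%d")   \
	F(unsigned int, len,  "%u")   \
	F(int,          node, "%d")

/* Expansion 1: the record layout that would be copied into a buffer. */
#define AS_MEMBER(type, name, fmt) type name;
struct skb_copy_entry { SKB_COPY_FIELDS(AS_MEMBER) };

/* Bounded append helper; silently truncates when the buffer is full. */
static void append(char *buf, size_t size, size_t *used, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(buf != NULL);
	if (*used >= size)
		return;
	va_start(ap, fmt);
	n = vsnprintf(buf + *used, size - *used, fmt, ap);
	va_end(ap);
	if (n > 0)
		*used += ((size_t)n < size - *used) ? (size_t)n : size - *used - 1;
}

/* Expansion 2: a description of each field (type, name, offset, size). */
#define AS_DESC(type, name, fmt)                                     \
	append(buf, size, &used, "field:%s %s; offset:%zu; size:%zu;\n", \
	       #type, #name, offsetof(struct skb_copy_entry, name),      \
	       sizeof(((struct skb_copy_entry *)0)->name));

size_t skb_copy_describe(char *buf, size_t size)
{
	size_t used = 0;

	if (size)
		buf[0] = '\0';
	SKB_COPY_FIELDS(AS_DESC)
	return used;
}

/* Expansion 3: output formatting, "name=value " for every field. */
#define AS_PRINT(type, name, fmt) \
	append(buf, size, &used, #name "=" fmt " ", e->name);

size_t skb_copy_print(const struct skb_copy_entry *e, char *buf, size_t size)
{
	size_t used = 0;

	if (size)
		buf[0] = '\0';
	SKB_COPY_FIELDS(AS_PRINT)
	return used;
}
```

Adding a field means adding one line to SKB_COPY_FIELDS; the struct, the
description and the printer all follow. That is the property being argued
for here, although the real macro does considerably more (ring buffer
reservation, filtering, tracepoint registration).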

Also, your tracer no longer depends on a fixed ABI, because the formats of
your fields are dynamically described in dedicated debugfs files.
Tracer fields, even though we have workarounds to describe their format, are
much more constrained: their format more or less has to stay fixed.
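
As an illustration of why a dynamically described format frees a tool from
a fixed layout, here is a small userspace sketch. It assumes a description
line of the shape "field:<type> <name>; offset:<n>; size:<n>;", which is
modeled on the per-event format files but should be treated as an
assumption rather than a specification. The names field_desc,
parse_field_line and read_u32_field are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct field_desc {
	char type[64];
	char name[64];
	unsigned int offset;
	unsigned int size;
};

/*
 * Parse one description line, e.g.
 *   "field:unsigned int len; offset:4; size:4;"
 * Whitespace in the scanf format also matches tabs. Returns 0 on success.
 */
int parse_field_line(const char *line, struct field_desc *fd)
{
	char decl[128];
	char *sp;

	assert(line != NULL && fd != NULL);
	if (sscanf(line, " field:%127[^;]; offset:%u; size:%u;",
		   decl, &fd->offset, &fd->size) != 3)
		return -1;

	/* The last space separates the type from the field name. */
	sp = strrchr(decl, ' ');
	if (!sp)
		return -1;
	*sp = '\0';
	snprintf(fd->type, sizeof(fd->type), "%s", decl);
	snprintf(fd->name, sizeof(fd->name), "%s", sp + 1);
	return 0;
}

/*
 * Fetch a 4-byte field from a raw record using only the parsed
 * description, so the tool never hardcodes the record layout.
 */
int read_u32_field(const void *rec, size_t rec_len,
		   const struct field_desc *fd, uint32_t *out)
{
	if (fd->size != sizeof(*out) ||
	    (size_t)fd->offset + fd->size > rec_len)
		return -1;
	memcpy(out, (const unsigned char *)rec + fd->offset, sizeof(*out));
	return 0;
}
```

A tool written this way keeps working if a field moves to another offset,
as long as it looks the field up by name in the description instead of
relying on a compiled-in struct.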

Also, a lot of things are being developed in userspace that every TRACE_EVENT
can benefit from, as Ingo has shown with perf. Steve's trace-cmd tool also
handles them.

The ftrace tracer plugins are still used for non-trivial cases where tracing
based on tracepoints is not sufficient. For example, the function and function
graph tracers, which require hot patching and a gcc feature plus a lot of
subtle background machinery, or the preemptoff/irqsoff/preemptirqsoff tracers,
which require a snapshot of the maximum latency trace, etc.

That's why the ftrace tracer plugins still exist: to cover the non-trivial
cases. But using them for tracing based on simple static tracepoints like yours
is purely a legacy approach.

Frederic.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html