linux-kernel - Re: [RFC PATCH 2/6] Driver Tracing Interface (DTI) code

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20070705161438.GA23668@Krystal>
Date:	Thu, 5 Jul 2007 12:14:38 -0400
From:	Mathieu Desnoyers <compudj@...stal.dyndns.org>
To:	Tom Zanussi <zanussi@...ibm.com>
Cc:	linux-kernel@...r.kernel.org, dwilder@...ibm.com,
	HOLZHEU@...ibm.com
Subject: Re: [RFC PATCH 2/6] Driver Tracing Interface (DTI) code

* Tom Zanussi (zanussi@...ibm.com) wrote:
> On Tue, 2007-07-03 at 23:13 -0400, Mathieu Desnoyers wrote:
> > > +void *__dti_reserve(struct dti_info *dti, int prio, size_t len)
> > > +{
> > > +	struct dti_event *event;
> > > +	int event_len;
> > > +
> > > +	if (!dti)
> > > +		return NULL;
> > > +
> > > +	if (prio > dti->level)
> > > +		return NULL;
> > > +
> > > +	event_len = len + sizeof(*event);
> > > +
> > > +	event = relay_reserve(dti->trace->rchan, event_len);
> > > +	if (!event)
> > > +		return NULL;
> > > +
> > > +	event->time = sched_clock();
> > 
> > sched_clock() is dangerous.. you have to make sure you provide correct
> > clock errors in the trace parsing.. not exactly something interesting
> > when you are looking at the "one" case over 10000 that was faulty. See
> > the i386 sched-clock.c:
> > 
> >  * Scheduler clock - returns current time in nanosec units.
> >  * All data is local to the CPU.
> >  * The values are approximately[1] monotonic local to a CPU, but not
> >  * between CPUs.   There might be also an occasionally random error,
> >  * but not too bad. Between CPUs the values can be non monotonic.
> >  *
> >  * [1] no attempt to stop CPU instruction reordering, which can hit
> >  * in a 100 instruction window or so.
> > 
> > How do you plan to use these timestamps to reorder events across CPUs ?
> > 
> 
> lttng_timestamp()? ;-)
> 

Yeah, this problem is not easy.. I have a set of per architecture
timestamping implementations that detects the limitations of the
hardware and overcomes them (32 bits TSC -> 64 bits extension that
supports read from NMI context, detect TSCs not in sync (AMDs and some
intels) that falls back to what is more or less a logical clock then,
printing warnings. I also keep a recipe cookbook about what people
should disable on these architectures to get precise timestamps.
> 
> > > +
> > > +static int vprintk_normal(struct dti_info *dti, int prio, const char* fmt,
> > > +			  va_list args)
> > > +{
> > > +	struct dti_event *event;
> > > +	void *buf;
> > > +	int len, event_len, rc = -1;
> > > +	unsigned long flags;
> > > +
> > > +	if (!dti)
> > > +		return -1;
> > > +
> > > +	if (prio > dti->level)
> > > +		return -1;
> > > +
> > > +	local_irq_save(flags);
> > > +	buf = dti_printf_tmpbuf[smp_processor_id()];
> > > +	len = vsnprintf(buf, DTI_PRINTF_TMPBUF_SIZE, fmt, args);
> > 
> > As I said, why not put the result of the vsnprintf directly in the
> > buffer after the relay_reserve ?
> > 
> 
> Because I don't know how much to reserve until I've done the
> vsnprintf().  I think that to have vsnprintf() print directly into the
> relay buffer, I'd need a relay_unreserve() to remove the unused portion.
> 

This is why I use a 2 pass approach in LTTng: the first one, given a
NULL buffer, calculates the reserve size, and then the second pass does
the copy.

> > > +
> > > +/**
> > > + *	dti_relog_initbuf - re-log static initbuf into real relay channel
> > > + *	@work: work struct that contains the the dti handle
> > > + *
> > > + *	The global initbuf may contain events from multiple cpus.  These
> > > + *	events must be put into their respective cpu buffers once the
> > > + *	per-cpu channel is available.
> > > + *
> > > + *	Internal - exported for setup macros.
> > > + */
> > > +int dti_relog_initbuf(struct dti_handle *handle)
> > > +{
> > 
> > What do you do with the data recorded by the system while you do this
> > buffer copy? Is the timestamp ordering kept? Why do you copy the static
> > buffers into other buffers at all anyway ? They could be exported
> > through relay (if it supported exporting statically mapped buffers).
> > 
> 
> You're right, we need to do something about that.  As mentioned before,
> we copy the events from the static buffer into the relay channel (the
> timestamp ordering is kept) after it becomes available because that
> seems like the simplest thing to do.  We could add support to relay to
> export another set of (static) buffers, or add the relogging code to
> relay if other tracers want to use it.
> 
> 

The problem with relogging is that it will be a huge timestamping mess.
I doubt that we can reliably continue tracing while we do this copy.
Therefore, we will end up losing events or having timestamps jumping
backward.

A solution that would export the static buffers separately to user-space
and (maybe) allows them to be (somehow?) de-allocated once they have
been read by user-space seems cleaner to me. The switch between the
static buffers and the normal relay buffers could then be done
atomically in a simple pointer overwrite.

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/