Date:	Wed, 24 Sep 2008 15:28:40 -0700 (PDT)
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Mathieu Desnoyers <compudj@...stal.dyndns.org>
cc:	Martin Bligh <mbligh@...gle.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>,
	Thomas Gleixner <tglx@...utronix.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	prasad@...ux.vnet.ibm.com, "Frank Ch. Eigler" <fche@...hat.com>,
	David Wilder <dwilder@...ibm.com>, hch@....de,
	Tom Zanussi <zanussi@...cast.net>,
	Steven Rostedt <srostedt@...hat.com>
Subject: Re: [RFC PATCH 1/3] Unified trace buffer



On Wed, 24 Sep 2008, Mathieu Desnoyers wrote:
> 
> The reason why Martin did use only a 27 bits TSC in ktrace was that they
> were statically limited to 32 event types.

Well, I actually think we could do the same - for the "internal" types.

So why not do something like 4-5 bits for the basic type information, and 
then one of those cases is a "freeform" thing, and the others are reserved 
for other uses.

So a trace entry header could easily look something like

	struct trace_entry {
		u32 tsc_delta:27,
		     type:5;
		u32 data;
		u64 array[];
	};

and then depending on that 5-bit type, the "data" field in the header 
means different things, and the size of the trace_entry is also different.
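
As a quick user-space sanity check of that layout (a sketch only: the u32/u64 typedefs stand in for the kernel's types, and a 32-bit bit-field base type is a compiler extension that GCC and Clang accept), the fixed header comes out as exactly 8 bytes with the payload starting right after it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;	/* stand-ins for the kernel's fixed-width types */
typedef uint64_t u64;

struct trace_entry {
	u32 tsc_delta:27,	/* low 27 bits of the TSC delta */
	    type:5;		/* internal entry type, 0..31 */
	u32 data;		/* meaning depends on type */
	u64 array[];		/* optional payload, length depends on type */
};

/* Two 32-bit words of header, then 8-byte aligned payload words. */
static_assert(sizeof(struct trace_entry) == 8, "header must be 8 bytes");
static_assert(offsetof(struct trace_entry, array) == 8, "payload follows header");
```

Which end of the first word the 5 type bits land in is up to the compiler and the architecture, so this layout only works as long as the reader is built the same way as the writer.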

So it could be something like

 - case 0: EndOfPage marker
	(data is ignored)
	size = 8

 - case 1: TSCExtend marker
	data = extended TSC (bits 28..59)
	size = 8

 - case 2: TimeStamp marker
	data = tv_nsec
	array[0] = tv_sec
	size = 16

 - case 3: LargeBinaryBlob marker
	data = 32-bit length of binary data
	array[0] = 64-bit pointer to binary blob
	array[1] = 64-bit pointer to "free" function
	size = 24

 - case 4: SmallBinaryBlob marker
	data = inline length in bytes, must be < 4096
	array[0..(len+7)/8] = inline data, padded
	size = (len+15) & ~7

 - case 5: AsciiFormat marker
	data = number of arguments
	array[0] = 64-bit pointer to static const format string
	array[1..arg] = argument values
	size = 8*(2+arg)

  ...
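
The point of the size rules above is that a reader can step through a page knowing nothing but the header. A minimal user-space sketch of that, assuming the six example cases exactly as listed (the enum names, trace_entry_size() and trace_count_entries() are made up for illustration, not taken from any patch):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

struct trace_entry {
	u32 tsc_delta:27,
	    type:5;
	u32 data;
	u64 array[];
};

/* Hypothetical names for the example type numbers listed above. */
enum trace_type {
	TRACE_END_OF_PAGE = 0,
	TRACE_TSC_EXTEND  = 1,
	TRACE_TIMESTAMP   = 2,
	TRACE_LARGE_BLOB  = 3,
	TRACE_SMALL_BLOB  = 4,
	TRACE_ASCII_FMT   = 5,
};

/* Total size in bytes of an entry, header included; 0 for unknown types. */
static size_t trace_entry_size(const struct trace_entry *e)
{
	switch (e->type) {
	case TRACE_END_OF_PAGE:
	case TRACE_TSC_EXTEND:
		return 8;
	case TRACE_TIMESTAMP:
		return 16;
	case TRACE_LARGE_BLOB:
		return 24;
	case TRACE_SMALL_BLOB:	/* data = inline length in bytes */
		return ((size_t)e->data + 15) & ~(size_t)7;
	case TRACE_ASCII_FMT:	/* data = number of arguments */
		return 8 * (2 + (size_t)e->data);
	default:
		return 0;
	}
}

/* Count entries in one 8-byte aligned page, stopping at EndOfPage,
 * at an unknown type, or at an entry that would run past the page. */
static size_t trace_count_entries(const void *page, size_t page_size)
{
	const unsigned char *p = page;
	size_t off = 0, n = 0;

	while (off + sizeof(struct trace_entry) <= page_size) {
		const struct trace_entry *e = (const void *)(p + off);
		size_t sz = trace_entry_size(e);

		if (e->type == TRACE_END_OF_PAGE || sz == 0 ||
		    off + sz > page_size)
			break;
		n++;
		off += sz;
	}
	return n;
}
```

Since the rest of a page would normally be zero-filled, type 0 as EndOfPage means the walker stops on unused space without any extra bookkeeping.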

i.e. we use a few bits for "trace _internal_ type fields", and then for a 
few of those types we have internal meanings, and other types just mean 
that the user can fill in the data itself.

IOW, you _could_ have an interface like

	ascii_marker_2(ringbuffer,
		"Reading sector %lu-%lu",
		sector, sector+nsec);

and what it would create would be a fairly small trace packet that looks 
something like

	.type = 5,
	.tsc_delta = ...,
	.data = 2,
	.array[0] = (const char *) "Reading sector %lu-%lu",
	.array[1] = xx,
	.array[2] = yy

and you would not actually print it out as ASCII until somebody read it 
from the kernel (and any "binary" interface would get the string as a 
string, not as a pointer, because the pointer is obviously meaningless 
outside the kernel).
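
To make the write-cheap, format-late split concrete, here is a rough user-space sketch of both halves. Everything beyond struct trace_entry and the ascii_marker_2() name is an assumption made up for the example: the linear "ringbuffer" with no wrapping or locking, the fake trace_now() clock, and trace_format_ascii(), which only handles two arguments that are both printed with %lu.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t u32;
typedef uint64_t u64;

struct trace_entry {
	u32 tsc_delta:27,
	    type:5;
	u32 data;
	u64 array[];
};

#define TRACE_ASCII_FMT 5	/* "case 5" from the list above */

/* Hypothetical buffer: one linear 4096-byte page, no wrapping, no locking. */
struct ringbuffer {
	u64 words[512];
	size_t used;		/* words consumed so far */
	u64 last_tsc;		/* timestamp of the previous entry */
};

/* Placeholder clock; a kernel version would read the TSC instead. */
static u64 trace_now(void)
{
	static u64 fake_tsc;

	return fake_tsc += 100;
}

/* Writer: stores the format pointer and two raw arguments, formats nothing. */
static int ascii_marker_2(struct ringbuffer *rb, const char *fmt,
			  u64 a0, u64 a1)
{
	const size_t words = 4;	/* header + format pointer + two arguments */
	struct trace_entry *e;
	u64 now;

	if (rb->used + words > sizeof(rb->words) / sizeof(rb->words[0]))
		return -1;	/* full; a real buffer would wrap or switch pages */

	now = trace_now();
	e = (struct trace_entry *)&rb->words[rb->used];
	e->type = TRACE_ASCII_FMT;
	/* Deltas that do not fit in 27 bits would need a TSCExtend entry. */
	e->tsc_delta = (now - rb->last_tsc) & 0x7ffffff;
	e->data = 2;
	e->array[0] = (u64)(uintptr_t)fmt;	/* fmt must be a static string */
	e->array[1] = a0;
	e->array[2] = a1;

	rb->last_tsc = now;
	rb->used += words;
	return 0;
}

/* Reader: expands the format only now. Assumes both arguments are %lu. */
static int trace_format_ascii(const struct trace_entry *e,
			      char *buf, size_t len)
{
	const char *fmt;

	if (e->type != TRACE_ASCII_FMT || e->data != 2)
		return -1;	/* this sketch only handles two arguments */

	fmt = (const char *)(uintptr_t)e->array[0];
	return snprintf(buf, len, fmt,
			(unsigned long)e->array[1],
			(unsigned long)e->array[2]);
}
```

The writer side is four word-sized stores plus the header; all the printf-style parsing cost lands on whoever reads the buffer, which is the cold path.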

Also note how you'd literally just have a single copy of the string, 
because the rule would be that a trace user must use a static string, not 
some generated one that can go away (module unloading would need to be 
aware of any trace buffer entries, of course - perhaps by just disallowing 
unloading while trace buffers are active).

And note! Everything above is meant as an example of something that 
_could_ work. I do like the notion of putting pointers to strings in the 
markers, rather than having some odd magic numerical meaning that user 
space has to just magically know that "event type 56 for ring buffer type 
171 means that there are two words that mean 'sector' and 'end-sector' 
respectively".

But it's still meant more as an RFC. But I think it could work.

		Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
