linux-kernel - Re: [patch 1/2] x86_64 page fault NMI-safe

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1281537273.3058.14.camel@gandalf.stny.rr.com>
Date:	Wed, 11 Aug 2010 10:34:33 -0400
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Ingo Molnar <mingo@...e.hu>,
	LKML <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Christoph Hellwig <hch@....de>, Li Zefan <lizf@...fujitsu.com>,
	Lai Jiangshan <laijs@...fujitsu.com>,
	Johannes Berg <johannes.berg@...el.com>,
	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
	Arnaldo Carvalho de Melo <acme@...radead.org>,
	Tom Zanussi <tzanussi@...il.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Andi Kleen <andi@...stfloor.org>,
	"H. Peter Anvin" <hpa@...or.com>,
	Jeremy Fitzhardinge <jeremy@...p.org>,
	"Frank Ch. Eigler" <fche@...hat.com>, Tejun Heo <htejun@...il.com>
Subject: Re: [patch 1/2] x86_64 page fault NMI-safe

Egad! Go on vacation and the world falls apart.

On Wed, 2010-08-04 at 08:27 +0200, Peter Zijlstra wrote:
> On Tue, 2010-08-03 at 11:56 -0700, Linus Torvalds wrote:
> > On Tue, Aug 3, 2010 at 10:18 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> > >
> > > FWIW I really utterly detest the whole concept of sub-buffers.
> > 
> > I'm not quite sure why. Is it something fundamental, or just an
> > implementation issue?
> 
> The sub-buffer thing that both ftrace and lttng have is creating a large
> buffer from a lot of small buffers, I simply don't see the point of
> doing that. It adds complexity and limitations for very little gain.

So, I want to allocate a 10Meg buffer. I need to make sure the kernel
has 10megs of memory available. If the memory is quite fragmented, then
too bad, I lose out.

Oh wait, I could also use vmalloc. But then again, now I'm blasting
valuable TLB entries for a tracing utility, thus making the tracer have
a even bigger impact on the entire system.

BAH!

I originally wanted to go with the continuous buffer, but I was
convinced after trying to implement it, that it was a bad choice.
Specifically, because of needing to 1) get large amounts of memory that
is continuous, or 2) eating up TLB entries and causing the system to
perform poorer.

I chose page size "sub-buffers" to solve the above. It also made
implementing splice trivial. OK, I admit, I never thought about mmapping
the buffers, just because I figured splice was faster. But I do have
patches that allow a user to mmap the entire ring buffer, but only in a
"producer/consumer" mode.

Note, I use page size sub-buffers, but the design could work with any
size sub-buffers. I just never implemented that (even though, when I
wrote the code it was secretly on my todo list).

> 
> Their benefit is known synchronization points into the stream, you can
> parse each sub-buffer independently, but you can always break up a
> continuous stream into smaller parts or use a transport that includes
> index points or whatever.
> 
> Their down side is that you can never have individual events larger than
> the sub-buffer, you need to be aware of the sub-buffer when reserving
> space etc..

The answer to that is to make a macro to do the assignment of the event,
and add a new API.

	event = ring_buffer_reserve_unlimited();

	ring_buffer_assign(event, data1);
	ring_buffer_assign(event, data2);

	ring_buffer_commit(event);

The ring_buffer_reserve_unlimited() could reserve a bunch of space
beyond one ring buffer. It could reserve data in fragments. Then the
ring_buffer_assgin() could either copy directly to the event (if the
event exists on one sub buffer) or do a copy the space was fragmented.

Of course, userspace would need to know how to read it. And it can get
complex due to interrupts coming in and also reserving between
fragments, or what happens if a partial fragment is overwritten. But all
these are not impossible to solve.

-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/