Date: Sun, 15 Aug 2010 09:35:13 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
    Linus Torvalds <torvalds@...ux-foundation.org>,
    Frederic Weisbecker <fweisbec@...il.com>,
    Ingo Molnar <mingo@...e.hu>,
    LKML <linux-kernel@...r.kernel.org>,
    Andrew Morton <akpm@...ux-foundation.org>,
    Thomas Gleixner <tglx@...utronix.de>,
    Christoph Hellwig <hch@....de>,
    Li Zefan <lizf@...fujitsu.com>,
    Lai Jiangshan <laijs@...fujitsu.com>,
    Johannes Berg <johannes.berg@...el.com>,
    Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
    Arnaldo Carvalho de Melo <acme@...radead.org>,
    Tom Zanussi <tzanussi@...il.com>,
    KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
    Andi Kleen <andi@...stfloor.org>,
    "H. Peter Anvin" <hpa@...or.com>,
    Jeremy Fitzhardinge <jeremy@...p.org>,
    "Frank Ch. Eigler" <fche@...hat.com>,
    Tejun Heo <htejun@...il.com>
Subject: Re: [patch 1/2] x86_64 page fault NMI-safe

* Steven Rostedt (rostedt@...dmis.org) wrote:
> Egad! Go on vacation and the world falls apart.
>
> On Wed, 2010-08-04 at 08:27 +0200, Peter Zijlstra wrote:
> > On Tue, 2010-08-03 at 11:56 -0700, Linus Torvalds wrote:
> > > On Tue, Aug 3, 2010 at 10:18 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> > > >
> > > > FWIW I really utterly detest the whole concept of sub-buffers.
> > >
> > > I'm not quite sure why. Is it something fundamental, or just an
> > > implementation issue?
> >
> > The sub-buffer thing that both ftrace and lttng have is creating a large
> > buffer from a lot of small buffers, I simply don't see the point of
> > doing that. It adds complexity and limitations for very little gain.
>
> So, I want to allocate a 10Meg buffer. I need to make sure the kernel
> has 10megs of memory available. If the memory is quite fragmented, then
> too bad, I lose out.
>
> Oh wait, I could also use vmalloc.
> But then again, now I'm blasting
> valuable TLB entries for a tracing utility, thus making the tracer have
> an even bigger impact on the entire system.
>
> BAH!
>
> I originally wanted to go with the continuous buffer, but I was
> convinced after trying to implement it, that it was a bad choice.
> Specifically, because of needing to 1) get large amounts of memory that
> is continuous, or 2) eating up TLB entries and causing the system to
> perform poorer.
>
> I chose page size "sub-buffers" to solve the above. It also made
> implementing splice trivial. OK, I admit, I never thought about mmapping
> the buffers, just because I figured splice was faster. But I do have
> patches that allow a user to mmap the entire ring buffer, but only in a
> "producer/consumer" mode.

FYI: the generic ring buffer also implements the mmap() interface for the
flight recorder mode.

>
> Note, I use page size sub-buffers, but the design could work with any
> size sub-buffers. I just never implemented that (even though, when I
> wrote the code it was secretly on my todo list).

The main difference between our designs is that Ftrace uses a linked list and
the generic ring buffer lib. uses a sub-buffer/page table. Considering the
use-case of reading available flight recorder pages in reverse order that I
heard about at LinuxCon (from both Peter and Masami, and it actually makes a
whole lot of sense, because the data we care about the most and want to read
ASAP is in the last subbuffers), I think the page table is more appropriate
(and flexible) than a single-direction linked list, because it allows picking
an arbitrary page (or subbuffer) in the buffer without walking over all pages.

>
> >
> > Their benefit is known synchronization points into the stream, you can
> > parse each sub-buffer independently, but you can always break up a
> > continuous stream into smaller parts or use a transport that includes
> > index points or whatever.
> >
> > Their down side is that you can never have individual events larger than
> > the sub-buffer, you need to be aware of the sub-buffer when reserving
> > space etc..
>
> The answer to that is to make a macro to do the assignment of the event,
> and add a new API.
>
>   event = ring_buffer_reserve_unlimited();
>
>   ring_buffer_assign(event, data1);
>   ring_buffer_assign(event, data2);
>
>   ring_buffer_commit(event);
>
> The ring_buffer_reserve_unlimited() could reserve a bunch of space
> beyond one ring buffer. It could reserve data in fragments. Then the
> ring_buffer_assign() could either copy directly to the event (if the
> event exists on one sub buffer) or do a copy if the space was fragmented.
>
> Of course, userspace would need to know how to read it. And it can get
> complex due to interrupts coming in and also reserving between
> fragments, or what happens if a partial fragment is overwritten. But all
> these are not impossible to solve.

Dealing with fragmentation, sub-buffer loss, etc. is then pushed up to the
trace analyzer. While I agree that we have to keep the burden of complexity
out of the kernel as much as possible, I also think that an elegant design at
the data producer level which keeps the trace reader/analyzer simple and
reliable should be favored if it keeps a similar level of complexity in the
kernel code.

A good argument supporting this is that some tracing users want to use a
mmap() scheme directly on the trace buffers to analyze the data directly in
user-space on the traced machine. In these cases, the complexity/overhead
added to the analyzer will impact the overall performance of the system being
traced.

Thanks,

Mathieu

>
> -- Steve
>
>

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
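
To make the table versus linked-list point from the message concrete, here is a minimal sketch in plain user-space C. It is not code from Ftrace or from the generic ring buffer library: struct subbuf, struct subbuf_table, NR_SUBBUF and the helper names are all hypothetical, chosen only for illustration. It assumes a ring of eight sub-buffers and compares how many pointer hops are needed to reach the sub-buffer written n steps before the newest one.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SUBBUF 8	/* hypothetical number of sub-buffers in the ring */

/* Hypothetical descriptor; not the layout used by either implementation. */
struct subbuf {
	int id;			/* stands in for the sub-buffer's data */
	struct subbuf *next;	/* used only by the linked-list variant */
};

/* Table variant: every sub-buffer is reachable by index. */
struct subbuf_table {
	struct subbuf *slot[NR_SUBBUF];
};

/* Link the sub-buffers into a one-way circular list and fill the table. */
static void ring_init(struct subbuf *bufs, struct subbuf_table *t)
{
	for (size_t i = 0; i < NR_SUBBUF; i++) {
		bufs[i].id = (int)i;
		bufs[i].next = &bufs[(i + 1) % NR_SUBBUF];
		t->slot[i] = &bufs[i];
	}
}

/*
 * Table lookup: the sub-buffer written n steps before index 'newest'
 * (n = 0 is the newest one) is a single modular index computation.
 */
static struct subbuf *table_nth_back(const struct subbuf_table *t,
				     size_t newest, size_t n)
{
	return t->slot[(newest + NR_SUBBUF - (n % NR_SUBBUF)) % NR_SUBBUF];
}

/*
 * One-way list lookup: there is no back pointer, so going back n steps
 * means walking forward around the ring. '*steps' reports the number of
 * links followed.
 */
static struct subbuf *list_nth_back(struct subbuf *newest, size_t n,
				    size_t *steps)
{
	size_t forward = (NR_SUBBUF - (n % NR_SUBBUF)) % NR_SUBBUF;
	struct subbuf *p = newest;

	*steps = 0;
	while (forward--) {
		p = p->next;
		(*steps)++;
	}
	return p;
}
```

With sub-buffer 2 as the newest, reading three sub-buffers back costs one array lookup with the table and five link traversals with the one-way list. A reader that consumes the whole flight recorder in reverse order pays that walk for each step back unless it caches pointers or the list is made doubly linked.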