linux-kernel - Re: [patch 1/2] x86_64 page fault NMI-safe

Date:	Mon, 9 Aug 2010 18:53:10 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Ingo Molnar <mingo@...e.hu>,
	LKML <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Steven Rostedt <rostedt@...tedt.homelinux.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Christoph Hellwig <hch@....de>, Li Zefan <lizf@...fujitsu.com>,
	Lai Jiangshan <laijs@...fujitsu.com>,
	Johannes Berg <johannes.berg@...el.com>,
	Arnaldo Carvalho de Melo <acme@...radead.org>,
	Tom Zanussi <tzanussi@...il.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Andi Kleen <andi@...stfloor.org>,
	"H. Peter Anvin" <hpa@...or.com>,
	Jeremy Fitzhardinge <jeremy@...p.org>,
	"Frank Ch. Eigler" <fche@...hat.com>, Tejun Heo <htejun@...il.com>,
	2nddept-manager@....hitachi.co.jp
Subject: Re: [patch 1/2] x86_64 page fault NMI-safe

On Fri, Aug 06, 2010 at 11:50:40AM +0200, Peter Zijlstra wrote:
> On Fri, 2010-08-06 at 15:18 +0900, Masami Hiramatsu wrote:
> > Peter Zijlstra wrote:
> > > On Wed, 2010-08-04 at 10:45 -0400, Mathieu Desnoyers wrote:
> > > 
> > >> How do you plan to read the data concurrently with the writer overwriting the
> > >> data while you are reading it without corruption ?
> > > 
> > > I don't consider reading while writing (in overwrite mode) a valid case.
> > > 
> > > If you want to use overwrite, stop the writer before reading it.
> > 
> > For example, would you like to read system audit log always after
> > stop the audit?
> > 
> > NO, that's a most important requirement for tracers, especially for
> > system admins (they're the most important users of Linux) to check
> > the system health and catch system troubles.
> > 
> > For performance measurement and checking hotspot, one-shot tracing
> > is enough. But it's just for developers. But for the real world
> > computing, Linux is just an OS, users want to run their system,
> > middleware and applications, without troubles. But when they hit
> > a trouble, they wanna shoot it ASAP.
> > The flight recorder mode is mainly for those users.
> 
> You cannot over-write and consistently read the buffer, that's plain
> impossible. With sub-buffers you can swivel a sub-buffer and
> consistently read that, but there is no guarantee the next sub-buffer
> you steal was indeed adjacent to the previous buffer you stole as that
> might have gotten over-written by the active writer while you were
> stealing the previous one.
> 
> If you want to snapshot buffers, do that, simply swivel the whole trace
> buffer, and continue tracing in a new one, then consume the old trace in
> a consistent manner.
> 
> I really see no value in being able to read unrelated bits and pieces of
> a buffer.

It all depends on the frequency of your events and on the amount of memory
used for the buffer.

If you are tracing syscalls on a semi-idle box with a ring buffer of 500 MB
per cpu, you really don't care about the writer catching up with the reader:
it will simply not happen.

OTOH if you are tracing function graphs, no buffer size will ever be enough:
the writer will always be faster and catch up with the reader.
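
A back-of-the-envelope sketch of that arithmetic, in Python. Only the 500 MB
per-cpu buffer comes from the example above; the event sizes, event rates and
reader throughput are made-up assumptions for illustration, not measurements,
and seconds_until_overwrite() is a hypothetical helper.

```python
def seconds_until_overwrite(buffer_bytes, event_bytes, events_per_sec,
                            reader_bytes_per_sec=0.0):
    """Time before the writer laps the reader in an overwriting ring buffer.

    Returns float('inf') when the reader drains at least as fast as the
    writer produces, i.e. the writer never catches up.
    """
    produced = event_bytes * events_per_sec      # bytes/s written
    net_fill = produced - reader_bytes_per_sec   # bytes/s the writer gains
    if net_fill <= 0:
        return float("inf")
    return buffer_bytes / net_fill


BUF = 500 * 1024 * 1024  # 500 MB per cpu, as in the example above

# Assumed semi-idle box: 1,000 syscalls/s, 64-byte events, no reader at all.
syscalls = seconds_until_overwrite(BUF, 64, 1_000)

# Assumed function graph load: 5,000,000 events/s, 32-byte events,
# with a reader that manages 50 MB/s.
fgraph = seconds_until_overwrite(BUF, 32, 5_000_000, 50 * 1024 * 1024)

print(f"syscalls: {syscalls / 3600:.1f} hours before anything is overwritten")
print(f"function graph: {fgraph:.1f} seconds before the writer laps the reader")
```

Under these assumptions the syscall case takes hours to wrap even with
nobody reading, while the function graph case wraps within seconds despite
an active reader.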

Using the sub-buffer scheme though, and allowing a concurrent writer and
reader in overwrite mode, we can easily tell the user that the writer is
being faster and that content has been lost. Based on this information, the
user can choose what to do: try again with a larger buffer, for example.
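
The idea can be modelled in a few lines of Python. This is a toy sketch, not
the ftrace ring buffer code: SubBufferRing, write() and read_page() are
hypothetical names, there is no real concurrency, and "events" are plain
Python objects. It only shows that counting what gets overwritten is enough
to report the loss to the reader.

```python
from collections import deque


class SubBufferRing:
    """Toy overwrite-mode ring made of fixed-size sub-buffers (pages)."""

    def __init__(self, nr_pages, events_per_page):
        self.nr_pages = nr_pages
        self.events_per_page = events_per_page
        self.pages = deque([[]])  # oldest page first; the last is the writer's
        self.lost = 0             # events overwritten since the last read

    def write(self, event):
        if len(self.pages[-1]) == self.events_per_page:
            # The writer's page is full: move on to a fresh page.
            if len(self.pages) == self.nr_pages:
                # Ring is full: overwrite the oldest page and account for it.
                self.lost += len(self.pages.popleft())
            self.pages.append([])
        self.pages[-1].append(event)

    def read_page(self):
        """Detach the oldest complete page; return (events, lost_before_it).

        Returns (None, 0) when only the page being written is left,
        leaving the loss counter untouched.
        """
        if len(self.pages) < 2:
            return None, 0
        lost, self.lost = self.lost, 0
        return self.pages.popleft(), lost


ring = SubBufferRing(nr_pages=4, events_per_page=3)
for ev in range(20):  # the writer runs far ahead of the reader
    ring.write(ev)

events, lost = ring.read_page()
print(f"read {events}, {lost} events lost before this page")
```

Each page the reader detaches is internally consistent, and the loss count
tells the user how much was dropped before it, which is the information
needed to decide whether the buffer is too small.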

See? It's not our role to forbid this just because the result might be
unreliable when the user picks silly settings (not enough memory, reader too
slow for random reasons, events firing at too high a frequency, and so on).
Let the user deal with that and just inform them about unreliable results.
This is what ftrace does currently.

Also the snapshot thing doesn't look like a replacement. If you are
tracing on a low-memory embedded system, keeping the snapshot alive
consumes a lot of memory, which means the live buffer may have to be shrunk
critically, and you might in turn lose traces there.
That said, it's an interesting feature that may fit other kinds of
environments or other needs.

Off-topic: It's sad that, when it comes to tracing, we often have to guess
the needs of the embedded world, or learn about them from indirect sources.
In the end we rarely hear from them directly. Except maybe at confs...
