linux-kernel - Re: [RFC/Requirements/Design] h/w error reporting


Message-ID: <20101110101450.GA18481@elte.hu>
Date:	Wed, 10 Nov 2010 11:14:50 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	"Luck, Tony" <tony.luck@...el.com>
Cc:	linux-kernel@...r.kernel.org, ying.huang@...el.com, bp@...en8.de,
	tglx@...utronix.de, akpm@...ux-foundation.org, mchehab@...hat.com,
	Frédéric Weisbecker <fweisbec@...il.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Arjan van de Ven <arjan@...radead.org>
Subject: Re: [RFC/Requirements/Design] h/w error reporting

* Luck, Tony <tony.luck@...el.com> wrote:

> Taking a cue from the tracing session from the previous day (where the "perf" vs. 
> "ftrace" vs. "lttng" war was ended by proposing a new tracing methodology that 
> would overcome the shortcomings of both of the merged subsystems while also 
> addressing the requirements of the lttng users) [...]

Well, the direction is that we are unifying ftrace and perf events and we are 
actively phasing out individual ftrace plugins as matching events become available 
(we already removed a few).

Most new tools use the perf syscall and tool writers have expressed the very 
understandable desire that all events (and their reporting facility) be enumerated 
and accessible via a unified API/ABI.

While it often seems easier for subsystems to just do their own ad-hoc 
logging/reporting in the short run (every subsystem tends to think it has its own 
very specific requirements for logging), users and tool authors can only shake 
their heads in disbelief when looking at the myriad incompatible and inconsistent 
facilities. The tooling requirement for unification is strong here and cannot be 
ignored.

> [...] we explored whether the solution would be to define a new "system health" 
> subsystem that could be used by any part of the kernel to report hardware issues 
> in a coherent way so that end users would have a single place to look for all 
> error information.

Note that Boris has been working on extending perf events into this area as well, 
see this recent submission of patches on lkml:

  [PATCH 20/20] ras: Add RAS daemon

One thing is clear: any 'health subsystem' should not do its own flavor of error 
reporting - instead we want to unify various forms of event logging into a common 
facility.

RAS/EDAC could do its own hardware-specific settings via a separate subsystem, 
although even many of those can be expressed via their respective events (and we 
are open on the perf events side to providing callbacks/facilities for such use).

The synergies of unified event reporting are very strong.

Thanks,

	Ingo