Message-ID: <20101026101536.GC16552@elte.hu>
Date:	Tue, 26 Oct 2010 12:15:36 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Huang Ying <ying.huang@...el.com>
Cc:	Thomas Gleixner <tglx@...utronix.de>, Len Brown <lenb@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Andi Kleen <andi@...stfloor.org>,
	"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
	Borislav Petkov <petkovbb@...glemail.com>,
	"H. Peter Anvin" <hpa@...or.com>, Don Zickus <dzickus@...hat.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Mauro Carvalho Chehab <mchehab@...hat.com>,
	"Luck, Tony" <tony.luck@...el.com>
Subject: Re: [NAK] Re: [PATCH -v2 9/9] ACPI, APEI, Generic Hardware Error
 Source POLL/IRQ/NMI notification type support


* Huang Ying <ying.huang@...el.com> wrote:

> Hi, Thomas,
> 
> On Tue, 2010-10-26 at 12:53 +0800, Thomas Gleixner wrote:
> > Len,
> > 
> > On Mon, 25 Oct 2010, Len Brown wrote:
> > 
> > > >  NAKed-by: Ingo Molnar <mingo@...e.hu>
> > > 
> > > Everybody knows that Linux has a lot to learn about RAS.
> > > 
> > > I think to catch up, we need to play to Linux's strengths
> > > of continuous improvement.  If we halt patches in this area
> > > then we could wait forever for the "perfect design".
> > 
> > it's not about perfect design. It's about creating new user space
> > ABIs. The patches introduce another error reporting user space ABI
> > with an ad hoc "fits the needs" design.
> > 
> > This is my major point of objection. 
> > 
> > I agree that Linux needs improvement on the RAS side, but does this
> > lack of features justify a new user space ABI which is totally
> > disconnected to existing RAS facilities ?
> > 
> > No, it does not. It's not our problem that Intel wasted time on
> > creating another character device driver to report errors to user
> > space. The time spent to do so would have been sufficient to do a
> > proper integration into the existing infrastructure.
> > 
> > I would not care at all if these patches would just introduce some
> > weird in kernel interfaces as we can clean that up at will. But
> > introducing a new user space ABI is setting the disconnect of RAS
> > related facilities into stone.
> > 
> > From Kconfig:
> > 
> >   EDAC is designed to report errors in the core system.
> >   These are low-level errors that are reported in the CPU or
> >   supporting chipset or other subsystems:
> >   memory errors, cache errors, PCI errors, thermal throttling, etc..
> >   If unsure, select 'Y'.
> > 
> > So please explain why your error reporting is so different from the
> > above that it justifies a separate facility. And you better come up
> > with a real good explanation other than we looked at EDAC and it did
> > not fit our needs.
> 
> As far as I know, EDAC guys plan to use some other "perfect interface" in the 
> future. So I think the current state is really waiting for the "perfect design".

Not sure what you mean by this, but Boris has posted links to his latest patch-set 
in this thread, see:

  http://kerneltrap.org/mailarchive/linux-kernel/2010/8/6/4603847

The Git coordinates are:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace.git, branch tip/perf/parse-events

The 'persistent events' facility he has prototyped there appears to be a good 
potential match for the ERST store.

It would be very useful to have another feature there: to mark persistent events as 
'dump into syslog on bootup', so that for example the contents of the ERST log could 
be dumped right on bootup. [but ERST would not be the only persistent event that 
could be marked like that.]

Note that we don't need/want other ABI accesses to the ERST log (i.e. we don't want 
/dev/erst-dbg), because we want the benefits of the generalization: tooling (RAS and 
other tooling) should learn how to deal with persistent events - not learn how to 
deal with ERST logs ... or with warm bootup RAM-embedded logs ... or to deal with 
kcrash embedded kernel logs ... etc.

There are many obvious advantages to implementing it like that: there's no need to 
special-case ERST-to-printk or ERST-to-whatever-other-facility cross links - it 
would be part of a generic/uniform event logging facility to begin with. ERST would 
only implement its own, narrow, hardware-specific event accessor methods - nothing 
else. Basically a small 'event driver'. This would be the smallest, easiest to 
maintain approach - with no facility duplication and no fragmentation.
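
To make the 'event driver' split concrete, here is a rough user-space sketch in C. 
It is only an illustration under stated assumptions: every name in it (struct 
pevent_source, PEVENT_DUMP_ON_BOOT, pevent_boot_dump(), the mock ERST store) is 
hypothetical and does not come from the posted patches or from any existing kernel 
API. The generic code owns the 'dump on bootup' policy; the source only supplies a 
narrow accessor callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical generic record handed to tooling; not a real kernel structure. */
struct pevent_record {
	char source[16];
	char msg[64];
};

/* Hypothetical flag: replay this source's records into the log at bootup. */
#define PEVENT_DUMP_ON_BOOT 0x1u

/* What a persistent event source ("event driver") would have to provide. */
struct pevent_source {
	const char *name;
	unsigned int flags;
	/* Returns 1 and fills *rec if a record was read, 0 when the store is empty. */
	int (*read_next)(void *priv, struct pevent_record *rec);
	void *priv;
};

/*
 * Generic side: walk all registered sources and dump the flagged ones.
 * Nothing here knows about ERST. Returns the number of records written.
 */
static int pevent_boot_dump(struct pevent_source *srcs, size_t n, FILE *log)
{
	int dumped = 0;

	for (size_t i = 0; i < n; i++) {
		struct pevent_record rec;

		assert(srcs[i].read_next != NULL);
		if (!(srcs[i].flags & PEVENT_DUMP_ON_BOOT))
			continue;
		while (srcs[i].read_next(srcs[i].priv, &rec) == 1) {
			fprintf(log, "[%s] %s\n", rec.source, rec.msg);
			dumped++;
		}
	}
	return dumped;
}

/* Source side: a mock stand-in for an ERST-like store, backed by an array. */
struct mock_erst {
	const char *const *entries;
	size_t count;
	size_t pos;
};

static int mock_erst_read_next(void *priv, struct pevent_record *rec)
{
	struct mock_erst *e = priv;

	if (e->pos >= e->count)
		return 0;
	snprintf(rec->source, sizeof(rec->source), "%s", "erst");
	snprintf(rec->msg, sizeof(rec->msg), "%s", e->entries[e->pos++]);
	return 1;
}
```

A caller would fill a struct mock_erst with a few strings, register it in an array 
of struct pevent_source with PEVENT_DUMP_ON_BOOT set, and call pevent_boot_dump() 
once. A warm-boot RAM log or a kcrash log would plug in the same way, by supplying 
only its own read_next() callback.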

It's certainly more work as well _for the first such example_ - but from that point 
on any new hardware facility can be added with ease, and those too will fit into 
existing tooling in a very natural way.

So please help out with the persistent events work. If you need any pointers we'd be 
glad to help.

Thanks,

	Ingo