Message-ID: <CA+55aFwXTME_Ge53xKjyM35mVQ+gmvDPwTLa5kR5zAMnNTFSbQ@mail.gmail.com>
Date: Thu, 22 May 2014 08:30:33 +0900
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: Andy Lutomirski <luto@...capital.net>,
Borislav Petkov <bp@...en8.de>, Jiri Kosina <jkosina@...e.cz>,
Thomas Gleixner <tglx@...utronix.de>,
Steven Rostedt <rostedt@...dmis.org>,
Andi Kleen <andi@...stfloor.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...nel.org>
Subject: Re: [RFC] x86_64: A real proposal for iret-less return to kernel

On Thu, May 22, 2014 at 8:19 AM, Luck, Tony <tony.luck@...el.com> wrote:
>
> Yes. Bystander broadcast machine checks can and will hit processors
> that are in NMI context ... and we must not make that fatal.

.. and this, btw, is just another example of why MCE hardware
designers are f*cking morons that should be given extensive education
about birth control and how not to procreate.

MCE is frankly misdesigned. It's a piece of shit, and any of the
hardware designers that claim that what they do is for system
stability are out to lunch. This is a prime example of what *NOT* to
do, and how you can actually spread what was potentially a localized
and recoverable error, and make it global and unrecoverable.

Can we please get these designers either fired, or re-educated?
Because this shit has been going on too long. I complained about this
to Tony many years ago, and nothing was ever fixed.

Synchronous MCEs are fine for synchronous errors, but then trying to
turn them "synchronous" for other CPUs (where they *weren't*
synchronous errors) is a major mistake. External errors punching
through irq context is wrong, punching through NMI is just
inexcusable.

If the OS then decides to take down the whole machine, the OS - not
the hardware - can choose to do something that will punch through
other CPUs' NMI blocking (notably, init/reset), but the hardware doing
this on its own is just broken if true.

Anyway, I repeat: I refuse to fix hardware bugs. As far as we are
concerned, this is "best effort", and the hardware designers should
take a long deep look at their idiotic schemes. If something punches
through NMI, it's deadly. It's that simple.

Linus