Message-ID: <20180522183336.GA4177@agluck-desk>
Date: Tue, 22 May 2018 11:33:36 -0700
From: "Luck, Tony" <tony.luck@...el.com>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: Borislav Petkov <bp@...en8.de>, "Alex G." <mr.nuke.me@...il.com>,
alex_gagniuc@...lteam.com, austin_bolen@...l.com,
shyam_iyer@...l.com, "Rafael J. Wysocki" <rjw@...ysocki.net>,
Len Brown <lenb@...nel.org>,
Tyler Baicar <tbaicar@...eaurora.org>,
Will Deacon <will.deacon@....com>,
James Morse <james.morse@....com>,
Shiju Jose <shiju.jose@...wei.com>,
"Jonathan (Zhixiong) Zhang" <zjzhang@...eaurora.org>,
Dongjiu Geng <gengdongjiu@...wei.com>,
ACPI Devel Maling List <linux-acpi@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v6 1/2] acpi: apei: Rename ghes_severity() to
ghes_cper_severity()

On Tue, May 22, 2018 at 08:10:47PM +0200, Rafael J. Wysocki wrote:
> > PCIe fatal means that the link or the device is broken.
>
> And that may really mean that the component in question is on fire.
> We just don't know.

Components on fire could be the root cause of many errors. If we really
believe that is a problem we should power the system off rather than
just calling panic() [not just for PCIe errors, but also for machine
checks, and perhaps a bunch of other places in the kernel].

True story: I used to work for Stratus Computer on fault tolerant
systems. A customer once called in with a "my computer is on fire"
report and asked what to do. The support person told them to power it
off. Customer asked "Isn't there something else? It's still running
just fine".

-Tony