Message-ID: <20180419154006.GE3600@pd.tnic>
Date:   Thu, 19 Apr 2018 17:40:06 +0200
From:   Borislav Petkov <bp@...en8.de>
To:     "Alex G." <mr.nuke.me@...il.com>
Cc:     linux-acpi@...r.kernel.org, linux-edac@...r.kernel.org,
        rjw@...ysocki.net, lenb@...nel.org, tony.luck@...el.com,
        tbaicar@...eaurora.org, will.deacon@....com, james.morse@....com,
        shiju.jose@...wei.com, zjzhang@...eaurora.org,
        gengdongjiu@...wei.com, linux-kernel@...r.kernel.org,
        alex_gagniuc@...lteam.com, austin_bolen@...l.com,
        shyam_iyer@...l.com, devel@...ica.org, mchehab@...nel.org,
        robert.moore@...el.com, erik.schmauss@...el.com
Subject: Re: [RFC PATCH v2 3/4] acpi: apei: Do not panic() when correctable
 errors are marked as fatal.

On Thu, Apr 19, 2018 at 09:57:07AM -0500, Alex G. wrote:
> ghes_severity() is a one-to-one mapping from a set of unsorted
> severities to monotonically increasing numbers. The "one-to-one" mapping
> part of the sentence is obvious from the function name. To change it to
> parse the entire GHES would completely destroy this, and I think it
> would apply policy in the wrong place.

So do a wrapper or whatever. Do a ghes_compute_severity() or however you
would wanna call it and do the iteration there.
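
A minimal sketch of what such a wrapper could look like, assuming the
overall severity is the worst mapped severity across all sections and that
the header severity is only a fallback when there are no sections. All names
below (raw_sev, ord_sev, map_severity, compute_severity) are hypothetical
stand-ins for illustration, not the actual GHES/CPER definitions, and the
"take the maximum" rule is an assumption rather than the policy settled on
in this thread:

```c
#include <stddef.h>

/* Hypothetical stand-in for the unsorted severities reported by firmware. */
enum raw_sev { RAW_RECOVERABLE, RAW_FATAL, RAW_CORRECTED, RAW_INFO };

/* Hypothetical monotonically increasing severities used for comparison. */
enum ord_sev { ORD_NO, ORD_CORRECTED, ORD_RECOVERABLE, ORD_PANIC };

/* One-to-one mapping only; no iteration and no policy lives here. */
static enum ord_sev map_severity(enum raw_sev s)
{
	switch (s) {
	case RAW_INFO:
		return ORD_NO;
	case RAW_CORRECTED:
		return ORD_CORRECTED;
	case RAW_RECOVERABLE:
		return ORD_RECOVERABLE;
	case RAW_FATAL:
		return ORD_PANIC;
	}
	return ORD_PANIC;	/* unknown value: be conservative */
}

/*
 * Wrapper that does the iteration: walk every section and keep the worst
 * mapped severity. Assumption: with no sections, fall back to the header.
 */
static enum ord_sev compute_severity(const enum raw_sev *sections, size_t n,
				     enum raw_sev header)
{
	enum ord_sev worst = ORD_NO;
	size_t i;

	if (n == 0)
		return map_severity(header);

	for (i = 0; i < n; i++) {
		enum ord_sev s = map_severity(sections[i]);

		if (s > worst)
			worst = s;
	}
	return worst;
}
```

With this shape the mapping function stays one-to-one, and the decision about
the whole record is made in one place before any handler runs.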

> Should I do that, I might have to call it something like
> ghes_parse_and_apply_policy_to_severity(). But that misses the whole
> point of these changes.

What policy? You simply compute the severity like we do in the mce code.

> I would like to get to the handlers first, and then decide if things are
> okay or not,

Why? Give me an example why you'd handle an error first and then decide
whether we're ok or not?

Usually, the error handler decides that in one place. So what exactly
are you trying to do differently that doesn't fit that flow?

> I don't want to leave people scratching their heads, but I also don't
> want to make AER a special case without having a generic way to handle
> these cases. People are just as susceptible to scratch their heads
> wondering why AER is a special case and everything else crashes.

Not if it is properly done *and* documented why we're applying the
respective policy for the error type.

> Maybe it's better to move the AER handling to NMI/IRQ context, since
> ghes_handle_aer() is only scheduling the real AER handler, and is irq
> safe. I'm scratching my head about why we're messing with IRQ work from
> NMI context, instead of just scheduling a regular handler to take care
> of things.

No, first pls explain what exactly you're trying to do and then we can
talk about how to do it. Btw, a real-life example to accompany that
intention goes a long way.

Thx.

-- 
Regards/Gruss,
    Boris.
