lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20200111131744.GC23583@zn.tnic>
Date:   Sat, 11 Jan 2020 14:17:44 +0100
From:   Borislav Petkov <bp@...en8.de>
To:     "Luck, Tony" <tony.luck@...el.com>
Cc:     Jan H. Schönherr <jschoenh@...zon.de>,
        Yazen Ghannam <yazen.ghannam@....com>,
        linux-kernel@...r.kernel.org, linux-edac@...r.kernel.org,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>, x86@...nel.org
Subject: Re: [PATCH v2 1/6] x86/mce: Take action on UCNA/Deferred errors again

On Fri, Jan 10, 2020 at 10:45:33AM -0800, Luck, Tony wrote:
> I totally agree that counting notifiers is clumsy. Also less than
> ideal is the concept that any notifier on the chain can declare:
>   "I fixed it"
> and prevent any other notifiers from even seeing it. Well the concept
> is good, but it is overused.

But why can't we use it?

Don't get me wrong: I'm simply following my KISS approach to do the
simplest scheme required. So, do you see a use case where the whole
error handling chain would need more sophisticated handling?

> I think we may do better with a field in the "struct mce" that is being
> passed to each where notifiers can wiggle some bits (semantics to be
> defined later) which can tell subsequent notifiers what sort of actions
> have been taken.
> E.g. the SRAO/UCNA notifier can say "I took this page offline"
> the dev_mcelog one can say "I think I handed to a process that has /dev/mcelog open"
> EDAC drivers can say "I decoded the address and printed something"
> CEC can say: "I silently counted this corrected error", or "error exceeded
> threshold and I took the page offline".
> 
> The default notifier can print to console if nobody set a bit to say
> that the error had been somehow logged.

That idea is good and I'll gladly take patches for it so if you wanna do
it...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ