linux-kernel - Re: [PATCH 1/1] RAS: Add CPU Correctable Error Collector to isolate an erroneous CPU core

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200901143539.GC8392@zn.tnic>
Date:   Tue, 1 Sep 2020 16:35:39 +0200
From:   Borislav Petkov <bp@...en8.de>
To:     Shiju Jose <shiju.jose@...wei.com>
Cc:     linux-edac@...r.kernel.org, linux-acpi@...r.kernel.org,
        linux-kernel@...r.kernel.org, tony.luck@...el.com,
        rjw@...ysocki.net, james.morse@....com, lenb@...nel.org,
        linuxarm@...wei.com
Subject: Re: [PATCH 1/1] RAS: Add CPU Correctable Error Collector to isolate
 an erroneous CPU core

On Tue, Sep 01, 2020 at 03:01:40PM +0100, Shiju Jose wrote:
> When the CPU correctable errors reported on an ARM64 CPU core too often,
> it should be isolated. Add the CPU correctable error collector to
> store the CPU correctable error count.
> 
> When the correctable error count for a CPU exceed the threshold
> value in a short time period, it will try to isolate the CPU core.
> The threshold value, time period etc are configurable.
> 
> Implementation details is added in the file.
> 
> Signed-off-by: Shiju Jose <shiju.jose@...wei.com>
> ---
>  Documentation/ABI/testing/debugfs-cpu-cec |  22 ++
>  arch/arm64/ras/Kconfig                    |   8 +
>  drivers/acpi/apei/ghes.c                  |  30 +-
>  drivers/ras/Kconfig                       |   1 +
>  drivers/ras/Makefile                      |   1 +
>  drivers/ras/cpu_cec.c                     | 393 ++++++++++++++++++++++

So instead of adding the ability to collect other error types to the
CEC, you're duplicating the CEC itself?!

Why?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette