Message-ID: <20200901143539.GC8392@zn.tnic>
Date: Tue, 1 Sep 2020 16:35:39 +0200
From: Borislav Petkov <bp@...en8.de>
To: Shiju Jose <shiju.jose@...wei.com>
Cc: linux-edac@...r.kernel.org, linux-acpi@...r.kernel.org,
linux-kernel@...r.kernel.org, tony.luck@...el.com,
rjw@...ysocki.net, james.morse@....com, lenb@...nel.org,
linuxarm@...wei.com
Subject: Re: [PATCH 1/1] RAS: Add CPU Correctable Error Collector to isolate
an erroneous CPU core
On Tue, Sep 01, 2020 at 03:01:40PM +0100, Shiju Jose wrote:
> When CPU correctable errors are reported on an ARM64 CPU core too often,
> the core should be isolated. Add the CPU correctable error collector to
> store the CPU correctable error count.
>
> When the correctable error count for a CPU exceeds the threshold
> value in a short time period, it will try to isolate the CPU core.
> The threshold value, time period, etc. are configurable.
>
> Implementation details are added in the file.
>
> Signed-off-by: Shiju Jose <shiju.jose@...wei.com>
> ---
> Documentation/ABI/testing/debugfs-cpu-cec | 22 ++
> arch/arm64/ras/Kconfig | 8 +
> drivers/acpi/apei/ghes.c | 30 +-
> drivers/ras/Kconfig | 1 +
> drivers/ras/Makefile | 1 +
> drivers/ras/cpu_cec.c | 393 ++++++++++++++++++++++
So instead of adding the ability to collect other error types to the
CEC, you're duplicating the CEC itself?!
Why?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette