Date:   Wed, 2 Nov 2022 15:07:20 +0800
From:   Shuai Xue <xueshuai@...ux.alibaba.com>
To:     "Rafael J. Wysocki" <rafael@...nel.org>
Cc:     lenb@...nel.org, james.morse@....com, tony.luck@...el.com,
        bp@...en8.de, dave.hansen@...ux.intel.com, jarkko@...nel.org,
        naoya.horiguchi@....com, linmiaohe@...wei.com,
        akpm@...ux-foundation.org, stable@...r.kernel.org,
        linux-acpi@...r.kernel.org, linux-kernel@...r.kernel.org,
        cuibixuan@...ux.alibaba.com, baolin.wang@...ux.alibaba.com,
        zhuo.song@...ux.alibaba.com
Subject: Re: [PATCH] ACPI: APEI: set memory failure flags as
 MF_ACTION_REQUIRED on action required events



On 2022/10/29 AM1:08, Rafael J. Wysocki wrote:
> On Thu, Oct 27, 2022 at 6:25 AM Shuai Xue <xueshuai@...ux.alibaba.com> wrote:
>>
>> There are two major types of uncorrected error (UC):
>>
>> - Action Required: The error is detected and the processor has already
>>   consumed the memory. The OS must take action (for example, offlining the
>>   failed page or killing the affected thread) to recover from this
>>   uncorrectable error.
>>
>> - Action Optional: The error is detected outside the processor execution
>>   context. Some data in memory is corrupted, but it has not been consumed.
>>   The OS may optionally take action to recover from this uncorrectable
>>   error.
>>
>> On x86 platforms, we can easily distinguish between these two types
>> based on the MCA bank. On arm64 platforms, however, the memory failure
>> flags for all UCs whose severity is GHES_SEV_RECOVERABLE are currently
>> set to 0, i.e., Action Optional.
>>
>> If a UC is detected by a background scrubber, it is obviously an Action
>> Optional error. All other errors should conservatively be regarded as
>> Action Required.
>>
>> cper_sec_mem_err::error_type identifies the type of error that occurred
>> if CPER_MEM_VALID_ERROR_TYPE is set. So, set the memory failure flags to
>> 0 for a Scrub Uncorrected Error (type 14), and to MF_ACTION_REQUIRED
>> otherwise.
>>
>> Signed-off-by: Shuai Xue <xueshuai@...ux.alibaba.com>
> 
> I need input from the APEI reviewers on this.
> 
> Thanks!

Hi, Rafael,

Sorry, I missed this email. Thank you for your quick reply. Let's discuss it
with the reviewers.

Thank you.

Cheers,
Shuai


> 
>> ---
>>  drivers/acpi/apei/ghes.c | 10 ++++++++--
>>  include/linux/cper.h     |  3 +++
>>  2 files changed, 11 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index 80ad530583c9..6c03059cbfc6 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -474,8 +474,14 @@ static bool ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata,
>>         if (sec_sev == GHES_SEV_CORRECTED &&
>>             (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED))
>>                 flags = MF_SOFT_OFFLINE;
>> -       if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
>> -               flags = 0;
>> +       if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE) {
>> +               if (mem_err->validation_bits & CPER_MEM_VALID_ERROR_TYPE)
>> +                       flags = mem_err->error_type == CPER_MEM_SCRUB_UC ?
>> +                                       0 :
>> +                                       MF_ACTION_REQUIRED;
>> +               else
>> +                       flags = MF_ACTION_REQUIRED;
>> +       }
>>
>>         if (flags != -1)
>>                 return ghes_do_memory_failure(mem_err->physical_addr, flags);
>> diff --git a/include/linux/cper.h b/include/linux/cper.h
>> index eacb7dd7b3af..b77ab7636614 100644
>> --- a/include/linux/cper.h
>> +++ b/include/linux/cper.h
>> @@ -235,6 +235,9 @@ enum {
>>  #define CPER_MEM_VALID_BANK_ADDRESS            0x100000
>>  #define CPER_MEM_VALID_CHIP_ID                 0x200000
>>
>> +#define CPER_MEM_SCRUB_CE                      13
>> +#define CPER_MEM_SCRUB_UC                      14
>> +
>>  #define CPER_MEM_EXT_ROW_MASK                  0x3
>>  #define CPER_MEM_EXT_ROW_SHIFT                 16
>>
>> --
>> 2.20.1.9.gb50a0d7
>>
