linux-kernel - Re: [PATCH V3 06/10] acpi: apei: panic OS with fatal error status block


Message-ID: <18205aac-02ae-bd45-2d2d-aa01cf845ae7@codeaurora.org>
Date:   Thu, 13 Oct 2016 17:34:08 -0600
From:   "Baicar, Tyler" <tbaicar@...eaurora.org>
To:     Suzuki K Poulose <Suzuki.Poulose@....com>,
        christoffer.dall@...aro.org, marc.zyngier@....com,
        pbonzini@...hat.com, rkrcmar@...hat.com, linux@...linux.org.uk,
        catalin.marinas@....com, will.deacon@....com, rjw@...ysocki.net,
        lenb@...nel.org, matt@...eblueprint.co.uk, robert.moore@...el.com,
        lv.zheng@...el.com, mark.rutland@....com, james.morse@....com,
        akpm@...ux-foundation.org, sandeepa.s.prabhu@...il.com,
        shijie.huang@....com, paul.gortmaker@...driver.com,
        tomasz.nowicki@...aro.org, fu.wei@...aro.org, rostedt@...dmis.org,
        bristot@...hat.com, linux-arm-kernel@...ts.infradead.org,
        kvmarm@...ts.cs.columbia.edu, Dkvm@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-acpi@...r.kernel.org,
        linux-efi@...r.kernel.org, devel@...ica.org
Cc:     "Jonathan (Zhixiong) Zhang" <zjzhang@...eaurora.org>
Subject: Re: [PATCH V3 06/10] acpi: apei: panic OS with fatal error status
 block

Hello Suzuki,

On 10/13/2016 7:00 AM, Suzuki K Poulose wrote:
> On 07/10/16 22:31, Tyler Baicar wrote:
>> From: "Jonathan (Zhixiong) Zhang" <zjzhang@...eaurora.org>
>>
>> Even if an error status block's severity is fatal, the kernel does not
>> honor the severity level and panic.
>>
>> With the firmware first model, the platform could inform the OS about a
>> fatal hardware error through the non-NMI GHES notification type. The OS
>> should panic when a hardware error record is received with this
>> severity.
>>
>> Call panic() after CPER data in error status block is printed if
>> severity is fatal, before each error section is handled.
>>
>> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@...eaurora.org>
>> ---
>>  drivers/acpi/apei/ghes.c | 10 ++++++++--
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index 28d5a09..36894c8 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -141,6 +141,8 @@ static unsigned long ghes_estatus_pool_size_request;
>>  static struct ghes_estatus_cache *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
>>  static atomic_t ghes_estatus_cache_alloced;
>>
>> +static int ghes_panic_timeout __read_mostly = 30;
>> +
>>  static int ghes_ioremap_init(void)
>>  {
>>      ghes_ioremap_area = __get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES,
>> @@ -715,6 +717,12 @@ static int ghes_proc(struct ghes *ghes)
>>          if (ghes_print_estatus(NULL, ghes->generic, ghes->estatus))
>>              ghes_estatus_cache_add(ghes->generic, ghes->estatus);
>>      }
>> +    if (ghes_severity(ghes->estatus->error_severity) >= GHES_SEV_PANIC) {
>> +        if (panic_timeout == 0)
>> +            panic_timeout = ghes_panic_timeout;
>> +        panic("Fatal hardware error!");
>
> I think there is a chance that we might miss the o/p of
> ghes_print_estatus() as we use no pfx, and it could default to the
> normal loglevel and would never get printed if panic() is encountered
> before it. On the other hand, there is already a __ghes_panic() which
> does similar stuff. Is there a way we could reuse (may be even parts
> of) it ? Or at least use KERN_EMERG for the ghes_print_estatus(), if
> the severity could result in panic() ?
__ghes_panic() does additional handling that we do not want here.
I could make the following a helper function so it is not duplicated, though:

if (panic_timeout == 0)
     panic_timeout = ghes_panic_timeout;
panic("Fatal hardware error!");
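
A rough user-space sketch of what such a helper could look like. The helper name ghes_fatal_panic() is hypothetical, and panic_timeout, mock_panic() and mock_panic_calls are stand-ins for the kernel's panic_timeout and panic() so the snippet compiles outside the kernel; only ghes_panic_timeout and the three-line body come from the patch:

```c
#include <stdio.h>

/* Stand-in for the kernel's global panic_timeout.
 * In the kernel, 0 means "do not reboot automatically after a panic". */
static int panic_timeout;

/* From the patch: default reboot delay (seconds) for fatal GHES errors. */
static int ghes_panic_timeout = 30;

/* Stand-in for panic(): the real one never returns. This mock only
 * logs the message and counts calls so the sketch can be exercised. */
static int mock_panic_calls;

static void mock_panic(const char *msg)
{
	mock_panic_calls++;
	fprintf(stderr, "panic: %s\n", msg);
}

/* Hypothetical helper: apply the GHES default timeout only when no
 * timeout has been configured, then panic. */
static void ghes_fatal_panic(void)
{
	if (panic_timeout == 0)
		panic_timeout = ghes_panic_timeout;
	mock_panic("Fatal hardware error!");
}
```

With something like this, the new check in ghes_proc() would reduce to a single call to the helper, and an already configured non-zero panic_timeout is left untouched.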

The pfx is actually being calculated already in __ghes_print_estatus():

         if (pfx == NULL) {
                 if (ghes_severity(estatus->error_severity) <=
                     GHES_SEV_CORRECTED)
                         pfx = KERN_WARNING;
                 else
                         pfx = KERN_ERR;
         }

From ghes.h:

enum {
         GHES_SEV_NO = 0x0,
         GHES_SEV_CORRECTED = 0x1,
         GHES_SEV_RECOVERABLE = 0x2,
         GHES_SEV_PANIC = 0x3,
};

Since GHES_SEV_PANIC (0x3) is greater than GHES_SEV_CORRECTED (0x1), the
else branch is taken and the pfx will be KERN_ERR for the case of a panic.
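
To make the mapping concrete, here is a small user-space sketch of the same selection logic. The function name ghes_pick_pfx() is hypothetical, it takes an already converted GHES severity (the real code calls ghes_severity() on the CPER severity first), and the KERN_* macros are reproduced in the form used by include/linux/kern_levels.h purely for illustration:

```c
#include <stddef.h>

/* Log-level prefixes in the form used by include/linux/kern_levels.h. */
#define KERN_SOH	"\001"
#define KERN_ERR	KERN_SOH "3"
#define KERN_WARNING	KERN_SOH "4"

/* Severity levels as quoted from ghes.h. */
enum {
	GHES_SEV_NO = 0x0,
	GHES_SEV_CORRECTED = 0x1,
	GHES_SEV_RECOVERABLE = 0x2,
	GHES_SEV_PANIC = 0x3,
};

/* Hypothetical stand-alone version of the prefix choice: a caller-supplied
 * pfx is kept as is; otherwise corrected (or lower) severities get
 * KERN_WARNING and everything above gets KERN_ERR. */
static const char *ghes_pick_pfx(const char *pfx, int sev)
{
	if (pfx == NULL) {
		if (sev <= GHES_SEV_CORRECTED)
			pfx = KERN_WARNING;
		else
			pfx = KERN_ERR;
	}
	return pfx;
}
```

Calling ghes_pick_pfx(NULL, GHES_SEV_PANIC) yields the KERN_ERR prefix, the same as for GHES_SEV_RECOVERABLE, while GHES_SEV_NO and GHES_SEV_CORRECTED yield KERN_WARNING.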

Thanks,
Tyler
>
> Cheers
> Suzuki
>

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.