Message-ID: <c6af9776-abff-36b8-6ad6-148990664c04@codeaurora.org>
Date:   Wed, 15 Feb 2017 10:07:25 -0700
From:   "Baicar, Tyler" <tbaicar@...eaurora.org>
To:     James Morse <james.morse@....com>, zjzhang@...eaurora.org
Cc:     christoffer.dall@...aro.org, marc.zyngier@....com,
        pbonzini@...hat.com, rkrcmar@...hat.com, linux@...linux.org.uk,
        catalin.marinas@....com, will.deacon@....com, rjw@...ysocki.net,
        lenb@...nel.org, matt@...eblueprint.co.uk, robert.moore@...el.com,
        lv.zheng@...el.com, nkaje@...eaurora.org, mark.rutland@....com,
        akpm@...ux-foundation.org, eun.taik.lee@...sung.com,
        sandeepa.s.prabhu@...il.com, labbott@...hat.com,
        shijie.huang@....com, rruigrok@...eaurora.org,
        paul.gortmaker@...driver.com, tn@...ihalf.com, fu.wei@...aro.org,
        rostedt@...dmis.org, bristot@...hat.com,
        linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.cs.columbia.edu,
        kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-acpi@...r.kernel.org, linux-efi@...r.kernel.org,
        devel@...ica.org, Suzuki.Poulose@....com, punit.agrawal@....com,
        astone@...hat.com, harba@...eaurora.org, hanjun.guo@...aro.org,
        john.garry@...wei.com, shiju.jose@...wei.com
Subject: Re: [PATCH V8 06/10] acpi: apei: panic OS with fatal error status block

On 2/15/2017 5:13 AM, James Morse wrote:
> Hi Tyler,
>
> On 13/02/17 22:45, Baicar, Tyler wrote:
>> On 2/9/2017 3:48 AM, James Morse wrote:
>>> On 01/02/17 17:16, Tyler Baicar wrote:
>>>> From: "Jonathan (Zhixiong) Zhang" <zjzhang@...eaurora.org>
>>>>
>>>> Even if an error status block's severity is fatal, the kernel does not
>>>> honor the severity level and panic.
>>>>
>>>> With the firmware first model, the platform could inform the OS about a
>>>> fatal hardware error through the non-NMI GHES notification type. The OS
>>>> should panic when a hardware error record is received with this
>>>> severity.
>>>>
>>>> Call panic() after CPER data in error status block is printed if
>>>> severity is fatal, before each error section is handled.
>>>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>>>> index 8756172..86c1f15 100644
>>>> --- a/drivers/acpi/apei/ghes.c
>>>> +++ b/drivers/acpi/apei/ghes.c
>>>> @@ -687,6 +689,13 @@ static int ghes_ack_error(struct acpi_hest_generic_v2 *generic_v2)
>>>>        return rc;
>>>>    }
>>>> +static void __ghes_call_panic(void)
>>>> +{
>>>> +    if (panic_timeout == 0)
>>>> +        panic_timeout = ghes_panic_timeout;
>>>> +    panic("Fatal hardware error!");
>>>> +}
>>>> +
>>> __ghes_panic() also has:
>>>>      __ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);
>>> Which prints this estatus regardless of rate limiting and caching.
> [...]
>
>>>>                ghes_estatus_cache_add(ghes->generic, ghes->estatus);
>>>>        }
>>>> +    if (ghes_severity(ghes->estatus->error_severity) >= GHES_SEV_PANIC) {
>>>> +        __ghes_call_panic();
>>>> +    }
>>> I think this ghes_severity() then panic() should go above the:
>>>>      if (!ghes_estatus_cached(ghes->estatus)) {
>>> and we should call __ghes_print_estatus() here too, to make sure the message
>>> definitely got out!
>
>> Okay, that makes sense. If we move this up, is there a problem with calling
>> __ghes_panic() instead of making the __ghes_print_estatus() and
>> __ghes_call_panic() calls here? It looks like that will just add a call to
>> oops_begin() and ghes_print_queued_estatus() as well, but this is what
>> ghes_notify_nmi() does if the severity is panic.
>
> I don't think the queued stuff is relevant, isn't that just for x86-NMI messages
> that it doesn't print out directly?
>
> A quick grep shows arm64 doesn't have oops_begin(), you may have to add some
> equivalent mechanism. Let's try and avoid that rabbit hole!
>
> Given __ghes_panic() calls __ghes_print_estatus() too, you could try moving that
> into your new __ghes_call_panic().... or whatever results in the least lines
> changed!
Sounds good, I will just use __ghes_print_estatus() and __ghes_call_panic().
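
To make the agreed ordering concrete, here is a rough user-space sketch, not the actual patch: check the severity first, print the status block unconditionally, "panic", and only then fall through to the cached/rate-limited path. Every mock_* name is hypothetical and stands in for the ghes.c helper named in its comment. The mock returns a value where the kernel would call panic() and never return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical severities, ordered so that "panic" compares highest. */
enum mock_sev {
    MOCK_SEV_NO,
    MOCK_SEV_CORRECTED,
    MOCK_SEV_RECOVERABLE,
    MOCK_SEV_PANIC
};

struct mock_estatus {
    enum mock_sev severity;
    bool cached;            /* stands in for ghes_estatus_cached() */
};

enum mock_action { MOCK_SUPPRESSED, MOCK_PRINTED, MOCK_PANICKED };

static int mock_print_count;

/* Stands in for __ghes_print_estatus(): always prints, no rate limiting. */
static void mock_print_estatus(const struct mock_estatus *e)
{
    mock_print_count++;
    printf("estatus: severity=%d\n", (int)e->severity);
}

/* Fatal check sits above the cache check, as suggested in the review. */
static enum mock_action mock_proc(const struct mock_estatus *e)
{
    assert(e != NULL);

    if (e->severity >= MOCK_SEV_PANIC) {
        mock_print_estatus(e);  /* make sure the message gets out */
        return MOCK_PANICKED;   /* kernel: __ghes_call_panic(), no return */
    }

    if (!e->cached) {
        mock_print_estatus(e);
        return MOCK_PRINTED;
    }

    return MOCK_SUPPRESSED;     /* already seen, nothing printed */
}
```

With this ordering a fatal record is printed even when an identical record is already in the cache, which is the point of moving the check up.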

Thanks,
Tyler

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.
