lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <589C490A.9080109@arm.com>
Date:   Thu, 09 Feb 2017 10:48:42 +0000
From:   James Morse <james.morse@....com>
To:     Tyler Baicar <tbaicar@...eaurora.org>, zjzhang@...eaurora.org
CC:     christoffer.dall@...aro.org, marc.zyngier@....com,
        pbonzini@...hat.com, rkrcmar@...hat.com, linux@...linux.org.uk,
        catalin.marinas@....com, will.deacon@....com, rjw@...ysocki.net,
        lenb@...nel.org, matt@...eblueprint.co.uk, robert.moore@...el.com,
        lv.zheng@...el.com, nkaje@...eaurora.org, mark.rutland@....com,
        akpm@...ux-foundation.org, eun.taik.lee@...sung.com,
        sandeepa.s.prabhu@...il.com, labbott@...hat.com,
        shijie.huang@....com, rruigrok@...eaurora.org,
        paul.gortmaker@...driver.com, tn@...ihalf.com, fu.wei@...aro.org,
        rostedt@...dmis.org, bristot@...hat.com,
        linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.cs.columbia.edu,
        kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
        linux-acpi@...r.kernel.org, linux-efi@...r.kernel.org,
        devel@...ica.org, Suzuki.Poulose@....com, punit.agrawal@....com,
        astone@...hat.com, harba@...eaurora.org, hanjun.guo@...aro.org,
        john.garry@...wei.com, shiju.jose@...wei.com
Subject: Re: [PATCH V8 06/10] acpi: apei: panic OS with fatal error status
 block

Hi Jonathan, Tyler,

On 01/02/17 17:16, Tyler Baicar wrote:
> From: "Jonathan (Zhixiong) Zhang" <zjzhang@...eaurora.org>
> 
> Even if an error status block's severity is fatal, the kernel does not
> honor the severity level and panic.
> 
> With the firmware first model, the platform could inform the OS about a
> fatal hardware error through the non-NMI GHES notification type. The OS
> should panic when a hardware error record is received with this
> severity.
> 
> Call panic() after CPER data in error status block is printed if
> severity is fatal, before each error section is handled.

> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 8756172..86c1f15 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -687,6 +689,13 @@ static int ghes_ack_error(struct acpi_hest_generic_v2 *generic_v2)
>  	return rc;
>  }
>  
> +static void __ghes_call_panic(void)
> +{
> +	if (panic_timeout == 0)
> +		panic_timeout = ghes_panic_timeout;
> +	panic("Fatal hardware error!");
> +}
> +

__ghes_panic() also has:
>	__ghes_print_estatus(KERN_EMERG, ghes->generic, ghes->estatus);

Which prints this estatus regardless of rate limiting and cache-ing.

[ ... ]

> @@ -698,6 +707,10 @@ static int ghes_proc(struct ghes *ghes)
>  		if (ghes_print_estatus(NULL, ghes->generic, ghes->estatus))

ghes_print_estatus() uses some custom rate limiting '2 messages every 5
seconds', GHES_SEV_PANIC shares the same limit as GHES_SEV_RECOVERABLE.

I think its possible to get 2 recoverable messages, then a panic in a 5 second
window. The rate limit will kick in to stop the panic estatus block being
printed, but we still go on to call panic() without the real reason being printed...

(the caching thing only seems to consider identical messages, given we would
never see two panic messages, I don't think that will cause any problems.)

>  			ghes_estatus_cache_add(ghes->generic, ghes->estatus);
>  	}
> +	if (ghes_severity(ghes->estatus->error_severity) >= GHES_SEV_PANIC) {
> +		__ghes_call_panic();
> +	}
> +

I think this ghes_severity() then panic() should go above the:
>	if (!ghes_estatus_cached(ghes->estatus)) {
and we should call __ghes_print_estatus() here too, to make sure the message
definitely got out!


With that,
Reviewed-by: James Morse <james.morse@....com>


Thanks,

James

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ