Message-ID: <fc05bd05-274d-3491-bbf8-f38dc0a16e49@codeaurora.org>
Date: Fri, 30 Jun 2017 10:47:17 -0600
From: "Baicar, Tyler" <tbaicar@...eaurora.org>
To: Robert Richter <robert.richter@...ium.com>
Cc: christoffer.dall@...aro.org, marc.zyngier@....com,
pbonzini@...hat.com, rkrcmar@...hat.com, linux@...linux.org.uk,
catalin.marinas@....com, will.deacon@....com, rjw@...ysocki.net,
lenb@...nel.org, matt@...eblueprint.co.uk, robert.moore@...el.com,
lv.zheng@...el.com, nkaje@...eaurora.org, zjzhang@...eaurora.org,
mark.rutland@....com, james.morse@....com,
akpm@...ux-foundation.org, eun.taik.lee@...sung.com,
sandeepa.s.prabhu@...il.com, labbott@...hat.com,
shijie.huang@....com, rruigrok@...eaurora.org,
paul.gortmaker@...driver.com, tn@...ihalf.com, fu.wei@...aro.org,
rostedt@...dmis.org, bristot@...hat.com,
linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.cs.columbia.edu,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-acpi@...r.kernel.org, linux-efi@...r.kernel.org,
Suzuki.Poulose@....com, punit.agrawal@....com, astone@...hat.com,
harba@...eaurora.org, hanjun.guo@...aro.org, john.garry@...wei.com,
shiju.jose@...wei.com, joe@...ches.com, bp@...en8.de,
rafael@...nel.org, tony.luck@...el.com, gengdongjiu@...wei.com,
xiexiuqi@...wei.com
Subject: Re: [PATCH V17 01/11] acpi: apei: read ack upon ghes record
consumption
On 6/30/2017 4:10 AM, Robert Richter wrote:
> Tyler,
>
> On 19.05.17 14:32:03, Tyler Baicar wrote:
>> A RAS (Reliability, Availability, Serviceability) controller
>> may be a separate processor running in parallel with OS
>> execution, and may generate error records for consumption by
>> the OS. If the RAS controller produces multiple error records,
>> then they may be overwritten before the OS has consumed them.
>>
>> The Generic Hardware Error Source (GHES) v2 structure
>> introduces the capability for the OS to acknowledge the
>> consumption of the error record generated by the RAS
>> controller. A RAS controller supporting GHESv2 shall wait for
>> the acknowledgment before writing a new error record, thus
>> eliminating the race condition.
>>
>> Add support for parsing of GHESv2 sub-tables as well.
>>
>> Signed-off-by: Tyler Baicar <tbaicar@...eaurora.org>
>> CC: Jonathan (Zhixiong) Zhang <zjzhang@...eaurora.org>
>> Reviewed-by: James Morse <james.morse@....com>
>> ---
>> drivers/acpi/apei/ghes.c | 59 +++++++++++++++++++++++++++++++++++++++++++++---
>> drivers/acpi/apei/hest.c | 7 ++++--
>> include/acpi/ghes.h | 5 +++-
>> 3 files changed, 65 insertions(+), 6 deletions(-)
>> static int ghes_proc(struct ghes *ghes)
>> {
>> int rc;
>> @@ -661,6 +704,16 @@ static int ghes_proc(struct ghes *ghes)
>> ghes_estatus_cache_add(ghes->generic, ghes->estatus);
>> }
>> ghes_do_proc(ghes, ghes->estatus);
>> +
>> + /*
>> + * GHESv2 type HEST entries introduce support for error acknowledgment,
>> + * so only acknowledge the error if this support is present.
>> + */
>> + if (is_hest_type_generic_v2(ghes)) {
>> + rc = ghes_ack_error(ghes->generic_v2);
>> + if (rc)
>> + return rc;
>> + }
>> out:
>> ghes_clear_estatus(ghes);
>> return rc;
> was there any specific reason why the ack is sent before clearing the
> block status? Spec says the ack should be sent at last.
>
> Also, the block is never cleared if ghes_ack_error() returns an error.
> IMO we should fall through and clear the block status (this will
> change anyway if the block status is cleared first).
Hello Robert,
Thank you for pointing this out. I will send a patch to move the ack
after the ghes_clear_estatus() call. This is probably the right thing to
do anyway: right now, if the FW populates an invalid estatus, we fail to
read the estatus, jump to 'out:', and never send the ack.
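Roughly, the reordered flow could look like the sketch below. This is a
standalone model, not the actual patch: struct ghes_sim and the sim_*
helpers are hypothetical stand-ins for the ghes.c code, and the way the
ack result is folded into rc is an assumption.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct ghes; not the kernel definition. */
struct ghes_sim {
	bool is_v2;             /* models is_hest_type_generic_v2() */
	bool estatus_valid;     /* false models FW writing an invalid estatus */
	bool block_cleared;     /* set once the block status is cleared */
	bool acked;             /* set once the ack has been sent */
	bool acked_after_clear; /* records whether the ack came last */
};

/* Models reading the estatus: fails when the estatus is invalid. */
static int sim_read_estatus(struct ghes_sim *g)
{
	return g->estatus_valid ? 0 : -1;
}

/* Models ghes_clear_estatus(). */
static void sim_clear_estatus(struct ghes_sim *g)
{
	g->block_cleared = true;
}

/* Models ghes_ack_error(); always succeeds in this sketch. */
static int sim_ack_error(struct ghes_sim *g)
{
	g->acked = true;
	g->acked_after_clear = g->block_cleared;
	return 0;
}

static int sim_ghes_proc(struct ghes_sim *g)
{
	int rc;

	rc = sim_read_estatus(g);
	if (rc)
		goto out;

	/* Record processing would happen here. */

out:
	/* Clear the block status first, on both success and error paths. */
	sim_clear_estatus(g);

	/* Ack last, and only for GHESv2 entries. */
	if (g->is_v2) {
		int ack_rc = sim_ack_error(g);

		/* Assumption: keep an earlier error code over the ack result. */
		if (!rc)
			rc = ack_rc;
	}
	return rc;
}
```

With this ordering, a v2 source with an invalid estatus still gets its
block cleared and the ack sent, and a non-v2 source is never acked.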
Thanks,
Tyler
--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.