Message-ID: <55DCA61C.8010109@codeaurora.org>
Date: Tue, 25 Aug 2015 10:30:04 -0700
From: "Zhang, Jonathan Zhixiong" <zjzhang@...eaurora.org>
To: Ingo Molnar <mingo@...nel.org>
Cc: Will Deacon <will.deacon@....com>,
Thomas Gleixner <tglx@...utronix.de>,
"H . Peter Anvin" <hpa@...or.com>,
"linux-kernel @ vger . kernel . org" <linux-kernel@...r.kernel.org>,
"linux-efi @ vger . kernel . org" <linux-efi@...r.kernel.org>,
Matt Fleming <matt.fleming@...el.com>,
Borislav Petkov <bp@...e.de>,
Ard Biesheuvel <ard.biesheuvel@...aro.org>,
Catalin Marinas <Catalin.Marinas@....com>,
Matt Fleming <matt@...eblueprint.co.uk>
Subject: Re: [PATCH 2/2] acpi, apei: use appropriate pgprot_t to map GHES memory
On 8/25/2015 1:59 AM, Ingo Molnar wrote:
>
> * Zhang, Jonathan Zhixiong <zjzhang@...eaurora.org> wrote:
>
>>
>>
>> On 8/22/2015 2:24 AM, Ingo Molnar wrote:
>>>
>>> * Jonathan (Zhixiong) Zhang <zjzhang@...eaurora.org> wrote:
>>>
>>>> From: "Jonathan (Zhixiong) Zhang" <zjzhang@...eaurora.org>
>>>>
>>>> With ACPI APEI firmware first handling, generic hardware error
>>>> record is updated by firmware in GHES memory region. On an arm64
>>>> platform, firmware updates GHES memory region with uncached
>>>> access attribute, and then Linux reads stale data from cache.
>>>
>>> This paragraph *still* doesn't parse for me. It's not any English
>>> I can recognize: what is a 'With ACPI APEI firmware first handling'?
>> APEI is the ACPI Platform Error Interface; it is the part of the ACPI spec
>> that defines hardware error handling. "Firmware first handling" is a term
>> used in APEI. It describes a mechanism in which, when a hardware error
>> happens, the firmware intercepts and handles the error, formulates a
>> hardware error record and writes the record to the GHES memory region,
>> then notifies the kernel through an NMI/interrupt; the kernel GHES driver
>> then grabs the error record from the GHES memory region.
>
> Argh. So how about translating that to English and putting that misnomer into
> scare quotes, and saying something like:
>
> If the ACPI APEI firmware handles the error first (called "firmware first
> handling"), the generic hardware error record is updated by the firmware in the
> GHES memory region.
>
> ( Also note all the missing articles I added for readability. The rest of the
> changelog is missing articles as well. )
Thank you very much, Ingo. Your input is taken.
>
>>> ... plus what this changelog still doesn't mention is the most important part
>>> of any bug fix description: how does the user notice this in practice and why
>>> does he care?
>>
>> The changelog mentioned that Linux would read stale data from cache. When stale
>> data is read, the kernel reports that there is no new hardware error when there
>> actually is one.
>
> Note that this is the most valuable sentence so far, in this whole changelog and
> discussion. And we needed how many emails to get to this point?
>
> Obviously, saying 'stale data' in itself does not mean much - it could mean a
> harmless inconsistency nobody really cares about, or in fact it could mean
> something more serious:
Sure, makes sense.
>
>> [...] This may lead to further damage in various scenarios, such as data
>> corruption caused by error propagation.
>
> Please outline this better. How users are affected in practice is far more
> important than any other detail.
Yes, will do. I just sent out an update for your review.
>
> Thanks,
>
> Ingo
>
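As an aside, the stale-read failure discussed in this thread can be pictured
with a toy C model. This is only an illustrative sketch under stated
assumptions: the struct and every function name below are hypothetical, the
"cache" is a single saved word, and none of it is actual kernel, APEI, or
firmware code. It only shows why a cacheable read can keep reporting "no
error" after the firmware has written a record with an uncached attribute.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical, not kernel code): one status word of the GHES
 * memory region, plus a one-entry stand-in for the CPU cache. */
struct ghes_model {
    uint32_t dram_block_status;    /* what is really in memory */
    uint32_t cached_block_status;  /* what the CPU cache currently holds */
    bool     cache_valid;          /* true once the word has been cached */
};

/* Firmware side: an uncached write updates memory only and leaves any
 * existing cached copy untouched. */
static void firmware_write_uncached(struct ghes_model *m, uint32_t status)
{
    m->dram_block_status = status;
}

/* Kernel side, cacheable mapping: the first read fills the cache, later
 * reads are served from the cached copy even if memory has changed. */
static uint32_t kernel_read_cached(struct ghes_model *m)
{
    if (!m->cache_valid) {
        m->cached_block_status = m->dram_block_status;
        m->cache_valid = true;
    }
    return m->cached_block_status;
}

/* Kernel side, mapping whose attributes match the firmware's: every read
 * goes to memory, so a new record is always visible. */
static uint32_t kernel_read_uncached(const struct ghes_model *m)
{
    return m->dram_block_status;
}
```

In this model, once the kernel has read the status word through the cacheable
path, a later firmware_write_uncached() is invisible to kernel_read_cached()
but visible to kernel_read_uncached(), which is the "no new hardware error
when there actually is one" symptom described above.
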
--
Jonathan (Zhixiong) Zhang
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project