linux-kernel - Re: [RFC 1/2] ACPI: APEI: Make memory_failure() triggered by synchronization errors execute in the current context

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <a4ea2510-7602-f259-f8b9-07cd44c07808@huawei.com>
Date:   Thu, 8 Dec 2022 15:16:47 +0800
From:   Lv Ying <lvying6@...wei.com>
To:     Shuai Xue <xueshuai@...ux.alibaba.com>, <rafael@...nel.org>,
        <lenb@...nel.org>, <james.morse@....com>, <tony.luck@...el.com>,
        <bp@...en8.de>, <naoya.horiguchi@....com>, <linmiaohe@...wei.com>,
        <akpm@...ux-foundation.org>, <ashish.kalra@....com>,
        <xiexiuqi@...wei.com>
CC:     <xiezhipeng1@...wei.com>, <wangkefeng.wang@...wei.com>,
        <tanxiaofei@...wei.com>, <linux-acpi@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
        Bixuan Cui <cuibixuan@...ux.alibaba.com>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        <yingwen.cyw@...baba-inc.com>
Subject: Re: [RFC 1/2] ACPI: APEI: Make memory_failure() triggered by
 synchronization errors execute in the current context

On 2022/12/8 11:25, Shuai Xue wrote:
> + Xie XiuQi
> 
> On 2022/12/8 AM10:20, Lv Ying wrote:
>>
>>>
>>> We also encountered this problem in production environment, and tried to
>>> solve it by dividing synchronous and asynchronous error handling into different
>>> paths: task work for synchronous error and workqueue for asynchronous error.
>>>
>>> The main challenge is how to distinguish synchronous errors in kernel first
>>> mode through APEI, a related discussion is here.[1]
>>>
>> Hi Shuai:
>>
>> I'm very happy to have received your response, we encountered this problem in our production environment too. I spent a lot of time researching the rationale for synchronous errors and asynchronous errors to share the same process. I mention in my patch commit message: memory_failure() triggered by synchronous error is
>> executed in the kworker context, the early_kill mode of memory_failure()
>> will send wrong si_code by SIGBUS signal because of wrong current process.
>>
>> The challenge for my patch is to prove the rationality of distinguishing synchronous errors. I do not have a good idea of distinguishing synchronous error by looking through ACPI/UEFI spec, so I sent this patchset for more input. And I resent RFC PATCH v1 [1]add this as TODO.
>>
>>>> @@ -978,14 +979,14 @@ static void ghes_proc_in_irq(struct irq_work *irq_work)
>>>>            estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
>>>>            len = cper_estatus_len(estatus);
>>>>            node_len = GHES_ESTATUS_NODE_LEN(len);
>>>> -        task_work_pending = ghes_do_proc(estatus_node->ghes, estatus);
>>>> +        corruption_page_pending = ghes_do_proc(estatus_node->ghes, estatus, true);
>>>>            if (!ghes_estatus_cached(estatus)) {
>>>>                generic = estatus_node->generic;
>>>>                if (ghes_print_estatus(NULL, generic, estatus))
>>>>                    ghes_estatus_cache_add(generic, estatus);
>>>>            }
>>>
>>> In the case of your patch, it is inappropriate to assume that ghes_proc_in_irq() is only
>>> called to handle synchronous error. Firmware could notify all synchronous and asynchronous
>>> error signals to kernel through NMI notification, e.g. SDEI. In this case, asynchronous
>>> error will be treated as synchronous error.
>>>
>>
>> Yes, as I mentioned above, I do not have a good idea of distinguishing synchronous error. I agree with you that ghes_proc_in_irq() is called in SDEI, SEA, NMI notify type, they are NMI-like notify, this function run some job which may not be NMI safe in IRQ context. And NMI may be asynchronous error.
>>
>> However, cureent kernel use ghes_kick_task_work in ghes_proc_in_irq(), there is an assumption here that ghes_proc_in_irq() are currently in the context of a synchronous exception, although this is not appropriate.
>>
>>> Our colleague Yingwen has submitted a proposal to extend acpi_hest_generic_data::flag (bit 8)
>>> to indicate that the error is a synchronous[2]. Personally speaking, it is a more general
>>> solution and completely solves the problem.
>>>
>>>
>>>> Background:
>>>>
>>>> In ARM world, two type events (Sync/Async) from hardware IP need OS/VMM take different actions.
>>>> Current CPER memory error record is not able to distinguish sync/async type event right now.
>>>> Current OS/VMM need to take extra actions beyond CPER which is heavy burden to identify the
>>>> two type events
>>>>
>>>> Sync event (e.g. CPU consume poisoned data) --> Firmware  -> CPER error log  --> OS/VMM take recovery action.
>>>> Async event (e.g. Memory controller detect UE event)  --> Firmware  --> CPER error log  --> OS take page action.
>>>>
>>>>
>>>> Proposal:
>>>>
>>>> - In section description Flags field(UEFI spec section N.2, add sync flag as below. OS/VMM
>>>>    could depend on this flag to distinguish sync/async events.
>>>> - Bit8 – sync flag; if set this flag indicates that this event record is synchronous(e.g.
>>>>    cpu core consumes poison data, then cause instruction/data abort); if not set, this event record is asynchronous.
>>>>
>>>> Best regards,
>>>> Yingwen Chen
>>>
>>> A RFC patch set based on above proposal is here[3].
>>>
>>
>> I'm glad your team is working on advancing this thing in the UEFI community. This will make kernel identify synchronous as an easy job. Based on your proposal, I think your work is suitable.
>>
>> However, there are real problems here in the running production environment of the LTS kernel:
>> 1. this new proposal has not yet been accepted by the UEFI spec
>> 2. it will require firmware update in production environment which may be more difficult than kernel livepatch upgrade.
>>
>> If there are more ways to distinguish synchronous error in kernel APEI,
>> I can try it. For example, in APEI mode, is it suitable to use notify type to distinguish synchronous error?
>> synchronous error:
>> SDEI SEA
>> asynchronous error:
>> NMI
> 
> Sorry, I'm afraid it's not suitable. It is no doubt that SEA notification is synchronous,
> but we should not assume that platform use NMI notification for asynchronous errors,
> and SDEI notification for synchronous errors.
> 
> If we want to address the problem now, I prefer to just consider SEA notification is synchronous
> and add MF_ACTION_REQUIRED when SEA is adopt. Please refer to XiuQi's work.[1]
> 
> [1]https://lore.kernel.org/linux-arm-kernel/20221205160043.57465-4-xiexiuqi@huawei.com/T/
> 
> Best Regards,
> Shuai
> 
Thanks for your advice, it may be currently more practical to solve the 
problem that synchronous and asynchronous errors
share the same processing flow only in the SEA scenario. I will refer to 
XiuQi's work and update RFC PATCH v2.


-- 
Thanks!
Lv Ying