Message-ID: <20221207093935.1972530-3-lvying6@huawei.com>
Date: Wed, 7 Dec 2022 17:39:35 +0800
From: Lv Ying <lvying6@...wei.com>
To: <rafael@...nel.org>, <lenb@...nel.org>, <james.morse@....com>,
<tony.luck@...el.com>, <bp@...en8.de>, <naoya.horiguchi@....com>,
<linmiaohe@...wei.com>, <akpm@...ux-foundation.org>,
<xueshuai@...ux.alibaba.com>, <ashish.kalra@....com>
CC: <xiezhipeng1@...wei.com>, <wangkefeng.wang@...wei.com>,
<xiexiuqi@...wei.com>, <tanxiaofei@...wei.com>,
<lvying6@...wei.com>, <linux-acpi@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>
Subject: [RFC PATCH v1 2/2] ACPI: APEI: fix reboot caused by synchronous error loop because of memory_failure() failed

When a synchronous error is detected as a result of user-space accessing a
corrupt memory location, the CPU may take an abort instead. On arm64 this
is a 'synchronous external abort', which can be notified by SEA.

If memory_failure() fails, returning to user-space triggers the SEA again.
Such a loop may cause platform firmware to exceed some threshold and reboot
when Linux could have recovered from this error. Not all memory_failure()
processing failures cause the reboot: VM_FAULT_HWPOISON[_LARGE] handling
in the arm64 page fault path sends SIGBUS to the accessing user-space
process, which terminates the loop.

However, if a process maps the faulting page but memory_failure() returns
abnormally before try_to_unmap(), for example because the faulting page is
a KSM page, arm64 cannot rely on the page fault path to terminate the loop.

Check the result of memory_failure() in task_work before returning to
user-space. If memory_failure() failed, send SIGBUS to the current
process to avoid the SEA loop.

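The check described above can be sketched in plain user-space C. This is
only a minimal illustration, not the kernel code: the enum and function
names are hypothetical, the EHWPOISON fallback is the asm-generic value,
and treating a return value of 0 as success is an assumption taken from
the wording "if memory_failure() failed".

```c
#include <assert.h>
#include <errno.h>

/* EHWPOISON is Linux-specific; fall back to the asm-generic value elsewhere. */
#ifndef EHWPOISON
#define EHWPOISON 133
#endif

/* Hypothetical names, used only for this illustration. */
enum sea_step {
	SEA_STEP_RESUME,	/* return to user-space without a new signal */
	SEA_STEP_SIGBUS,	/* force SIGBUS to break the SEA loop */
};

/*
 * Decide what to do after memory_failure() returned 'ret' for a
 * synchronous MF_ACTION_REQUIRED error. Assumes the usual kernel
 * convention: 0 on success, a negative errno on failure.
 */
static enum sea_step sea_step_after_memory_failure(int ret)
{
	assert(ret <= 0);

	/* Assumption: success needs no extra signal. */
	if (ret == 0)
		return SEA_STEP_RESUME;

	/* The two error codes the patch exempts from the SIGBUS. */
	if (ret == -EHWPOISON || ret == -EOPNOTSUPP)
		return SEA_STEP_RESUME;

	/* Any other failure would otherwise re-trigger the SEA. */
	return SEA_STEP_SIGBUS;
}
```

For example, sea_step_after_memory_failure(-EBUSY) yields SEA_STEP_SIGBUS,
while -EHWPOISON and -EOPNOTSUPP yield SEA_STEP_RESUME.
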
Signed-off-by: Lv Ying <lvying6@...wei.com>
---
mm/memory-failure.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3b6ac3694b8d..07ec7b62f330 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2255,7 +2255,7 @@ static void __memory_failure_work_func(struct work_struct *work, bool sync)
struct memory_failure_cpu *mf_cpu;
struct memory_failure_entry entry = { 0, };
unsigned long proc_flags;
- int gotten;
+ int gotten, ret;
mf_cpu = container_of(work, struct memory_failure_cpu, work);
for (;;) {
@@ -2266,7 +2266,16 @@ static void __memory_failure_work_func(struct work_struct *work, bool sync)
break;
if (entry.flags & MF_SOFT_OFFLINE)
soft_offline_page(entry.pfn, entry.flags);
- else if (!sync || (entry.flags & MF_ACTION_REQUIRED))
+ else if (sync) {
+ if (entry.flags & MF_ACTION_REQUIRED) {
+ ret = memory_failure(entry.pfn, entry.flags);
+ if (!ret || ret == -EHWPOISON || ret == -EOPNOTSUPP)
+ return;
+
+ pr_err("Memory error not recovered\n");
+ force_sig(SIGBUS);
+ }
+ } else
memory_failure(entry.pfn, entry.flags);
}
}
--
2.36.1