lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 22 Oct 2022 09:26:54 +0200
From:   Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To:     linux-kernel@...r.kernel.org
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        stable@...r.kernel.org, Shuai Xue <xueshuai@...ux.alibaba.com>,
        Tony Luck <tony.luck@...el.com>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        Sasha Levin <sashal@...nel.org>
Subject: [PATCH 5.19 535/717] ACPI: APEI: do not add task_work to kernel thread to avoid memory leak

From: Shuai Xue <xueshuai@...ux.alibaba.com>

[ Upstream commit 415fed694fe11395df56e05022d6e7cee1d39dd3 ]

If an error is detected as a result of user-space process accessing a
corrupt memory location, the CPU may take an abort. Then the platform
firmware reports kernel via NMI like notifications, e.g. NOTIFY_SEA,
NOTIFY_SOFTWARE_DELEGATED, etc.

For NMI like notifications, commit 7f17b4a121d0 ("ACPI: APEI: Kick the
memory_failure() queue for synchronous errors") keep track of whether
memory_failure() work was queued, and make task_work pending to flush out
the queue so that the work is processed before return to user-space.

The code use init_mm to check whether the error occurs in user space:

    if (current->mm != &init_mm)

The condition is always true, becase _nobody_ ever has "init_mm" as a real
VM any more.

In addition to abort, errors can also be signaled as asynchronous
exceptions, such as interrupt and SError. In such case, the interrupted
current process could be any kind of thread. When a kernel thread is
interrupted, the work ghes_kick_task_work deferred to task_work will never
be processed because entry_handler returns to call ret_to_kernel() instead
of ret_to_user(). Consequently, the estatus_node alloced from
ghes_estatus_pool in ghes_in_nmi_queue_one_entry() will not be freed.
After around 200 allocations in our platform, the ghes_estatus_pool will
run of memory and ghes_in_nmi_queue_one_entry() returns ENOMEM. As a
result, the event failed to be processed.

    sdei: event 805 on CPU 113 failed with error: -2

Finally, a lot of unhandled events may cause platform firmware to exceed
some threshold and reboot.

The condition should generally just do

    if (current->mm)

as described in active_mm.rst documentation.

Then if an asynchronous error is detected when a kernel thread is running,
(e.g. when detected by a background scrubber), do not add task_work to it
as the original patch intends to do.

Fixes: 7f17b4a121d0 ("ACPI: APEI: Kick the memory_failure() queue for synchronous errors")
Signed-off-by: Shuai Xue <xueshuai@...ux.alibaba.com>
Reviewed-by: Tony Luck <tony.luck@...el.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
Signed-off-by: Sasha Levin <sashal@...nel.org>
---
 drivers/acpi/apei/ghes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index d91ad378c00d..80ad530583c9 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -985,7 +985,7 @@ static void ghes_proc_in_irq(struct irq_work *irq_work)
 				ghes_estatus_cache_add(generic, estatus);
 		}
 
-		if (task_work_pending && current->mm != &init_mm) {
+		if (task_work_pending && current->mm) {
 			estatus_node->task_work.func = ghes_kick_task_work;
 			estatus_node->task_work_cpu = smp_processor_id();
 			ret = task_work_add(current, &estatus_node->task_work,
-- 
2.35.1



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ