linux-kernel - [RFC PATCH] mm: silence soft lockups from unlock

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-Id: <20200721063258.17140-1-mhocko@kernel.org>
Date:   Tue, 21 Jul 2020 08:32:58 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     <linux-mm@...ck.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Michal Hocko <mhocko@...e.com>
Subject: [RFC PATCH] mm: silence soft lockups from unlock_page

From: Michal Hocko <mhocko@...e.com>

We have seen a bug report with huge number of soft lockups during the
system boot on !PREEMPT kernel
NMI watchdog: BUG: soft lockup - CPU#1291 stuck for 22s! [systemd-udevd:43283]
[...]
NIP [c00000000094e66c] _raw_spin_lock_irqsave+0xac/0x100
LR [c00000000094e654] _raw_spin_lock_irqsave+0x94/0x100
Call Trace:
[c00002293d883b30] [c0000000012cdee8] __pmd_index_size+0x0/0x8 (unreliable)
[c00002293d883b70] [c0000000002b7490] wake_up_page_bit+0xc0/0x150
[c00002293d883bf0] [c00000000030ceb8] do_fault+0x448/0x870
[c00002293d883c40] [c000000000310080] __handle_mm_fault+0x880/0x16b0
[c00002293d883d10] [c000000000311018] handle_mm_fault+0x168/0x250
[c00002293d883d50] [c00000000006c488] do_page_fault+0x568/0x8d0
[c00002293d883e30] [c00000000000a534] handle_page_fault+0x18/0x38

on a large ppc machine. The very likely cause is a suboptimal
configuration when systed-udev spawns way too many workders to bring the
system up.

The lockup is in page_unlock in do_read_fault and I suspect that this is
yet another effect of a very long waitqueue chain which has been
addresses by 11a19c7b099f ("sched/wait: Introduce wakeup boomark in
wake_up_page_bit") previously. The commit primarily aimed at hard lockup
prevention but it doesn't really help !PREEMPT case which still has to
process all the work without any rescheduling point. This is however not
really trivial because page_unlock is called from many contexts many of
which are likely called from an atomic context.

Introducing page_unlock_sleepable is certainly an option but it seems
like a hard to maintain option which doesn't really fix the underlying
problem as the same might happen from other unlock_page callers. This
patch doesn't address the underlying problem but it reduces the visible
effect. Tell the soft lockup to shut up when retrying batches on queued
waiters. This will also allow systems configured to panic on warning to
proceed with the boot.

Signed-off-by: Michal Hocko <mhocko@...e.com>
---
 mm/filemap.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index 385759c4ce4b..74681c40a6e5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -41,6 +41,7 @@
 #include <linux/delayacct.h>
 #include <linux/psi.h>
 #include <linux/ramfs.h>
+#include <linux/nmi.h>
 #include "internal.h"

 #define CREATE_TRACE_POINTS
@@ -1055,6 +1056,7 @@ static void wake_up_page_bit(struct page *page, int bit_nr)
 		 */
 		spin_unlock_irqrestore(&q->lock, flags);
 		cpu_relax();
+		touch_softlockup_watchdog();
 		spin_lock_irqsave(&q->lock, flags);
 		__wake_up_locked_key_bookmark(q, TASK_NORMAL, &key, &bookmark);
 	}
-- 
2.27.0