Message-ID: <20250929202402.1663290-1-yang@os.amperecomputing.com>
Date: Mon, 29 Sep 2025 13:24:02 -0700
From: Yang Shi <yang@...amperecomputing.com>
To: muchun.song@...ux.dev,
osalvador@...e.de,
david@...hat.com,
akpm@...ux-foundation.org,
catalin.marinas@....com,
will@...nel.org,
anshuman.khandual@....com,
carl@...amperecomputing.com,
cl@...two.org
Cc: yang@...amperecomputing.com,
linux-mm@...ck.org,
linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: [v2 PATCH] mm: hugetlb: avoid soft lockup when mprotect to large memory area
When calling mprotect() on a large hugetlb memory area in our customer's
workload (~300GB of hugetlb memory), a soft lockup was observed:
watchdog: BUG: soft lockup - CPU#98 stuck for 23s! [t2_new_sysv:126916]
CPU: 98 PID: 126916 Comm: t2_new_sysv Kdump: loaded Not tainted 6.17-rc7
Hardware name: GIGACOMPUTING R2A3-T40-AAV1/Jefferson CIO, BIOS 5.4.4.1 07/15/2025
pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : mte_clear_page_tags+0x14/0x24
lr : mte_sync_tags+0x1c0/0x240
sp : ffff80003150bb80
x29: ffff80003150bb80 x28: ffff00739e9705a8 x27: 0000ffd2d6a00000
x26: 0000ff8e4bc00000 x25: 00e80046cde00f45 x24: 0000000000022458
x23: 0000000000000000 x22: 0000000000000004 x21: 000000011b380000
x20: ffff000000000000 x19: 000000011b379f40 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
x11: 0000000000000000 x10: 0000000000000000 x9 : ffffc875e0aa5e2c
x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
x5 : fffffc01ce7a5c00 x4 : 00000000046cde00 x3 : fffffc0000000000
x2 : 0000000000000004 x1 : 0000000000000040 x0 : ffff0046cde7c000
Call trace:
mte_clear_page_tags+0x14/0x24
set_huge_pte_at+0x25c/0x280
hugetlb_change_protection+0x220/0x430
change_protection+0x5c/0x8c
mprotect_fixup+0x10c/0x294
do_mprotect_pkey.constprop.0+0x2e0/0x3d4
__arm64_sys_mprotect+0x24/0x44
invoke_syscall+0x50/0x160
el0_svc_common+0x48/0x144
do_el0_svc+0x30/0xe0
el0_svc+0x30/0xf0
el0t_64_sync_handler+0xc4/0x148
el0t_64_sync+0x1a4/0x1a8
The soft lockup is not triggered with THP or base pages because
cond_resched() is called for each PMD-sized range.
Although the soft lockup was triggered by MTE, it should not be MTE
specific. Any other processing that takes a long time in the loop may
trigger a soft lockup too.
So add cond_resched() for hugetlb to avoid the soft lockup.
Fixes: 8f860591ffb2 ("[PATCH] Enable mprotect on huge pages")
Tested-by: Carl Worth <carl@...amperecomputing.com>
Reviewed-by: Christoph Lameter (Ampere) <cl@...two.org>
Reviewed-by: Catalin Marinas <catalin.marinas@....com>
Acked-by: David Hildenbrand <david@...hat.com>
Acked-by: Oscar Salvador <osalvador@...e.de>
Reviewed-by: Anshuman Khandual <anshuman.khandual@....com>
Signed-off-by: Yang Shi <yang@...amperecomputing.com>
---
v2: - Made the subject and commit message less MTE specific and fixed
the fixes tag.
- Collected all R-bs and A-bs.
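For readers less familiar with the pattern, here is a minimal userspace
sketch of the loop shape described in the commit message. It is not
kernel code: change_range(), struct walk_stats and the pthread mutex are
hypothetical stand-ins, and sched_yield() only approximates what
cond_resched() does in the kernel. The point it illustrates is that the
yield happens once per entry and only after the per-entry lock has been
dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical counters, only here so the sketch has observable output. */
struct walk_stats {
	size_t entries_updated;	/* entries touched under the lock */
	size_t resched_points;	/* times the loop offered to yield */
};

/* Stand-in for the page table lock taken per entry (assumption). */
static pthread_mutex_t ptl = PTHREAD_MUTEX_INITIALIZER;

/*
 * Walk [start, end) in 'step'-sized chunks. Each iteration does its
 * work under the lock, drops the lock, and then yields the CPU so a
 * very long range cannot monopolize it.
 */
static struct walk_stats change_range(unsigned long start,
				      unsigned long end,
				      unsigned long step)
{
	struct walk_stats st = { 0, 0 };
	unsigned long addr;

	assert(step != 0);	/* a zero step would never terminate */

	for (addr = start; addr < end; addr += step) {
		pthread_mutex_lock(&ptl);
		st.entries_updated++;	/* placeholder for the real work */
		pthread_mutex_unlock(&ptl);

		/* Yield only after unlocking, never while holding the lock. */
		sched_yield();
		st.resched_points++;
	}
	return st;
}
```

Calling change_range(0, 8UL << 21, 1UL << 21) walks eight chunks and
yields eight times. The 2MB step is only an example value, not the huge
page size of the reported workload.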
mm/hugetlb.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index cb5c4e79e0b8..fe6606d91b31 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7242,6 +7242,8 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
 						psize);
 		}
 		spin_unlock(ptl);
+
+		cond_resched();
 	}
 	/*
 	 * Must flush TLB before releasing i_mmap_rwsem: x86's huge_pmd_unshare
--
2.47.0