Message-ID: <6cde8290-3aa2-411c-bf29-eb91a99e33a5@os.amperecomputing.com>
Date: Tue, 30 Sep 2025 11:08:46 -0700
From: Yang Shi <yang@...amperecomputing.com>
To: Dev Jain <dev.jain@....com>, muchun.song@...ux.dev, osalvador@...e.de,
david@...hat.com, akpm@...ux-foundation.org, catalin.marinas@....com,
will@...nel.org, anshuman.khandual@....com, carl@...amperecomputing.com,
cl@...two.org
Cc: linux-mm@...ck.org, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org
Subject: Re: [v2 PATCH] mm: hugetlb: avoid soft lockup when mprotect to large
memory area
On 9/29/25 10:26 PM, Dev Jain wrote:
>
> On 30/09/25 1:54 am, Yang Shi wrote:
>> When calling mprotect() on a large hugetlb memory area in our customer's
>> workload (~300GB hugetlb memory), a soft lockup was observed:
>>
>> watchdog: BUG: soft lockup - CPU#98 stuck for 23s! [t2_new_sysv:126916]
>>
>> CPU: 98 PID: 126916 Comm: t2_new_sysv Kdump: loaded Not tainted 6.17-rc7
>> Hardware name: GIGACOMPUTING R2A3-T40-AAV1/Jefferson CIO, BIOS 5.4.4.1 07/15/2025
>> pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> pc : mte_clear_page_tags+0x14/0x24
>> lr : mte_sync_tags+0x1c0/0x240
>> sp : ffff80003150bb80
>> x29: ffff80003150bb80 x28: ffff00739e9705a8 x27: 0000ffd2d6a00000
>> x26: 0000ff8e4bc00000 x25: 00e80046cde00f45 x24: 0000000000022458
>> x23: 0000000000000000 x22: 0000000000000004 x21: 000000011b380000
>> x20: ffff000000000000 x19: 000000011b379f40 x18: 0000000000000000
>> x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>> x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
>> x11: 0000000000000000 x10: 0000000000000000 x9 : ffffc875e0aa5e2c
>> x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
>> x5 : fffffc01ce7a5c00 x4 : 00000000046cde00 x3 : fffffc0000000000
>> x2 : 0000000000000004 x1 : 0000000000000040 x0 : ffff0046cde7c000
>>
>> Call trace:
>> mte_clear_page_tags+0x14/0x24
>> set_huge_pte_at+0x25c/0x280
>> hugetlb_change_protection+0x220/0x430
>> change_protection+0x5c/0x8c
>> mprotect_fixup+0x10c/0x294
>> do_mprotect_pkey.constprop.0+0x2e0/0x3d4
>> __arm64_sys_mprotect+0x24/0x44
>> invoke_syscall+0x50/0x160
>> el0_svc_common+0x48/0x144
>> do_el0_svc+0x30/0xe0
>> el0_svc+0x30/0xf0
>> el0t_64_sync_handler+0xc4/0x148
>> el0t_64_sync+0x1a4/0x1a8
>>
>> A soft lockup is not triggered with THP or base pages because
>> cond_resched() is called once per PMD-sized range.
>>
>> Although this soft lockup was triggered by MTE, the problem is not MTE
>> specific. Any other processing that takes a long time in the loop may
>> trigger a soft lockup too.
>>
>> So add cond_resched() to the hugetlb loop to avoid the soft lockup.
>>
>> Fixes: 8f860591ffb2 ("[PATCH] Enable mprotect on huge pages")
>> Tested-by: Carl Worth <carl@...amperecomputing.com>
>> Reviewed-by: Christoph Lameter (Ampere) <cl@...two.org>
>> Reviewed-by: Catalin Marinas <catalin.marinas@....com>
>> Acked-by: David Hildenbrand <david@...hat.com>
>> Acked-by: Oscar Salvador <osalvador@...e.de>
>> Reviewed-by: Anshuman Khandual <anshuman.khandual@....com>
>> Signed-off-by: Yang Shi <yang@...amperecomputing.com>
>> ---
>> v2: - Made the subject and commit message less MTE-specific and fixed
>> the Fixes tag.
>> - Collected all R-bs and A-bs.
>>
>> mm/hugetlb.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index cb5c4e79e0b8..fe6606d91b31 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -7242,6 +7242,8 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
>> psize);
>> }
>> spin_unlock(ptl);
>> +
>> + cond_resched();
>> }
>> /*
>> * Must flush TLB before releasing i_mmap_rwsem: x86's huge_pmd_unshare
>
> Reviewed-by: Dev Jain <dev.jain@....com>
Thank you.
>
> Does it make sense to also do cond_resched() in the huge_pmd_unshare()
> branch? That also amounts to clearing a page. And I can see that, for
> example, zap_huge_pmd() and change_huge_pmd() consume a cond_resched().
Thanks for raising this. I did think about it, but I couldn't convince
myself, because shared PMDs should not be that common IMHO (if I'm wrong,
please feel free to correct me). At least, IIRC, a PMD can't be shared if
the memory is tagged. So I'd like to keep the patch minimal for now and
defer adding cond_resched() there until some real-life workload hits it.
Yang