Message-ID: <6cde8290-3aa2-411c-bf29-eb91a99e33a5@os.amperecomputing.com>
Date: Tue, 30 Sep 2025 11:08:46 -0700
From: Yang Shi <yang@...amperecomputing.com>
To: Dev Jain <dev.jain@....com>, muchun.song@...ux.dev, osalvador@...e.de,
 david@...hat.com, akpm@...ux-foundation.org, catalin.marinas@....com,
 will@...nel.org, anshuman.khandual@....com, carl@...amperecomputing.com,
 cl@...two.org
Cc: linux-mm@...ck.org, linux-arm-kernel@...ts.infradead.org,
 linux-kernel@...r.kernel.org
Subject: Re: [v2 PATCH] mm: hugetlb: avoid soft lockup when mprotect to large
 memory area



On 9/29/25 10:26 PM, Dev Jain wrote:
>
> On 30/09/25 1:54 am, Yang Shi wrote:
>> When calling mprotect() to a large hugetlb memory area in our customer's
>> workload (~300GB hugetlb memory), soft lockup was observed:
>>
>> watchdog: BUG: soft lockup - CPU#98 stuck for 23s! [t2_new_sysv:126916]
>>
>> CPU: 98 PID: 126916 Comm: t2_new_sysv Kdump: loaded Not tainted 6.17-rc7
>> Hardware name: GIGACOMPUTING R2A3-T40-AAV1/Jefferson CIO, BIOS 5.4.4.1 07/15/2025
>> pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> pc : mte_clear_page_tags+0x14/0x24
>> lr : mte_sync_tags+0x1c0/0x240
>> sp : ffff80003150bb80
>> x29: ffff80003150bb80 x28: ffff00739e9705a8 x27: 0000ffd2d6a00000
>> x26: 0000ff8e4bc00000 x25: 00e80046cde00f45 x24: 0000000000022458
>> x23: 0000000000000000 x22: 0000000000000004 x21: 000000011b380000
>> x20: ffff000000000000 x19: 000000011b379f40 x18: 0000000000000000
>> x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>> x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
>> x11: 0000000000000000 x10: 0000000000000000 x9 : ffffc875e0aa5e2c
>> x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
>> x5 : fffffc01ce7a5c00 x4 : 00000000046cde00 x3 : fffffc0000000000
>> x2 : 0000000000000004 x1 : 0000000000000040 x0 : ffff0046cde7c000
>>
>> Call trace:
>>    mte_clear_page_tags+0x14/0x24
>>    set_huge_pte_at+0x25c/0x280
>>    hugetlb_change_protection+0x220/0x430
>>    change_protection+0x5c/0x8c
>>    mprotect_fixup+0x10c/0x294
>>    do_mprotect_pkey.constprop.0+0x2e0/0x3d4
>>    __arm64_sys_mprotect+0x24/0x44
>>    invoke_syscall+0x50/0x160
>>    el0_svc_common+0x48/0x144
>>    do_el0_svc+0x30/0xe0
>>    el0_svc+0x30/0xf0
>>    el0t_64_sync_handler+0xc4/0x148
>>    el0t_64_sync+0x1a4/0x1a8
>>
>> A soft lockup is not triggered with THP or base pages because
>> cond_resched() is called for each PMD-sized range.
>>
>> Although the soft lockup was triggered by MTE, it should not be MTE
>> specific. Any other processing that takes a long time in the loop may
>> trigger a soft lockup too.
>>
>> So add cond_resched() for hugetlb to avoid soft lockup.
>>
>> Fixes: 8f860591ffb2 ("[PATCH] Enable mprotect on huge pages")
>> Tested-by: Carl Worth <carl@...amperecomputing.com>
>> Reviewed-by: Christoph Lameter (Ampere) <cl@...two.org>
>> Reviewed-by: Catalin Marinas <catalin.marinas@....com>
>> Acked-by: David Hildenbrand <david@...hat.com>
>> Acked-by: Oscar Salvador <osalvador@...e.de>
>> Reviewed-by: Anshuman Khandual <anshuman.khandual@....com>
>> Signed-off-by: Yang Shi <yang@...amperecomputing.com>
>> ---
>> v2: - Made the subject and commit message less MTE specific and fixed
>>        the fixes tag.
>>      - Collected all R-bs and A-bs.
>>
>>   mm/hugetlb.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> index cb5c4e79e0b8..fe6606d91b31 100644
>> --- a/mm/hugetlb.c
>> +++ b/mm/hugetlb.c
>> @@ -7242,6 +7242,8 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
>>                           psize);
>>           }
>>           spin_unlock(ptl);
>> +
>> +        cond_resched();
>>       }
>>       /*
>>        * Must flush TLB before releasing i_mmap_rwsem: x86's huge_pmd_unshare
>
> Reviewed-by: Dev Jain <dev.jain@....com>

Thank you.
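
For readers following along, the placement in the quoted hunk (reschedule point after the lock is dropped, once per huge-page-sized step) can be illustrated with a userspace analogy. This is only a sketch and not kernel code: `demo_lock`, `demo_update_entry()` and `demo_walk_range()` are hypothetical names, an `atomic_flag` stands in for the page table lock, and `sched_yield()` stands in for `cond_resched()`.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the page table lock taken inside the loop. */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;

/* Hypothetical stand-in for the per-entry work done under the lock. */
static void demo_update_entry(unsigned long addr, unsigned long *sum)
{
    *sum += addr;
}

/*
 * Walk [start, end) in steps of psize. The yield is issued only after
 * the lock has been released, mirroring where the patch puts
 * cond_resched() relative to spin_unlock(). Returns the number of
 * yield points that were reached.
 */
static size_t demo_walk_range(unsigned long start, unsigned long end,
                              unsigned long psize)
{
    size_t yields = 0;
    unsigned long sum = 0;

    if (psize == 0)
        return 0;

    for (unsigned long addr = start; addr < end; addr += psize) {
        while (atomic_flag_test_and_set_explicit(&demo_lock,
                                                 memory_order_acquire))
            ; /* spin until the lock is free */
        demo_update_entry(addr, &sum);
        atomic_flag_clear_explicit(&demo_lock, memory_order_release);

        sched_yield(); /* analogy for cond_resched(): never under the lock */
        yields++;
    }
    (void)sum;
    return yields;
}
```

Calling `demo_walk_range(0, 8UL << 20, 2UL << 20)` walks four steps and therefore reaches four yield points, one per step.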

>
> Does it make sense to also do cond_resched() in the huge_pmd_unshare()
> branch? That also amounts to clearing a page. And I can see, for example,
> zap_huge_pmd() and change_huge_pmd() consume a cond_resched().

Thanks for raising this. I did think about it, but I wasn't convinced it
is needed, because shared PMDs should not be that common IMHO (if I'm
wrong, please feel free to correct me). At least, a PMD can't be shared
if the memory is tagged, IIRC. So I'd like to keep the patch minimal for
now and defer adding cond_resched() there until some real-life workload
hits it.
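
For anyone who wants to exercise the hugetlb_change_protection() path from userspace, a minimal sketch is below. It is an assumption-laden approximation of the workload in the quoted commit message, not the customer's test: `demo_hugetlb_mprotect()` is a hypothetical helper, it uses a private anonymous hugetlb mapping, it does not enable MTE tagging, and it does not set up a shared PMD, so it may well not reproduce the reported stall. `len` must be a multiple of the system's huge page size, and huge pages must be available in the pool.

```c
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_POPULATE
#define MAP_POPULATE 0 /* flag not available: skip pre-faulting */
#endif

/*
 * Map `len` bytes of hugetlb memory and flip its protection once.
 * Returns 0 on success, -1 if no hugetlb mapping could be created
 * (no huge pages available, bad length, or no MAP_HUGETLB support),
 * and -2 if an mprotect() call failed.
 */
static int demo_hugetlb_mprotect(size_t len)
{
#if defined(MAP_HUGETLB) && defined(MAP_ANONYMOUS)
    int ret = 0;
    /* MAP_POPULATE asks the kernel to pre-fault the pages (best effort). */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_POPULATE,
                   -1, 0);

    if (p == MAP_FAILED)
        return -1;

    /* Each call changes the protection of every huge page in the range. */
    if (mprotect(p, len, PROT_READ) != 0 ||
        mprotect(p, len, PROT_READ | PROT_WRITE) != 0)
        ret = -2;

    munmap(p, len);
    return ret;
#else
    (void)len;
    return -1;
#endif
}
```

A call such as `demo_hugetlb_mprotect((size_t)2 << 20)` returns 0 when a huge page of a matching size could be mapped and -1 otherwise. A much larger `len` would be needed to get anywhere near the ~300GB case.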

Yang


