linux-kernel - Re: [v2 PATCH] mm: hugetlb: avoid soft lockup when mprotect to large memory area

Message-ID: <819891f6-4e30-470c-b641-350a14b395a2@redhat.com>
Date: Wed, 1 Oct 2025 10:32:02 +0200
From: David Hildenbrand <david@...hat.com>
To: Dev Jain <dev.jain@....com>, Yang Shi <yang@...amperecomputing.com>,
 muchun.song@...ux.dev, osalvador@...e.de, akpm@...ux-foundation.org,
 catalin.marinas@....com, will@...nel.org, anshuman.khandual@....com,
 carl@...amperecomputing.com, cl@...two.org
Cc: linux-mm@...ck.org, linux-arm-kernel@...ts.infradead.org,
 linux-kernel@...r.kernel.org
Subject: Re: [v2 PATCH] mm: hugetlb: avoid soft lockup when mprotect to large
 memory area

On 01.10.25 06:23, Dev Jain wrote:
> 
> On 30/09/25 11:38 pm, Yang Shi wrote:
>>
>>
>> On 9/29/25 10:26 PM, Dev Jain wrote:
>>>
>>> On 30/09/25 1:54 am, Yang Shi wrote:
>>>> When calling mprotect() to a large hugetlb memory area in our customer's
>>>> workload (~300GB hugetlb memory), soft lockup was observed:
>>>>
>>>> watchdog: BUG: soft lockup - CPU#98 stuck for 23s! [t2_new_sysv:126916]
>>>>
>>>> CPU: 98 PID: 126916 Comm: t2_new_sysv Kdump: loaded Not tainted 6.17-rc7
>>>> Hardware name: GIGACOMPUTING R2A3-T40-AAV1/Jefferson CIO, BIOS 5.4.4.1 07/15/2025
>>>> pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>> pc : mte_clear_page_tags+0x14/0x24
>>>> lr : mte_sync_tags+0x1c0/0x240
>>>> sp : ffff80003150bb80
>>>> x29: ffff80003150bb80 x28: ffff00739e9705a8 x27: 0000ffd2d6a00000
>>>> x26: 0000ff8e4bc00000 x25: 00e80046cde00f45 x24: 0000000000022458
>>>> x23: 0000000000000000 x22: 0000000000000004 x21: 000000011b380000
>>>> x20: ffff000000000000 x19: 000000011b379f40 x18: 0000000000000000
>>>> x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>>>> x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
>>>> x11: 0000000000000000 x10: 0000000000000000 x9 : ffffc875e0aa5e2c
>>>> x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
>>>> x5 : fffffc01ce7a5c00 x4 : 00000000046cde00 x3 : fffffc0000000000
>>>> x2 : 0000000000000004 x1 : 0000000000000040 x0 : ffff0046cde7c000
>>>>
>>>> Call trace:
>>>>     mte_clear_page_tags+0x14/0x24
>>>>     set_huge_pte_at+0x25c/0x280
>>>>     hugetlb_change_protection+0x220/0x430
>>>>     change_protection+0x5c/0x8c
>>>>     mprotect_fixup+0x10c/0x294
>>>>     do_mprotect_pkey.constprop.0+0x2e0/0x3d4
>>>>     __arm64_sys_mprotect+0x24/0x44
>>>>     invoke_syscall+0x50/0x160
>>>>     el0_svc_common+0x48/0x144
>>>>     do_el0_svc+0x30/0xe0
>>>>     el0_svc+0x30/0xf0
>>>>     el0t_64_sync_handler+0xc4/0x148
>>>>     el0t_64_sync+0x1a4/0x1a8
>>>>
>>>> Soft lockup is not triggered with THP or base page because there is
>>>> cond_resched() called for each PMD size.
>>>>
>>>> Although the soft lockup was triggered by MTE, it should be not MTE
>>>> specific. The other processing which takes long time in the loop may
>>>> trigger soft lockup too.
>>>>
>>>> So add cond_resched() for hugetlb to avoid soft lockup.
>>>>
>>>> Fixes: 8f860591ffb2 ("[PATCH] Enable mprotect on huge pages")
>>>> Tested-by: Carl Worth <carl@...amperecomputing.com>
>>>> Reviewed-by: Christoph Lameter (Ampere) <cl@...two.org>
>>>> Reviewed-by: Catalin Marinas <catalin.marinas@....com>
>>>> Acked-by: David Hildenbrand <david@...hat.com>
>>>> Acked-by: Oscar Salvador <osalvador@...e.de>
>>>> Reviewed-by: Anshuman Khandual <anshuman.khandual@....com>
>>>> Signed-off-by: Yang Shi <yang@...amperecomputing.com>
>>>> ---
>>>> v2: - Made the subject and commit message less MTE specific and fixed
>>>>         the fixes tag.
>>>>       - Collected all R-bs and A-bs.
>>>>
>>>>    mm/hugetlb.c | 2 ++
>>>>    1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>>> index cb5c4e79e0b8..fe6606d91b31 100644
>>>> --- a/mm/hugetlb.c
>>>> +++ b/mm/hugetlb.c
>>>> @@ -7242,6 +7242,8 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
>>>>                            psize);
>>>>            }
>>>>            spin_unlock(ptl);
>>>> +
>>>> +        cond_resched();
>>>>        }
>>>>        /*
>>>>         * Must flush TLB before releasing i_mmap_rwsem: x86's huge_pmd_unshare
>>>
>>> Reviewed-by: Dev Jain <dev.jain@....com>
>>
>> Thank you.
>>
>>>
>>> Does it make sense to also do cond_resched() in the huge_pmd_unshare()
>>> branch? That also amounts to clearing a page. And I can see for example,
>>> zap_huge_pmd() and change_huge_pmd() consume a cond_resched().
>>
>> Thanks for raising this. I did think about it. But I didn't convince
>> myself because shared pmd should be not that common IMHO (If I'm
>> wrong, please feel free to correct me). At least PMD can't be shared
>> if the memory is tagged IIRC. So I'd like to keep the patch minimal
>> for now and defer adding cond_resched() until it is hit by some real
>> life workload.
> 
> If we have large swathes of hugetlb memory like in your workload, and it
> is MAP_SHARED, then there should be high chances of sharing the PMD.
> Although, I incorrectly observed that we are clearing a page there - we
> are only clearing the pud entry which is 8 bytes. So yes a soft lockup
> should be highly unlikely. But since cond_resched() is cheap (I assume
> this is the case since it is liberally sprinkled all over the codebase)
> I think we should be consistent. Probably not an immediate concern and
> not a matter

Right, that's one of the cases where we might just want to wait either
until it is reported or until hugetlb is finally removed in a couple of
decades ;)

-- 
Cheers

David / dhildenb
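
For readers following the thread, here is a rough userspace sketch of the
pattern the patch applies: do the per-entry work under a lock, drop the
lock, and only then offer the CPU to other tasks before the next entry.
This is not kernel code. struct range_walk and walk_range() are
hypothetical names, a pthread mutex stands in for the page table lock, and
sched_yield() is only a loose analogue of cond_resched() (which, unlike
sched_yield(), reschedules only when the scheduler has asked for it).

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one walk over an address range. */
struct range_walk {
	pthread_mutex_t lock;	/* stands in for the page table lock (ptl) */
	size_t updated;		/* entries processed so far */
	size_t yields;		/* times the CPU was offered to other tasks */
};

/*
 * Walk [start, end) in 'step'-sized units (one unit per huge page).
 * Returns the total number of entries processed so far.
 */
static size_t walk_range(struct range_walk *w, size_t start, size_t end,
			 size_t step)
{
	assert(step != 0);

	for (size_t addr = start; addr < end; addr += step) {
		pthread_mutex_lock(&w->lock);
		w->updated++;	/* placeholder for the per-entry update */
		pthread_mutex_unlock(&w->lock);

		/*
		 * Yield only after the lock is dropped, mirroring where the
		 * patch places cond_resched() after spin_unlock(ptl).
		 */
		sched_yield();
		w->yields++;
	}

	return w->updated;
}
```

Walking eight 2 MiB units with this sketch processes eight entries and
passes through eight yield points, one per unit, none of them while the
lock is held.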