Message-ID: <0c833afd-64d5-4128-a03a-c47ff834b7ab@linux.dev>
Date: Tue, 14 Oct 2025 14:49:27 +0800
From: Qi Zheng <qi.zheng@...ux.dev>
To: Zi Yan <ziy@...dia.com>
Cc: hannes@...xchg.org, hughd@...gle.com, mhocko@...e.com,
 roman.gushchin@...ux.dev, shakeel.butt@...ux.dev, muchun.song@...ux.dev,
 david@...hat.com, lorenzo.stoakes@...cle.com, harry.yoo@...cle.com,
 baolin.wang@...ux.alibaba.com, Liam.Howlett@...cle.com, npache@...hat.com,
 ryan.roberts@....com, dev.jain@....com, baohua@...nel.org,
 lance.yang@...ux.dev, akpm@...ux-foundation.org, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
 Qi Zheng <zhengqi.arch@...edance.com>
Subject: Re: [PATCH v4 0/4] reparent the THP split queue

Hi Zi,

On 10/14/25 12:37 AM, Zi Yan wrote:
> On 13 Oct 2025, at 3:23, Qi Zheng wrote:
> 

[snip]

>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index b5eea2091cdf6..5353c7bd2c9af 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -4286,8 +4286,10 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>>          }
>>          folios_put(&fbatch);
>>
>> -       if (sc->nr_to_scan)
>> +       if (sc->nr_to_scan) {
>> +               cond_resched();
>>                  goto retry;
>> +       }
>>
>>          /*
>>           * Stop shrinker if we didn't split any page, but the queue is empty.
>>
> 
> It does not fix the issue, but only gets rid of the soft lockup warning.
> "echo 3 | sudo tee /proc/sys/vm/drop_caches" just runs forever.

Oh, my bad, I didn't notice that.

> 
> Looking at the original code, sc->nr_to_scan was one of the two conditions
> on breaking out of split_queue scanning and was never checked again
> afterwards. When split_queue size is smaller than nr_to_scan, your code
> will retry forever but not the original one. After I added pr_info() to
> print sc->nr_to_scan at
> 1) before retry:,
> 2) before for (... folio_batch_count();...),
> 3) before "if (sc->nr_to_scan)",
> 
> I see that 1) printed 2, 2) and 3) kept printing 1. It matches my
> above guess.

Got it.

> 
> The below patch fixes the issue:
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 43a3c499aec0..d38816a0c117 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -4415,7 +4415,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
>   	}
>   	folios_put(&fbatch);
> 
> -	if (sc->nr_to_scan)
> +	if (sc->nr_to_scan && !list_empty(&ds_queue->split_queue))
>   		goto retry;
> 
>   	/*
> 

Thanks! After applying this locally, I no longer see the soft lockup,
and deferred_split_scan() no longer shows up in the perf hotspots.

Will do this in the next version.
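
To illustrate the termination condition, here is a minimal userspace
sketch of the retry loop. It is only a model under stated assumptions,
not the kernel code: the queue is reduced to a counter, and scan_model(),
struct scan_result, BATCH_SIZE and MAX_PASSES are hypothetical names
made up for this example. MAX_PASSES exists only so that the variant
without the extra check stops in the model instead of spinning forever.

```c
#include <assert.h>
#include <stdbool.h>

#define BATCH_SIZE 31   /* stand-in for the folio batch capacity (assumption) */
#define MAX_PASSES 1000 /* model-only safety cap, not present in the kernel */

struct scan_result {
	unsigned long passes;     /* how many times the retry body ran */
	unsigned long nr_to_scan; /* budget left when the loop stopped */
	bool capped;              /* true if only the safety cap stopped us */
};

/*
 * Model of the retry loop in deferred_split_scan().
 * queue_len stands in for the entries on ds_queue->split_queue.
 * check_empty selects the fixed condition:
 *   sc->nr_to_scan && !list_empty(&ds_queue->split_queue)
 */
static struct scan_result scan_model(unsigned long queue_len,
				     unsigned long nr_to_scan,
				     bool check_empty)
{
	struct scan_result res = { 0, 0, false };

	assert(nr_to_scan > 0); /* the model assumes a non-zero budget */

	for (;;) { /* corresponds to the "retry:" label */
		unsigned long batched = 0;

		res.passes++;

		/* Move entries from the queue into the batch. */
		while (queue_len > 0) {
			queue_len--;
			batched++;
			if (!--nr_to_scan)
				break;
			if (batched == BATCH_SIZE)
				break;
		}

		/* The batch would be split and released here. */

		if (!nr_to_scan)
			break; /* budget used up: same exit in both variants */
		if (check_empty && queue_len == 0)
			break; /* fixed variant: nothing left to scan */
		if (res.passes >= MAX_PASSES) {
			res.capped = true; /* would loop forever in reality */
			break;
		}
	}

	res.nr_to_scan = nr_to_scan;
	return res;
}
```

With queue_len = 1 and nr_to_scan = 2 (chosen to match the 2, then 1
seen in the pr_info() output above), scan_model(1, 2, false) only stops
at the cap with nr_to_scan still 1, while scan_model(1, 2, true) exits
after a single pass.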

Thanks,
Qi

> 
> 
>>
>>> [   36.441592] Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 53 48 89 f3 e8 92 68 fd fe 80 e7 02 74 06 fb 0f 1f 44 00 00 <65> ff 0d d0 5f 7e 01 74 06 5b c3 cc cc cc cc 0f 1f 44 00 00 5b c3
>>> [   36.441594] RSP: 0018:ffffc900029afb60 EFLAGS: 00000202
>>> [   36.441598] RAX: 0000000000000001 RBX: 0000000000000286 RCX: ffff888101168670
>>> [   36.441601] RDX: 0000000000000001 RSI: 0000000000000286 RDI: ffff888101168658
>>> [   36.441602] RBP: 0000000000000001 R08: ffff88813ba44ec0 R09: 0000000000000000
>>> [   36.441603] R10: 00000000000001a8 R11: 0000000000000000 R12: ffff8881011685e0
>>> [   36.441604] R13: 0000000000000000 R14: ffff888101168000 R15: ffffc900029afd60
>>> [   36.441606] FS:  00007f7fe3655740(0000) GS:ffff8881b7e5d000(0000) knlGS:0000000000000000
>>> [   36.441607] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [   36.441608] CR2: 0000563d4d439bf0 CR3: 000000010873c006 CR4: 0000000000370ef0
>>> [   36.441614] Call Trace:
>>> [   36.441616]  <TASK>
>>> [   36.441619]  deferred_split_scan+0x1e0/0x480
>>> [   36.441627]  ? _raw_spin_unlock_irqrestore+0xe/0x40
>>> [   36.441630]  ? kvfree_rcu_queue_batch+0x96/0x1c0
>>> [   36.441634]  ? do_raw_spin_unlock+0x46/0xd0
>>> [   36.441639]  ? kfree_rcu_monitor+0x1da/0x2c0
>>> [   36.441641]  ? list_lru_count_one+0x47/0x90
>>> [   36.441644]  do_shrink_slab+0x153/0x360
>>> [   36.441649]  shrink_slab+0xd3/0x390
>>> [   36.441652]  drop_slab+0x7d/0x130
>>> [   36.441655]  drop_caches_sysctl_handler+0x98/0xb0
>>> [   36.441660]  proc_sys_call_handler+0x1c7/0x2c0
>>> [   36.441664]  vfs_write+0x221/0x450
>>> [   36.441669]  ksys_write+0x6c/0xe0
>>> [   36.441672]  do_syscall_64+0x50/0x200
>>> [   36.441675]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>> [   36.441678] RIP: 0033:0x7f7fe36e7687
>>> [   36.441685] Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
>>> [   36.441686] RSP: 002b:00007ffdffcbba10 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
>>> [   36.441688] RAX: ffffffffffffffda RBX: 00007f7fe3655740 RCX: 00007f7fe36e7687
>>> [   36.441689] RDX: 0000000000000002 RSI: 00007ffdffcbbbb0 RDI: 0000000000000003
>>> [   36.441690] RBP: 00007ffdffcbbbb0 R08: 0000000000000000 R09: 0000000000000000
>>> [   36.441691] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
>>> [   36.441692] R13: 0000558d40be64c0 R14: 00007f7fe383de80 R15: 0000000000000002
>>> [   36.441694]  </TASK>
>>> [   64.441531] watchdog: BUG: soft lockup - CPU#0 stuck for 53s! [tee:810]
>>> [   64.441537] Modules linked in:
>>> [   64.441545] CPU: 0 UID: 0 PID: 810 Comm: tee Tainted: G             L      6.17.0-mm-everything-2024-01-29-07-19-no-mglru+ #526 PREEMPT(voluntary)
>>> [   64.441548] Tainted: [L]=SOFTLOCKUP
>>> [   64.441552] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
>>> [   64.441555] RIP: 0010:_raw_spin_unlock_irqrestore+0x19/0x40
>>> [   64.441565] Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 53 48 89 f3 e8 92 68 fd fe 80 e7 02 74 06 fb 0f 1f 44 00 00 <65> ff 0d d0 5f 7e 01 74 06 5b c3 cc cc cc cc 0f 1f 44 00 00 5b c3
>>> [   64.441566] RSP: 0018:ffffc900029afb60 EFLAGS: 00000202
>>> [   64.441568] RAX: 0000000000000001 RBX: 0000000000000286 RCX: ffff888101168670
>>> [   64.441570] RDX: 0000000000000001 RSI: 0000000000000286 RDI: ffff888101168658
>>> [   64.441571] RBP: 0000000000000001 R08: ffff88813ba44ec0 R09: 0000000000000000
>>> [   64.441572] R10: 00000000000001a8 R11: 0000000000000000 R12: ffff8881011685e0
>>> [   64.441573] R13: 0000000000000000 R14: ffff888101168000 R15: ffffc900029afd60
>>> [   64.441574] FS:  00007f7fe3655740(0000) GS:ffff8881b7e5d000(0000) knlGS:0000000000000000
>>> [   64.441576] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [   64.441577] CR2: 0000563d4d439bf0 CR3: 000000010873c006 CR4: 0000000000370ef0
>>> [   64.441581] Call Trace:
>>> [   64.441583]  <TASK>
>>> [   64.441591]  deferred_split_scan+0x1e0/0x480
>>> [   64.441598]  ? _raw_spin_unlock_irqrestore+0xe/0x40
>>> [   64.441599]  ? kvfree_rcu_queue_batch+0x96/0x1c0
>>> [   64.441603]  ? do_raw_spin_unlock+0x46/0xd0
>>> [   64.441607]  ? kfree_rcu_monitor+0x1da/0x2c0
>>> [   64.441610]  ? list_lru_count_one+0x47/0x90
>>> [   64.441613]  do_shrink_slab+0x153/0x360
>>> [   64.441618]  shrink_slab+0xd3/0x390
>>> [   64.441621]  drop_slab+0x7d/0x130
>>> [   64.441624]  drop_caches_sysctl_handler+0x98/0xb0
>>> [   64.441629]  proc_sys_call_handler+0x1c7/0x2c0
>>> [   64.441632]  vfs_write+0x221/0x450
>>> [   64.441638]  ksys_write+0x6c/0xe0
>>> [   64.441641]  do_syscall_64+0x50/0x200
>>> [   64.441645]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>> [   64.441648] RIP: 0033:0x7f7fe36e7687
>>> [   64.441654] Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
>>> [   64.441656] RSP: 002b:00007ffdffcbba10 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
>>> [   64.441658] RAX: ffffffffffffffda RBX: 00007f7fe3655740 RCX: 00007f7fe36e7687
>>> [   64.441659] RDX: 0000000000000002 RSI: 00007ffdffcbbbb0 RDI: 0000000000000003
>>> [   64.441660] RBP: 00007ffdffcbbbb0 R08: 0000000000000000 R09: 0000000000000000
>>> [   64.441661] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000002
>>> [   64.441662] R13: 0000558d40be64c0 R14: 00007f7fe383de80 R15: 0000000000000002
>>> [   64.441663]  </TASK>
>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Yan, Zi
> 
> 
> --
> Best Regards,
> Yan, Zi

