Message-ID: <1ffa1043-673b-fdd3-a33c-444c2e99fc54@huawei.com>
Date: Wed, 26 Jun 2024 15:56:54 +0800
From: Miaohe Lin <linmiaohe@...wei.com>
To: "Huang, Ying" <ying.huang@...el.com>, Zhenneng Li <lizhenneng@...inos.cn>
CC: Andrew Morton <akpm@...ux-foundation.org>, <linux-mm@...ck.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] migrate_pages: modify max number of pages to migrate in batch
On 2024/6/25 9:17, Huang, Ying wrote:
> Hi, Zhenneng,
>
> Zhenneng Li <lizhenneng@...inos.cn> writes:
>
>> We restrict the number of pages to be migrated to no more than
>> HPAGE_PMD_NR or NR_MAX_BATCHED_MIGRATION, but in fact the
>> number of pages to be migrated may reach 2*HPAGE_PMD_NR-1 or
>> 2*NR_MAX_BATCHED_MIGRATION-1, which is inconsistent with the context.
>
> Yes. It's not HPAGE_PMD_NR exactly.
>
>> Please refer to the patch: 42012e0436d4(migrate_pages: restrict number
>> of pages to migrate in batch)
>>
>> Signed-off-by: Zhenneng Li <lizhenneng@...inos.cn>
>> ---
>> mm/migrate.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 781979567f64..7a4b37aac9e8 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -1961,7 +1961,7 @@ int migrate_pages(struct list_head *from, new_folio_t get_new_folio,
>> break;
>> }
>> if (nr_pages >= NR_MAX_BATCHED_MIGRATION)
>> - list_cut_before(&folios, from, &folio2->lru);
>> + list_cut_before(&folios, from, &folio->lru);
>
> If the first entry of the list "from" is a THP with size HPAGE_PMD_NR,
> "folio" will be the first entry of from, so that "folios" will be empty.
> Right?
If the "folios" list is empty, the "from" list is left unmodified. So we will reach "goto again" and
retry the same operation indefinitely. This causes a soft lockup in my test env.
[ 635.267324] watchdog: BUG: soft lockup - CPU#8 stuck for 26s! [bash:1031]
[ 635.277957] Kernel panic - not syncing: softlockup: hung tasks
[ 635.279519] CPU: 8 PID: 1031 Comm: bash Tainted: G L 6.10.0-rc5-00327-g47fd2329f867 #45
[ 635.279796] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 635.279796] Call Trace:
[ 635.279796] <IRQ>
[ 635.279796] panic+0x326/0x350
[ 635.280827] watchdog_timer_fn+0x226/0x270
[ 635.280827] ? __pfx_watchdog_timer_fn+0x10/0x10
[ 635.280827] __hrtimer_run_queues+0x1a8/0x360
[ 635.280827] hrtimer_interrupt+0xfe/0x240
[ 635.280827] __sysvec_apic_timer_interrupt+0x71/0x1f0
[ 635.280827] sysvec_apic_timer_interrupt+0x6a/0x80
[ 635.280827] </IRQ>
[ 635.280827] <TASK>
[ 635.282706] asm_sysvec_apic_timer_interrupt+0x1a/0x20
[ 635.282928] RIP: 0010:migrate_pages_batch+0x780/0xd50
[ 635.282928] Code: 6c 24 28 41 89 46 0c 48 8b 44 24 70 03 54 24 04 c7 44 24 58 f4 ff ff ff 41 01 56 04 48 39 c8 0f 84 d0 03 00 00 e8 d0 5c fa ff <c7> 44 24 54 00 00 00 00 48 8b 94 24 80 00 00 00 48 8b 02 48 8d 6a
[ 635.282928] RSP: 0018:ffffa76f88e57a00 EFLAGS: 00000246
[ 635.282928] RAX: 0000000000000000 RBX: ffffa76f88e57c50 RCX: ffffa76f88e57c60
[ 635.284331] RDX: ffffa76f88e57b78 RSI: ffffffffa7d1eb10 RDI: ffffa76f88e57b78
[ 635.284331] RBP: ffffdbcfc5000000 R08: 0000000000000000 R09: 0000000000000002
[ 635.284331] R10: 0000000000000007 R11: 0000000000000000 R12: ffffa76f88e57b70
[ 635.285800] R13: ffffa76f88e57ba8 R14: ffffa76f88e57bd0 R15: ffffa76f88e57b70
[ 635.285800] ? __pfx_alloc_migration_target+0x10/0x10
[ 635.285800] ? __pfx_alloc_migration_target+0x10/0x10
[ 635.286824] migrate_pages+0xaa7/0xd70
[ 635.286824] ? __pfx_alloc_migration_target+0x10/0x10
[ 635.287433] ? folio_lruvec_lock_irq+0x6a/0xf0
[ 635.287433] ? folio_isolate_lru+0x198/0x2a0
[ 635.287433] do_migrate_range+0x194/0x740
[ 635.287433] offline_pages+0x4ab/0x6e0
[ 635.288614] memory_subsys_offline+0x9e/0x1d0
[ 635.288803] ? _raw_spin_unlock_irqrestore+0x2c/0x50
[ 635.289119] device_offline+0xe2/0x110
[ 635.289119] state_store+0x6d/0xc0
[ 635.289119] kernfs_fop_write_iter+0x12c/0x1d0
[ 635.290093] vfs_write+0x380/0x540
[ 635.290093] ksys_write+0x64/0xe0
[ 635.290551] do_syscall_64+0xb9/0x1d0
[ 635.290551] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 635.290551] RIP: 0033:0x7f99f5514887
[ 635.290551] Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 635.291480] RSP: 002b:00007ffe05712888 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 635.291480] RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f99f5514887
[ 635.292980] RDX: 0000000000000008 RSI: 000055c97b961e10 RDI: 0000000000000001
[ 635.292980] RBP: 000055c97b961e10 R08: 00007f99f55d1460 R09: 000000007fffffff
[ 635.292980] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000008
[ 635.292980] R13: 00007f99f561b780 R14: 00007f99f5617600 R15: 00007f99f5616a00
[ 635.294155] </TASK>
[ 635.294155] Kernel Offset: 0x26a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 635.294155] ---[ end Kernel panic - not syncing: softlockup: hung tasks ]---
Thanks.