linux-kernel - Re: [PATCH v1] mm/migrate_device: don't add folio to be freed to LRU in migrate_device_finalize()

Message-ID: <a38f547a-355f-4050-83bf-ddfb4af5f71b@redhat.com>
Date: Tue, 11 Feb 2025 10:05:01 +0100
From: David Hildenbrand <david@...hat.com>
To: Alistair Popple <apopple@...dia.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
 Andrew Morton <akpm@...ux-foundation.org>, Jérôme Glisse
 <jglisse@...hat.com>, John Hubbard <jhubbard@...dia.com>
Subject: Re: [PATCH v1] mm/migrate_device: don't add folio to be freed to LRU
 in migrate_device_finalize()

On 11.02.25 06:23, Alistair Popple wrote:
> On Mon, Feb 10, 2025 at 05:13:17PM +0100, David Hildenbrand wrote:
>> If migration succeeded, we called
>> folio_migrate_flags()->mem_cgroup_migrate() to migrate the memcg from
>> the old to the new folio. This will set memcg_data of the old folio to
>> 0.
>>
>> Similarly, if migration failed, memcg_data of the dst folio is left
>> unset.
>>
>> If we call folio_putback_lru() on such folios (memcg_data == 0), we will
>> add the folio to be freed to the LRU, making memcg code unhappy. Running
>> the hmm selftests:
>>
>>    # ./hmm-tests
>>    ...
>>    #  RUN           hmm.hmm_device_private.migrate ...
>>    [  102.078007][T14893] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x7ff27d200 pfn:0x13cc00
>>    [  102.079974][T14893] anon flags: 0x17ff00000020018(uptodate|dirty|swapbacked|node=0|zone=2|lastcpupid=0x7ff)
>>    [  102.082037][T14893] raw: 017ff00000020018 dead000000000100 dead000000000122 ffff8881353896c9
>>    [  102.083687][T14893] raw: 00000007ff27d200 0000000000000000 00000001ffffffff 0000000000000000
>>    [  102.085331][T14893] page dumped because: VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled())
>>    [  102.087230][T14893] ------------[ cut here ]------------
>>    [  102.088279][T14893] WARNING: CPU: 0 PID: 14893 at ./include/linux/memcontrol.h:726 folio_lruvec_lock_irqsave+0x10e/0x170
>>    [  102.090478][T14893] Modules linked in:
>>    [  102.091244][T14893] CPU: 0 UID: 0 PID: 14893 Comm: hmm-tests Not tainted 6.13.0-09623-g6c216bc522fd #151
>>    [  102.093089][T14893] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
>>    [  102.094848][T14893] RIP: 0010:folio_lruvec_lock_irqsave+0x10e/0x170
>>    [  102.096104][T14893] Code: ...
>>    [  102.099908][T14893] RSP: 0018:ffffc900236c37b0 EFLAGS: 00010293
>>    [  102.101152][T14893] RAX: 0000000000000000 RBX: ffffea0004f30000 RCX: ffffffff8183f426
>>    [  102.102684][T14893] RDX: ffff8881063cb880 RSI: ffffffff81b8117f RDI: ffff8881063cb880
>>    [  102.104227][T14893] RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
>>    [  102.105757][T14893] R10: 0000000000000001 R11: 0000000000000002 R12: ffffc900236c37d8
>>    [  102.107296][T14893] R13: ffff888277a2bcb0 R14: 000000000000001f R15: 0000000000000000
>>    [  102.108830][T14893] FS:  00007ff27dbdd740(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000
>>    [  102.110643][T14893] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>    [  102.111924][T14893] CR2: 00007ff27d400000 CR3: 000000010866e000 CR4: 0000000000750ef0
>>    [  102.113478][T14893] PKRU: 55555554
>>    [  102.114172][T14893] Call Trace:
>>    [  102.114805][T14893]  <TASK>
>>    [  102.115397][T14893]  ? folio_lruvec_lock_irqsave+0x10e/0x170
>>    [  102.116547][T14893]  ? __warn.cold+0x110/0x210
>>    [  102.117461][T14893]  ? folio_lruvec_lock_irqsave+0x10e/0x170
>>    [  102.118667][T14893]  ? report_bug+0x1b9/0x320
>>    [  102.119571][T14893]  ? handle_bug+0x54/0x90
>>    [  102.120494][T14893]  ? exc_invalid_op+0x17/0x50
>>    [  102.121433][T14893]  ? asm_exc_invalid_op+0x1a/0x20
>>    [  102.122435][T14893]  ? __wake_up_klogd.part.0+0x76/0xd0
>>    [  102.123506][T14893]  ? dump_page+0x4f/0x60
>>    [  102.124352][T14893]  ? folio_lruvec_lock_irqsave+0x10e/0x170
>>    [  102.125500][T14893]  folio_batch_move_lru+0xd4/0x200
>>    [  102.126577][T14893]  ? __pfx_lru_add+0x10/0x10
>>    [  102.127505][T14893]  __folio_batch_add_and_move+0x391/0x720
>>    [  102.128633][T14893]  ? __pfx_lru_add+0x10/0x10
>>    [  102.129550][T14893]  folio_putback_lru+0x16/0x80
>>    [  102.130564][T14893]  migrate_device_finalize+0x9b/0x530
>>    [  102.131640][T14893]  dmirror_migrate_to_device.constprop.0+0x7c5/0xad0
>>    [  102.133047][T14893]  dmirror_fops_unlocked_ioctl+0x89b/0xc80
>>
>> Likely, nothing else goes wrong: putting the last folio reference will
>> remove the folio from the LRU again. So besides memcg complaining,
>> adding the folio to be freed to the LRU is just an unnecessary step.
> 
> Agreed - I had always wondered why we did that instead of just dropping the
> reference but figured it was something to do with the LRU batching and never
> looked too closely.
> 
>> The new flow resembles what we have in migrate_folio_move(): add the
>> dst to the lru, remove migration ptes, unlock and unref dst.
>>
>> Fixes: 8763cb45ab96 ("mm/migrate: new memory migration helper for use with device memory")
> 
> If this was introduced by the above I was trying to figure out why I hadn't
> seen it, because whilst I don't religiously run hmm-tests and similar users
> with CONFIG_DEBUG_VM I do run them often enough that I'd expect to have seen
> the above. It turns out that prior to 85ce2c517ade ("memcontrol: only transfer
> the memcg data for migration") you can't hit this, probably because pages were
> double charged during migration so old->memcg_data remained set. So perhaps the
> fixes tag should point at that, but maybe it was always wrong, I'm not familiar
> enough with memcg to comment.

That would likely explain why we haven't seen it in the "migration
succeeded" case when dropping src.

However, I'm not so sure about the "migration failed" case, where we
would drop dst. I would assume that the new folio (dst) is not charged
until we reach mem_cgroup_migrate() -- IOW, only once migration succeeded?
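
To make the two cases concrete, here is a minimal userspace sketch. It is
a toy model, not kernel code: struct toy_folio and all toy_* helpers are
hypothetical names, the "double charge" branch encodes the guess from the
quoted review about the behaviour before 85ce2c517ade, and the "dst was
never charged" branch encodes my assumption above. Neither has been
verified against the memcg code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct folio; only the field relevant here is modelled. */
struct toy_folio {
	unsigned long memcg_data;	/* 0 means "not charged to a memcg" */
};

/*
 * Toy model of the memcg hand-over on successful migration.
 * transfer_only == true:  the charge moves to the new folio and the old
 *                         folio is left with memcg_data == 0.
 * transfer_only == false: ASSUMPTION (older double-charge behaviour):
 *                         both folios stay charged.
 */
static void toy_memcg_migrate(struct toy_folio *old, struct toy_folio *new,
			      bool transfer_only)
{
	assert(old != NULL && new != NULL);
	new->memcg_data = old->memcg_data;
	if (transfer_only)
		old->memcg_data = 0;
}

/* Returns the folio that gets dropped: src on success, dst on failure. */
static struct toy_folio *toy_folio_to_drop(struct toy_folio *src,
					   struct toy_folio *dst,
					   bool success, bool transfer_only)
{
	if (!success)
		return dst;	/* ASSUMPTION: dst was never charged */
	toy_memcg_migrate(src, dst, transfer_only);
	return src;
}

/* Models the condition behind the VM_WARN_ON_ONCE_FOLIO() in the trace. */
static bool toy_lru_add_would_warn(const struct toy_folio *folio)
{
	return folio->memcg_data == 0;
}
```

With a charged src and an uncharged dst, this model only avoids the
warning for "migration succeeded" combined with the double-charge
behaviour. In the "migration failed" case the dropped dst has
memcg_data == 0 in both variants, which is why I would expect that case
to be independent of 85ce2c517ade, provided the assumption about dst
holds.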

Thanks for the review!

-- 
Cheers,

David / dhildenb