linux-kernel - Re: [PATCH v1] mm/migrate_device: don't add folio to be freed to LRU in migrate_device

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <n5nejzgqluxjdu7qe7ninf66yba3bzat5a4uimuk34jxhrvskk@hgbyhlnidmyy>
Date: Wed, 12 Feb 2025 09:33:42 +1100
From: Alistair Popple <apopple@...dia.com>
To: David Hildenbrand <david@...hat.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org, 
	Andrew Morton <akpm@...ux-foundation.org>, Jérôme Glisse <jglisse@...hat.com>, 
	John Hubbard <jhubbard@...dia.com>
Subject: Re: [PATCH v1] mm/migrate_device: don't add folio to be freed to LRU
 in migrate_device_finalize()

On Tue, Feb 11, 2025 at 10:05:01AM +0100, David Hildenbrand wrote:
> On 11.02.25 06:23, Alistair Popple wrote:
> > On Mon, Feb 10, 2025 at 05:13:17PM +0100, David Hildenbrand wrote:
> > > If migration succeeded, we called
> > > folio_migrate_flags()->mem_cgroup_migrate() to migrate the memcg from
> > > the old to the new folio. This will set memcg_data of the old folio to
> > > 0.
> > > 
> > > Similarly, if migration failed, memcg_data of the dst folio is left
> > > unset.
> > > 
> > > If we call folio_putback_lru() on such folios (memcg_data == 0), we will
> > > add the folio to be freed to the LRU, making memcg code unhappy. Running
> > > the hmm selftests:
> > > 
> > >    # ./hmm-tests
> > >    ...
> > >    #  RUN           hmm.hmm_device_private.migrate ...
> > >    [  102.078007][T14893] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x7ff27d200 pfn:0x13cc00
> > >    [  102.079974][T14893] anon flags: 0x17ff00000020018(uptodate|dirty|swapbacked|node=0|zone=2|lastcpupid=0x7ff)
> > >    [  102.082037][T14893] raw: 017ff00000020018 dead000000000100 dead000000000122 ffff8881353896c9
> > >    [  102.083687][T14893] raw: 00000007ff27d200 0000000000000000 00000001ffffffff 0000000000000000
> > >    [  102.085331][T14893] page dumped because: VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled())
> > >    [  102.087230][T14893] ------------[ cut here ]------------
> > >    [  102.088279][T14893] WARNING: CPU: 0 PID: 14893 at ./include/linux/memcontrol.h:726 folio_lruvec_lock_irqsave+0x10e/0x170
> > >    [  102.090478][T14893] Modules linked in:
> > >    [  102.091244][T14893] CPU: 0 UID: 0 PID: 14893 Comm: hmm-tests Not tainted 6.13.0-09623-g6c216bc522fd #151
> > >    [  102.093089][T14893] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
> > >    [  102.094848][T14893] RIP: 0010:folio_lruvec_lock_irqsave+0x10e/0x170
> > >    [  102.096104][T14893] Code: ...
> > >    [  102.099908][T14893] RSP: 0018:ffffc900236c37b0 EFLAGS: 00010293
> > >    [  102.101152][T14893] RAX: 0000000000000000 RBX: ffffea0004f30000 RCX: ffffffff8183f426
> > >    [  102.102684][T14893] RDX: ffff8881063cb880 RSI: ffffffff81b8117f RDI: ffff8881063cb880
> > >    [  102.104227][T14893] RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
> > >    [  102.105757][T14893] R10: 0000000000000001 R11: 0000000000000002 R12: ffffc900236c37d8
> > >    [  102.107296][T14893] R13: ffff888277a2bcb0 R14: 000000000000001f R15: 0000000000000000
> > >    [  102.108830][T14893] FS:  00007ff27dbdd740(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000
> > >    [  102.110643][T14893] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >    [  102.111924][T14893] CR2: 00007ff27d400000 CR3: 000000010866e000 CR4: 0000000000750ef0
> > >    [  102.113478][T14893] PKRU: 55555554
> > >    [  102.114172][T14893] Call Trace:
> > >    [  102.114805][T14893]  <TASK>
> > >    [  102.115397][T14893]  ? folio_lruvec_lock_irqsave+0x10e/0x170
> > >    [  102.116547][T14893]  ? __warn.cold+0x110/0x210
> > >    [  102.117461][T14893]  ? folio_lruvec_lock_irqsave+0x10e/0x170
> > >    [  102.118667][T14893]  ? report_bug+0x1b9/0x320
> > >    [  102.119571][T14893]  ? handle_bug+0x54/0x90
> > >    [  102.120494][T14893]  ? exc_invalid_op+0x17/0x50
> > >    [  102.121433][T14893]  ? asm_exc_invalid_op+0x1a/0x20
> > >    [  102.122435][T14893]  ? __wake_up_klogd.part.0+0x76/0xd0
> > >    [  102.123506][T14893]  ? dump_page+0x4f/0x60
> > >    [  102.124352][T14893]  ? folio_lruvec_lock_irqsave+0x10e/0x170
> > >    [  102.125500][T14893]  folio_batch_move_lru+0xd4/0x200
> > >    [  102.126577][T14893]  ? __pfx_lru_add+0x10/0x10
> > >    [  102.127505][T14893]  __folio_batch_add_and_move+0x391/0x720
> > >    [  102.128633][T14893]  ? __pfx_lru_add+0x10/0x10
> > >    [  102.129550][T14893]  folio_putback_lru+0x16/0x80
> > >    [  102.130564][T14893]  migrate_device_finalize+0x9b/0x530
> > >    [  102.131640][T14893]  dmirror_migrate_to_device.constprop.0+0x7c5/0xad0
> > >    [  102.133047][T14893]  dmirror_fops_unlocked_ioctl+0x89b/0xc80
> > > 
> > > Likely, nothing else goes wrong: putting the last folio reference will
> > > remove the folio from the LRU again. So besides memcg complaining,
> > > adding the folio to be freed to the LRU is just an unnecessary step.
> > 
> > Agreed - I had always wondered why we did that instead of just dropping the
> > reference but figured it was something to do with the LRU batching and never
> > looked too closely.
> > 
> > > The new flow resembles what we have in migrate_folio_move(): add the
> > > dst to the lru, remove migration ptes, unlock and unref dst.
> > > 
> > > Fixes: 8763cb45ab96 ("mm/migrate: new memory migration helper for use with device memory")
> > 
> > If this was introduced by the above I was trying to figure out why I hadn't
> > seen it, because whilst I don't religiously run hmm-tests and similar users
> > with CONFIG_DEBUG_VM I do run them often enough that I'd expect to have seen
> > the above. It turns out that prior to 85ce2c517ade ("memcontrol: only transfer
> > the memcg data for migration") you can't hit this, probably because pages were
> > double charged during migration so old->memcg_data remained set. So perhaps the
> > fixes tag should point at that, but maybe it was always wrong, I'm not familiar
> > enough with memcg to comment.
> 
> That would likely explain why we haven't sen it on the "migration succeeded"
> case when dropping src.
> 
> However, not so sure on the "migration failed" case, when we would drop dst.
> I would assume that the new folio (dst) would not be charged until we
> reached mem_cgroup_migrate() -- IOW, migration succeeded?

Hmm, good point. I don't think we actually have any good tests for migration
failed, and mostly it does succeed. So I guess I could believe I haven't hit
that path on a development kernel. We don't have any good test cases to force
migration failure, probably I should add one.

> Thanks for the review!
> 
> -- 
> Cheers,
> 
> David / dhildenb
>