Message-ID: <a55beb72-4288-4356-9642-76ab35a2a07c@lucifer.local>
Date: Fri, 20 Jun 2025 12:50:43 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: syzbot <syzbot+a74a028d848147bc5931@...kaller.appspotmail.com>
Cc: akpm@...ux-foundation.org, cgroups@...r.kernel.org, hannes@...xchg.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, mhocko@...nel.org,
muchun.song@...ux.dev, roman.gushchin@...ux.dev,
shakeel.butt@...ux.dev, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [cgroups?] [mm?] WARNING in folio_lruvec_lock
OK, I think this might well be me - apologies. I definitely see a suspicious-looking
bug. TL;DR - will fix, it's not upstream yet.

Thanks to Andrew for forwarding this to me - some useful insight there!
So it looks like in [0] we mistakenly do the KSM flag update _before_ the
.mmap() hook, which is... not good.
This results in the correct checks not being applied to the VMA, because
e.g. VM_HUGETLB will not be set until after the .mmap() hook has been completed
(I'm working on converting the hooks to .mmap_prepare() but we're not there
yet...)
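(For context: KSM eligibility is gated on VMA flags, something like the below -
paraphrased from memory, so the exact flag list in vma_ksm_compatible() may
differ - but VM_HUGETLB is one of the exclusions, and hugetlbfs only sets that
flag from its .mmap() hook (hugetlbfs_file_mmap()), so a check performed before
the hook runs cannot see it:

static bool vma_ksm_compatible(struct vm_area_struct *vma)
{
        /* Rough sketch from memory - not the exact upstream flag list. */
        if (vma->vm_flags & (VM_SHARED | VM_MAYSHARE | VM_PFNMAP | VM_IO |
                             VM_DONTEXPAND | VM_HUGETLB | VM_MIXEDMAP))
                return false;

        ...
        return true;
}

So if we mark the VMA VM_MERGEABLE before the .mmap() hook has set VM_HUGETLB,
a hugetlb mapping can end up KSM-eligible when it never should be.)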
[0]: https://lore.kernel.org/all/3ba660af716d87a18ca5b4e635f2101edeb56340.1748537921.git.lorenzo.stoakes@oracle.com/
I will send a fix there.
Thanks, Lorenzo
On Thu, Jun 19, 2025 at 05:02:31AM -0700, syzbot wrote:
> Hello,
>
> syzbot found the following issue on:
>
> HEAD commit: bc6e0ba6c9ba Add linux-next specific files for 20250613
> git tree: linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1126090c580000
> kernel config: https://syzkaller.appspot.com/x/.config?x=2f7a2e4d17ed458f
> dashboard link: https://syzkaller.appspot.com/bug?extid=a74a028d848147bc5931
> compiler: Debian clang version 20.1.6 (++20250514063057+1e4d39e07757-1~exp1~20250514183223.118), Debian LLD 20.1.6
>
> Unfortunately, I don't have any reproducer for this issue yet.
>
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/2430bb0465cc/disk-bc6e0ba6.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/436a39deef0a/vmlinux-bc6e0ba6.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/e314ca5b1eb3/bzImage-bc6e0ba6.xz
>
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+a74a028d848147bc5931@...kaller.appspotmail.com
>
> handle_mm_fault+0x740/0x8e0 mm/memory.c:6397
I mean this is:
ret = hugetlb_fault(vma->vm_mm, vma, address, flags);
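(i.e. we took the hugetlb fault path, which only happens because VM_HUGETLB is
set on the VMA - from memory the dispatch in handle_mm_fault() is roughly:

/* Sketch from memory of the relevant dispatch in handle_mm_fault(). */
if (unlikely(is_vm_hugetlb_page(vma)))
        ret = hugetlb_fault(vma->vm_mm, vma, address, flags);
else
        ret = __handle_mm_fault(vma, address, flags);
)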
Interestingly, I see in mem_cgroup_charge_hugetlb():
/*
 * Even memcg does not account for hugetlb, we still want to update
 * system-level stats via lruvec_stat_mod_folio. Return 0, and skip
 * charging the memcg.
 */
if (mem_cgroup_disabled() || !memcg_accounts_hugetlb() ||
    !memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys))
        goto out;

if (charge_memcg(folio, memcg, gfp))
        ret = -ENOMEM;
So maybe KSM is somehow touching hugetlb (which it shouldn't be doing...) and
hitting an uncharged folio...?
This aligns with us having set KSM flags at the wrong time on a hugetlb mapping.
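(Note that hugetlb folios are only charged to the memcg at all when the
memory_hugetlb_accounting cgroup mount option is set. If memory serves,
memcg_accounts_hugetlb() is just:

/* Sketch from memory - true only with the memory_hugetlb_accounting mount option. */
static bool memcg_accounts_hugetlb(void)
{
        return cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING;
}

So in the common case we take the 'goto out' above before charge_memcg(),
folio->memcg_data is never set, and folio_memcg() will later return NULL for
the hugetlb folio - which fits the warning below.)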
> faultin_page mm/gup.c:1186 [inline]
> __get_user_pages+0x1aef/0x30b0 mm/gup.c:1488
> populate_vma_page_range+0x29f/0x3a0 mm/gup.c:1922
> __mm_populate+0x24c/0x380 mm/gup.c:2025
> mm_populate include/linux/mm.h:3354 [inline]
> vm_mmap_pgoff+0x3f0/0x4c0 mm/util.c:584
> ksys_mmap_pgoff+0x587/0x760 mm/mmap.c:607
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> page_owner free stack trace missing
I'm guessing this is the process stack of the repro (even though syzkaller can't repro :P)
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 38 at ./include/linux/memcontrol.h:732 folio_lruvec include/linux/memcontrol.h:732 [inline]
This is:
static inline struct lruvec *folio_lruvec(struct folio *folio)
{
        struct mem_cgroup *memcg = folio_memcg(folio);

        VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled(), folio); <---- here
        return mem_cgroup_lruvec(memcg, folio_pgdat(folio));
}
Meaning folio_memcg() is failing to find a memcg for the folio.
I'm not really that familiar with the cgroup implementation, but:

static inline struct mem_cgroup *folio_memcg(struct folio *folio)
{
        if (folio_memcg_kmem(folio))
                return obj_cgroup_memcg(__folio_objcg(folio));
        return __folio_memcg(folio); <--- seems this is what is returning NULL?
}
I guess it's __folio_memcg() that's returning NULL as apparently
obj_cgroup_memcg() should always return something non-NULL.
And this is:
static inline struct mem_cgroup *__folio_memcg(struct folio *folio)
{
        unsigned long memcg_data = folio->memcg_data;

        ...

        return (struct mem_cgroup *)(memcg_data & ~OBJEXTS_FLAGS_MASK);
}
So if folio->memcg_data is zero, or is zero once the flag bits are masked off,
this will return NULL.
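(For reference, the low bits of memcg_data are flag bits - something like the
below, if I'm remembering the encoding right; the exact names/values may
differ - and ~OBJEXTS_FLAGS_MASK just strips those off:

/* Paraphrased from memory - the exact flag names/values may differ. */
enum page_memcg_data_flags {
        /* memcg_data points at a slabobj_ext vector, not a memcg */
        MEMCG_DATA_OBJEXTS      = 1UL << 0,
        /* the page was accounted as a non-slab kernel allocation */
        MEMCG_DATA_KMEM         = 1UL << 1,
};

So a memcg_data value consisting only of flag bits also decodes to a NULL memcg.)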
I see this is set to NULL (or rather 0) in mem_cgroup_migrate(), in
__memcg_kmem_uncharge_page() (but this isn't kmem, is it?), and in uncharge_folio().

We also set the memcg in charge_memcg() -> commit_charge(), so perhaps a charge
was expected here that somehow didn't happen?
This again aligns with a mis-flagged hugetlb folio.
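(For reference, commit_charge() boils down to just storing the memcg pointer -
roughly, from memory:

static void commit_charge(struct folio *folio, struct mem_cgroup *memcg)
{
        /* Rough sketch - the real thing also asserts the folio isn't already charged. */
        folio->memcg_data = (unsigned long)memcg;
}

So if that charge never happens for a folio, memcg_data stays 0 and
folio_memcg() returns NULL, which is exactly what the warning is complaining
about.)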
> WARNING: CPU: 0 PID: 38 at ./include/linux/memcontrol.h:732 folio_lruvec_lock+0x150/0x1a0 mm/memcontrol.c:1211
This is:

struct lruvec *folio_lruvec_lock(struct folio *folio)
{
        struct lruvec *lruvec = folio_lruvec(folio); <---- here

        spin_lock(&lruvec->lru_lock);
        lruvec_memcg_debug(lruvec, folio);

        return lruvec;
}
> Modules linked in:
> CPU: 0 UID: 0 PID: 38 Comm: ksmd Not tainted 6.16.0-rc1-next-20250613-syzkaller #0 PREEMPT(full)
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
> RIP: 0010:folio_lruvec include/linux/memcontrol.h:732 [inline]
> RIP: 0010:folio_lruvec_lock+0x150/0x1a0 mm/memcontrol.c:1211
> Code: 7c 25 00 00 74 08 4c 89 ff e8 7c 66 f8 ff 4d 89 2f eb c4 48 89 df 48 c7 c6 60 4f 98 8b e8 58 9b dc ff c6 05 01 85 5f 0d 01 90 <0f> 0b 90 e9 d5 fe ff ff 44 89 f9 80 e1 07 80 c1 03 38 c1 0f 8c 4d
> RSP: 0018:ffffc90000ae7660 EFLAGS: 00010046
> RAX: b21d845e3554e000 RBX: ffffea0002108000 RCX: b21d845e3554e000
> RDX: 0000000000000002 RSI: ffffffff8db792e4 RDI: ffff88801de83c00
> RBP: ffffea0002108000 R08: 0000000000000003 R09: 0000000000000004
> R10: dffffc0000000000 R11: fffffbfff1bfaa14 R12: ffffea0002108000
> R13: ffffea0002108008 R14: 0000000000000000 R15: 0000000000000000
> FS: 0000000000000000(0000) GS:ffff888125c41000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f475c15ef98 CR3: 000000005f95a000 CR4: 00000000003526f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> __split_unmapped_folio+0x42e/0x2cb0 mm/huge_memory.c:3487
This is:
static int __split_unmapped_folio(struct folio *folio, int new_order,
                struct page *split_at, struct page *lock_at,
                struct list_head *list, pgoff_t end,
                struct xa_state *xas, struct address_space *mapping,
                bool uniform_split)
{
        ...

        /* lock lru list/PageCompound, ref frozen by page_ref_freeze */
        lruvec = folio_lruvec_lock(folio); <--- here

        ...
}
So we're splitting an unmapped folio that is locked, non-LRU and frozen
(refcount == 0).

Interestingly, __split_folio_to_order() sets (new_)folio->memcg_data, but that
is called _after_ this folio_lruvec_lock().
> __folio_split+0xf78/0x1300 mm/huge_memory.c:3891
This is:
ret = __split_unmapped_folio(folio, new_order,
                split_at, lock_at, list, end, &xas, mapping,
                uniform_split);
> cmp_and_merge_page mm/ksm.c:2358 [inline]
So we have tried to merge two pages:
kfolio = try_to_merge_two_pages(rmap_item, page,
                                tree_rmap_item, tree_page);
But failed:
/*
 * If both pages we tried to merge belong to the same compound
 * page, then we actually ended up increasing the reference
 * count of the same compound page twice, and split_huge_page
 * failed.
 * Here we set a flag if that happened, and we use it later to
 * try split_huge_page again. Since we call put_page right
 * afterwards, the reference count will be correct and
 * split_huge_page should succeed.
 */
split = PageTransCompound(page)
        && compound_head(page) == compound_head(tree_page);

if (kfolio) {
        ...
} else if (split) {
        /*
         * We are here if we tried to merge two pages and
         * failed because they both belonged to the same
         * compound page. We will split the page now, but no
         * merging will take place.
         * We do not want to add the cost of a full lock; if
         * the page is locked, it is better to skip it and
         * perhaps try again later.
         */
        if (!trylock_page(page))
                return;
        split_huge_page(page); <---- this is where the failure occurs.
        unlock_page(page);
}
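(Which shouldn't be reachable for hugetlb at all - the only gate on the scan
side is VM_MERGEABLE. Paraphrasing the VMA walk in scan_get_next_rmap_item()
from memory, it's roughly:

/* Rough sketch - the real loop also handles ksm_scan state, mmap locking etc. */
for_each_vma(vmi, vma) {
        if (!(vma->vm_flags & VM_MERGEABLE))
                continue;
        /* ... pages in [vma->vm_start, vma->vm_end) get fed to cmp_and_merge_page() ... */
}

So if we wrongly set VM_MERGEABLE on a hugetlb VMA, per the mis-ordering
described above, ksmd will happily feed hugetlb pages into this compound-page
split path.)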
> ksm_do_scan+0x499b/0x6530 mm/ksm.c:2665
> ksm_scan_thread+0x10b/0x4b0 mm/ksm.c:2687
> kthread+0x711/0x8a0 kernel/kthread.c:464
> ret_from_fork+0x3f9/0x770 arch/x86/kernel/process.c:148
> ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
> </TASK>
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@...glegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
>
> If the report is already addressed, let syzbot know by replying with:
> #syz fix: exact-commit-title
>
> If you want to overwrite report's subsystems, reply with:
> #syz set subsystems: new-subsystem
> (See the list of subsystem names on the web dashboard)
>
> If the report is a duplicate of another one, reply with:
> #syz dup: exact-subject-of-another-report
>
> If you want to undo deduplication, reply with:
> #syz undup
>
>