Message-ID: <YrPVZUXPD7L85mYp@castle>
Date:   Wed, 22 Jun 2022 19:52:21 -0700
From:   Roman Gushchin <roman.gushchin@...ux.dev>
To:     Muchun Song <songmuchun@...edance.com>
Cc:     syzbot <syzbot+ec972d37869318fc3ffb@...kaller.appspotmail.com>,
        akpm@...ux-foundation.org, cgroups@...r.kernel.org,
        hannes@...xchg.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, mhocko@...nel.org, shakeelb@...gle.com,
        syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] WARNING in folio_lruvec_lock_irqsave

On Wed, Jun 22, 2022 at 11:33:48PM +0800, Muchun Song wrote:
> On Wed, Jun 22, 2022 at 06:49:31AM -0700, syzbot wrote:
> > Hello,
> > 
> > syzbot found the following issue on:
> > 
> > HEAD commit:    ac0ba5454ca8 Add linux-next specific files for 20220622
> > git tree:       linux-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=14354c18080000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=12809dacb9e7c5e0
> > dashboard link: https://syzkaller.appspot.com/bug?extid=ec972d37869318fc3ffb
> > compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> > 
> > Unfortunately, I don't have any reproducer for this issue yet.
> > 
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+ec972d37869318fc3ffb@...kaller.appspotmail.com
> > 
> >  folio_put include/linux/mm.h:1227 [inline]
> >  put_page+0x217/0x280 include/linux/mm.h:1279
> >  unmap_and_move_huge_page mm/migrate.c:1343 [inline]
> >  migrate_pages+0x3dc3/0x5a10 mm/migrate.c:1440
> >  do_mbind mm/mempolicy.c:1332 [inline]
> >  kernel_mbind+0x4d7/0x7d0 mm/mempolicy.c:1479
> >  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> >  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
> >  entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > page has been migrated, last migrate reason: mempolicy_mbind
> > ------------[ cut here ]------------
> > WARNING: CPU: 1 PID: 18925 at include/linux/memcontrol.h:800 folio_lruvec include/linux/memcontrol.h:800 [inline]
> 
> The warning here is "VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled(), folio)",
> so the memcg returned by folio_memcg() seems to be NULL. There are two possibilities:
> either the objcg returned by folio_objcg() is NULL, or obj_cgroup_memcg(objcg)
> returns NULL. However, obj_cgroup_memcg() always returns a valid memcg, so most likely
> objcg is NULL, meaning this page is not charged to a memcg. Is this possible for LRU pages?
> 
> I am not sure if this issue is caused by my commit cca700a8e695 ("mm: lru: use lruvec
> lock to serialize memcg changes"), since I removed the folio_test_clear_lru() check
> from folio_batch_move_lru(). We know that a non-LRU page may not be charged to a memcg.
> But is it possible for a non-LRU page to be passed to folio_batch_move_lru()? It seems
> impossible, right? I am not very confident about this commit; hopefully someone can
> review it.
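
For readers following the reasoning quoted above, here is a minimal user-space sketch of the lookup chain it describes. It is not kernel code: the structs are simplified stand-ins, and the `model_`-prefixed helpers are hypothetical names invented for illustration. It only models the claim that, if obj_cgroup_memcg() never returns NULL, the warning condition can be true only when the folio has no objcg (that is, it is uncharged) and memcg is not disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; NOT the real definitions. */
struct mem_cgroup { int id; };
struct obj_cgroup { struct mem_cgroup *memcg; };
struct folio { struct obj_cgroup *objcg; };	/* NULL models an uncharged folio */

/* Hypothetical model of folio_memcg() going through the objcg. */
static struct mem_cgroup *model_folio_memcg(const struct folio *folio)
{
	if (!folio->objcg)
		return NULL;	/* uncharged: no objcg, hence no memcg */

	/* Models the quoted claim that a valid objcg always maps to a memcg. */
	assert(folio->objcg->memcg != NULL);
	return folio->objcg->memcg;
}

/* Hypothetical model of the condition "!memcg && !mem_cgroup_disabled()". */
static bool model_lruvec_would_warn(const struct folio *folio,
				    bool memcg_disabled)
{
	return !model_folio_memcg(folio) && !memcg_disabled;
}
```

In this model, `model_lruvec_would_warn()` returns true only for a folio whose `objcg` is NULL while `memcg_disabled` is false, which matches the "page is not charged" hypothesis in the quoted analysis.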

How about temporarily dropping it?

I was about to suggest it anyway during the review: it's a standalone
optimization and the main patchset is already quite big and complex,
so it's probably easier to validate it separately.

Thanks!