Date: Thu, 27 Jun 2024 12:09:05 -0700
From: Nhat Pham <nphamcs@...il.com>
To: Johannes Weiner <hannes@...xchg.org>
Cc: syzbot <syzbot+b7f13b2d0cc156edf61a@...kaller.appspotmail.com>, 
	cgroups@...r.kernel.org, linux-kernel@...r.kernel.org, 
	lizefan.x@...edance.com, syzkaller-bugs@...glegroups.com, tj@...nel.org
Subject: Re: [syzbot] [cgroups?] BUG: sleeping function called from invalid
 context in cgroup_rstat_flush

On Thu, Jun 27, 2024 at 9:31 AM Johannes Weiner <hannes@...xchg.org> wrote:
>
> On Thu, Jun 27, 2024 at 07:03:21AM -0700, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:    7c16f0a4ed1c Merge tag 'i2c-for-6.10-rc5' of git://git.ker..
> > git tree:       upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=1511528e980000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=12f98862a3c0c799
> > dashboard link: https://syzkaller.appspot.com/bug?extid=b7f13b2d0cc156edf61a
> > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> >
> > Unfortunately, I don't have any reproducer for this issue yet.
> >
> > Downloadable assets:
> > disk image: https://storage.googleapis.com/syzbot-assets/50560e9024e5/disk-7c16f0a4.raw.xz
> > vmlinux: https://storage.googleapis.com/syzbot-assets/080c27daee72/vmlinux-7c16f0a4.xz
> > kernel image: https://storage.googleapis.com/syzbot-assets/c528e0da4544/bzImage-7c16f0a4.xz
> >
> > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > Reported-by: syzbot+b7f13b2d0cc156edf61a@...kaller.appspotmail.com
> >
> > BUG: sleeping function called from invalid context at kernel/cgroup/rstat.c:351
> > in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 17332, name: syz-executor.4
> > preempt_count: 0, expected: 0
> > RCU nest depth: 1, expected: 0
> > 1 lock held by syz-executor.4/17332:
> >  #0: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:329 [inline]
> >  #0: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:781 [inline]
> >  #0: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: filemap_cachestat mm/filemap.c:4251 [inline]
> >  #0: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: __do_sys_cachestat mm/filemap.c:4407 [inline]
> >  #0: ffffffff8e333fa0 (rcu_read_lock){....}-{1:2}, at: __se_sys_cachestat+0x3ee/0xbb0 mm/filemap.c:4372
> > CPU: 1 PID: 17332 Comm: syz-executor.4 Not tainted 6.10.0-rc4-syzkaller-00330-g7c16f0a4ed1c #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
> > Call Trace:
> >  <TASK>
> >  __dump_stack lib/dump_stack.c:88 [inline]
> >  dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
> >  __might_resched+0x5d4/0x780 kernel/sched/core.c:10196
> >  cgroup_rstat_flush+0x1e/0x50 kernel/cgroup/rstat.c:351
> >  workingset_test_recent+0x48a/0xa90 mm/workingset.c:473
> >  filemap_cachestat mm/filemap.c:4314 [inline]
> >  __do_sys_cachestat mm/filemap.c:4407 [inline]
> >  __se_sys_cachestat+0x795/0xbb0 mm/filemap.c:4372
> >  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> >  do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
> >  entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> Ok yeah, cachestat() holds the rcu read lock, so
> workingset_test_recent() can't do a sleepable rstat flush.
>
> I think the easiest fix would be to flush rstat from the root down
> (NULL) in filemap_cachestat(), before the rcu section, and add a flag
> to workingset_test_recent() to forego it. Nhat?

You're right. I think it's been broken since this commit:

b00684722262 mm: workingset: move the stats flush into workingset_test_recent()

which moved the stats flush from the refault step (before the RCU read
lock section) into workingset_test_recent(). If so, I believe 6.8, 6.9,
and 6.10 all need the fix?

The fix sounds reasonable to me :) Let me whip up something real quick.
