Message-ID: <oluo6mk6zlb4wk6zul6hd3joasqjms3jwexxbcacewp537eenp@37gwchfwzddi>
Date: Sun, 13 Oct 2024 22:22:11 -0400
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Sasha Levin <sashal@...nel.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
        syzbot <syzbot+39bc767144c55c8db0ea@...kaller.appspotmail.com>,
        akpm@...ux-foundation.org, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, syzkaller-bugs@...glegroups.com, vbabka@...e.cz
Subject: Re: [syzbot] [mm?] INFO: task hung in exit_mmap

* Sasha Levin <sashal@...nel.org> [241013 09:29]:
> On Thu, Oct 10, 2024 at 04:28:18PM +0100, Lorenzo Stoakes wrote:
> > On Thu, Oct 10, 2024 at 08:19:28AM -0700, syzbot wrote:
> > > Hello,
> > > 
> > > syzbot found the following issue on:
> > > 
> > > HEAD commit:    d3d1556696c1 Merge tag 'mm-hotfixes-stable-2024-10-09-15-4..
> > > git tree:       upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=10416fd0580000
> > > kernel config:  https://syzkaller.appspot.com/x/.config?x=7a3fccdd0bb995
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=39bc767144c55c8db0ea
> > > compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
> > > 
> > > Unfortunately, I don't have any reproducer for this issue yet.
> > > 
> > > Downloadable assets:
> > > disk image: https://storage.googleapis.com/syzbot-assets/0600b551e610/disk-d3d15566.raw.xz
> > > vmlinux: https://storage.googleapis.com/syzbot-assets/d59d43ed3976/vmlinux-d3d15566.xz
> > > kernel image: https://storage.googleapis.com/syzbot-assets/e686a3e7e0d6/bzImage-d3d15566.xz
> > > 
> > > IMPORTANT: if you fix the issue, please add the following tag to the commit:
> > > Reported-by: syzbot+39bc767144c55c8db0ea@...kaller.appspotmail.com
> > > 
> > > INFO: task syz.3.917:7739 blocked for more than 146 seconds.
> > >       Not tainted 6.12.0-rc2-syzkaller-00074-gd3d1556696c1 #0
> > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > task:syz.3.917       state:D stack:23808 pid:7739  tgid:7739  ppid:5232   flags:0x00004000
> > > Call Trace:
> > >  <TASK>
> > >  context_switch kernel/sched/core.c:5322 [inline]
> > >  __schedule+0x1843/0x4ae0 kernel/sched/core.c:6682
> > >  __schedule_loop kernel/sched/core.c:6759 [inline]
> > >  schedule+0x14b/0x320 kernel/sched/core.c:6774
> > >  schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6831
> > >  rwsem_down_write_slowpath+0xeee/0x13b0 kernel/locking/rwsem.c:1176
> > >  __down_write_common kernel/locking/rwsem.c:1304 [inline]
> > >  __down_write kernel/locking/rwsem.c:1313 [inline]
> > >  down_write+0x1d7/0x220 kernel/locking/rwsem.c:1578
> > >  mmap_write_lock include/linux/mmap_lock.h:106 [inline]
> > >  exit_mmap+0x2bd/0xc40 mm/mmap.c:1872
> > 
> > Hmm, a task freezing up or the system becoming unstable/locked up is
> > reminiscent of the maple tree bug I fixed in [0], which is still in the
> > unstable hotfix branch.
> > 
> > This is likely not going to repro, as it's quite heisenbug-ish to trigger
> > and the failures look like this, somewhat disconnected from the cause, so
> > I'm not sure there is a case for speeding this to Linus's tree.
> > 
> > On the other hand it's a pretty serious problem for stability and likely to
> > continue to manifest in nasty ways like this.
> > 
> > Can't be 100% sure this is the cause, but seems likely.
> > 
> > [0]:https://lore.kernel.org/linux-mm/48b349a2a0f7c76e18772712d0997a5e12ab0a3b.1728314403.git.lorenzo.stoakes@oracle.com/
> 
> On my Debian build box, running a 6.1 kernel, I've started hitting a
> similar issue:
> 
> Oct 12 17:24:01 debian kernel: INFO: task sed:3557356 blocked for more than 1208 seconds.
> Oct 12 17:24:01 debian kernel:       Not tainted 6.1.0-26-amd64 #1 Debian 6.1.112-1
> Oct 12 17:24:01 debian kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Oct 12 17:24:01 debian kernel: task:sed             state:D stack:0     pid:3557356 ppid:1      flags:0x00000002
> Oct 12 17:24:01 debian kernel: Call Trace:
> Oct 12 17:24:01 debian kernel:  <TASK>
> Oct 12 17:24:01 debian kernel:  __schedule+0x34d/0x9e0
> Oct 12 17:24:01 debian kernel:  schedule+0x5a/0xd0
> Oct 12 17:24:01 debian kernel:  rwsem_down_write_slowpath+0x311/0x6d0
> Oct 12 17:24:01 debian kernel:  exit_mmap+0xf6/0x2f0
> Oct 12 17:24:01 debian kernel:  __mmput+0x3e/0x130
> Oct 12 17:24:01 debian kernel:  do_exit+0x2fc/0xaf0
> Oct 12 17:24:01 debian kernel:  do_group_exit+0x2d/0x80
> Oct 12 17:24:01 debian kernel:  __x64_sys_exit_group+0x14/0x20
> Oct 12 17:24:01 debian kernel:  do_syscall_64+0x55/0xb0
> Oct 12 17:24:01 debian kernel:  ? do_fault+0x1a4/0x410
> Oct 12 17:24:01 debian kernel:  ? __handle_mm_fault+0x660/0xfa0
> Oct 12 17:24:01 debian kernel:  ? exit_to_user_mode_prepare+0x40/0x1e0
> Oct 12 17:24:01 debian kernel:  ? handle_mm_fault+0xdb/0x2d0
> Oct 12 17:24:01 debian kernel:  ? do_user_addr_fault+0x1b0/0x550
> Oct 12 17:24:01 debian kernel:  ? exit_to_user_mode_prepare+0x40/0x1e0
> Oct 12 17:24:01 debian kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> Oct 12 17:24:01 debian kernel: RIP: 0033:0x7f797d75a349
> Oct 12 17:24:01 debian kernel: RSP: 002b:00007fff37f0d3c8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> Oct 12 17:24:01 debian kernel: RAX: ffffffffffffffda RBX: 00007f797d8549e0 RCX: 00007f797d75a349
> Oct 12 17:24:01 debian kernel: RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
> Oct 12 17:24:01 debian kernel: RBP: 0000000000000000 R08: fffffffffffffe98 R09: 00007fff37f0d2df
> Oct 12 17:24:01 debian kernel: R10: 00007fff37f0d240 R11: 0000000000000246 R12: 00007f797d8549e0
> Oct 12 17:24:01 debian kernel: R13: 00007f797d85a2e0 R14: 0000000000000002 R15: 00007f797d85a2c8
> Oct 12 17:24:01 debian kernel:  </TASK>
> 
> It reproduces fairly easily during a kernel build...
> 
> It doesn't sound like the same issue you're pointing out, right Lorenzo?

It could be.  I suspect a recent change has made the bug possible,
although I've not yet put effort into finding out whether that is true.
If the bug has existed for a long time (probably since I fixed the live
locking issue in 6.4, which was backported), then you could be hitting it.

It is a single line fix.  If it happens frequently enough, you could try
it - this fix will go through the backporting route once it lands.

That said, I am not sure it has much to do with the maple tree, as I
don't think anyone else should still have the mm at that point to take
the mmap write lock.  If we were stuck in the maple tree somehow, the mm
wouldn't reach the exit_mmap() path - unless I missed something?

If you can dump the running tasks when you hit it, we could get a clue
from the (probably numerous) backtraces?
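One way to collect those backtraces, sketched here assuming root access on the affected machine, is the magic SysRq interface: writing "w" to /proc/sysrq-trigger dumps tasks in the blocked (D) state to the kernel log, and "t" dumps every task. The dump_tasks helper and the SYSRQ_TRIGGER override below are hypothetical names added only for illustration and testing; they are not part of any existing tool.

```shell
#!/bin/sh
# Sketch: dump task backtraces to the kernel log via magic SysRq.
# Assumes root. SYSRQ_TRIGGER is a hypothetical override used only for
# testing; the real kernel interface is /proc/sysrq-trigger.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

# Hypothetical helper: dump_tasks [w|t]
#   w = only blocked (D state) tasks, usually the smaller dump
#   t = every task, much noisier but also shows tasks still running
dump_tasks() {
    key="${1:-w}"
    case "$key" in
        w|t) ;;
        *) echo "usage: dump_tasks [w|t]" >&2; return 1 ;;
    esac
    # The kernel acts on the first character written to the trigger file.
    printf '%s' "$key" > "$SYSRQ_TRIGGER"
}

# Example, while the hang is in progress:
#   dump_tasks w && dmesg > blocked-tasks.txt
#   dump_tasks t && dmesg > all-tasks.txt
```

Writes to /proc/sysrq-trigger are honoured for a privileged user regardless of the kernel.sysrq sysctl, which only gates the keyboard combination. The "t" dump is worth grabbing as well, since whoever holds the mmap lock may not itself be in the D state.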

Thanks,
Liam
