lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Thu, 29 Dec 2022 22:01:42 +0000
From:   Matthew Wilcox <willy@...radead.org>
To:     syzbot <syzbot+b7ad168b779385f8cd58@...kaller.appspotmail.com>
Cc:     akpm@...ux-foundation.org, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] INFO: task hung in __unmap_and_move (4)

On Thu, Dec 29, 2022 at 01:48:42AM -0800, syzbot wrote:
> INFO: task kcompactd1:32 blocked for more than 143 seconds.
>       Not tainted 6.1.0-syzkaller-14594-g72a85e2b0a1e #0
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kcompactd1      state:D stack:26360 pid:32    ppid:2      flags:0x00004000
> Call Trace:
>  <TASK>
>  context_switch kernel/sched/core.c:5244 [inline]
>  __schedule+0x9d1/0xe40 kernel/sched/core.c:6555
>  schedule+0xcb/0x190 kernel/sched/core.c:6631
>  io_schedule+0x83/0x100 kernel/sched/core.c:8811
>  folio_wait_bit_common+0x8ca/0x1390 mm/filemap.c:1297
>  folio_lock include/linux/pagemap.h:938 [inline]
>  __unmap_and_move+0x835/0x12a0 mm/migrate.c:1040
>  unmap_and_move+0x28f/0xd80 mm/migrate.c:1194
>  migrate_pages+0x50f/0x14d0 mm/migrate.c:1477
>  compact_zone+0x2893/0x37a0 mm/compaction.c:2413
>  proactive_compact_node mm/compaction.c:2665 [inline]
>  kcompactd+0x1b46/0x2750 mm/compaction.c:2975

OK, so kcompactd is trying to compact a zone, has called folio_lock()
and whoever has the folio locked has had it locked for 143 seconds.
That seems like quite a long time.  Probably it is locked waiting
for I/O.

> NMI backtrace for cpu 1
[...]
>  lock_release+0x81/0x870 kernel/locking/lockdep.c:5679
>  rcu_read_unlock include/linux/rcupdate.h:797 [inline]
>  folio_evictable+0x1df/0x2d0 mm/internal.h:140
>  move_folios_to_lru+0x324/0x25c0 mm/vmscan.c:2413
>  shrink_inactive_list+0x60b/0xca0 mm/vmscan.c:2529
>  shrink_list mm/vmscan.c:2767 [inline]
>  shrink_lruvec+0x449/0xc50 mm/vmscan.c:5951
>  shrink_node_memcgs+0x35c/0x780 mm/vmscan.c:6138
>  shrink_node+0x299/0x1050 mm/vmscan.c:6169
>  shrink_zones+0x4fb/0xc40 mm/vmscan.c:6407
>  do_try_to_free_pages+0x215/0xcd0 mm/vmscan.c:6469
>  try_to_free_pages+0x3e8/0xc60 mm/vmscan.c:6704
>  __perform_reclaim mm/page_alloc.c:4750 [inline]
>  __alloc_pages_direct_reclaim mm/page_alloc.c:4772 [inline]
>  __alloc_pages_slowpath+0xd5c/0x2120 mm/page_alloc.c:5178
>  __alloc_pages+0x3d4/0x560 mm/page_alloc.c:5562
>  folio_alloc+0x1a/0x50 mm/mempolicy.c:2296
>  filemap_alloc_folio+0xca/0x2c0 mm/filemap.c:972
>  page_cache_ra_unbounded+0x212/0x820 mm/readahead.c:248
>  do_sync_mmap_readahead+0x786/0x950 mm/filemap.c:3062
>  filemap_fault+0x38d/0x1060 mm/filemap.c:3154

So dhcpd has taken a page fault, missed in the page cache, called
readahead, is presumably partway through the readahead (ie has folios
locked in the page cache, not uptodate and I/O hasn't been submitted
on them).  It's trying to allocate pages, but has fallen into reclaim.
It's trying to shrink the inactive list at this point, but is not
having much luck.  For one thing, it's a GFP_NOFS allocation.  So
it was probably the one who woke kcompactd.

Should readahead be trying less hard to allocate memory?  It's already
using __GFP_NORETRY.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ