linux-kernel - Re: 6.12 WARNING in netfs_consume_read

Message-ID: <CAKPOu+_OamJ-0wsJB3GOYu5v76ZwFr+N2L92dYH6NLBzzhDfOQ@mail.gmail.com>
Date: Fri, 6 Dec 2024 17:29:50 +0100
From: Max Kellermann <max.kellermann@...os.com>
To: David Howells <dhowells@...hat.com>
Cc: Jeff Layton <jlayton@...nel.org>, netfs@...ts.linux.dev, 
	linux-fsdevel <linux-fsdevel@...r.kernel.org>, linux-kernel@...r.kernel.org
Subject: Re: 6.12 WARNING in netfs_consume_read_data()

On Fri, Dec 6, 2024 at 4:08 PM Max Kellermann <max.kellermann@...os.com> wrote:
> Similar hangs with 6.12.2 (vanilla without your "netfs-writeback" branch):

(Correction: this was 6.12.3, not 6.12.2)

I tried with 6.12.3 + dhowells/netfs-writeback; David's branch solved
many problems and it took much longer to trigger the hang, but it
eventually occurred:

 INFO: task bash:6599 blocked for more than 122 seconds.
       Not tainted 6.12.3-cm4all0-hp+ #298
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:bash            state:D stack:0     pid:6599  tgid:6599  ppid:6598   flags:0x00000006
 Call Trace:
  <TASK>
  __schedule+0xc34/0x4df0
  ? is_dynamic_key+0x120/0x150
  ? __pfx___schedule+0x10/0x10
  ? lock_release+0x206/0x660
  ? schedule+0x283/0x340
  ? __pfx_lock_release+0x10/0x10
  ? schedule+0x1e8/0x340
  schedule+0xdc/0x340
  schedule_preempt_disabled+0xa/0x10
  rwsem_down_read_slowpath+0x6ba/0xd00
  ? __pfx_rwsem_down_read_slowpath+0x10/0x10
  ? kernel_text_address+0xb8/0x150
  ? lock_acquire+0x11f/0x290
  ? ceph_start_io_read+0x19/0x80
  down_read+0xcd/0x220
  ? __pfx_down_read+0x10/0x10
  ? do_sys_openat2+0x106/0x160
  ? stack_trace_save+0x96/0xd0
  ? __pfx_stack_trace_save+0x10/0x10
  ceph_start_io_read+0x19/0x80
  ceph_read_iter+0x2e2/0xe70
  ? __pfx_ceph_read_iter+0x10/0x10
  ? psi_task_switch+0x256/0x810
  ? find_held_lock+0x2d/0x110
  ? lock_release+0x206/0x660
  ? finish_task_switch.isra.0+0x1db/0xa40
  vfs_read+0x6e1/0xc40
  ? lock_acquire+0x11f/0x290
  ? finish_task_switch.isra.0+0x129/0xa40
  ? __pfx_vfs_read+0x10/0x10
  ? finish_task_switch.isra.0+0x225/0xa40
  ? fdget_pos+0x1b3/0x540
  ? __pfx___seccomp_filter+0x10/0x10
  ksys_read+0xee/0x1c0
  ? __pfx_ksys_read+0x10/0x10
  ? lock_release+0x206/0x660
  ? __ceph_do_getattr+0xe8/0x380
  do_syscall_64+0x64/0x100
  ? fdget_raw+0x53/0x390
  ? __do_sys_newfstatat+0x86/0xd0
  ? __pfx___do_sys_newfstatat+0x10/0x10
  ? syscall_exit_to_user_mode+0x57/0x120
  ? do_syscall_64+0x70/0x100
  ? irqentry_exit_to_user_mode+0x3d/0x100
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7f71fb04f21d
 RSP: 002b:00007ffdd516a918 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f71fb04f21d
 RDX: 0000000000003003 RSI: 00005597dc55d4d0 RDI: 0000000000000003
 RBP: 0000000000003003 R08: 00007f71fb12a020 R09: 00007f71fb12a020
 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000001f4
 R13: 00005597dc485340 R14: 00005597dc55d4d0 R15: 00005597be8d7524
  </TASK>
 INFO: task bash:6614 blocked for more than 122 seconds.
       Not tainted 6.12.3-cm4all0-hp+ #298
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 task:bash            state:D stack:0     pid:6614  tgid:6614  ppid:6613   flags:0x00000002
 Call Trace:
  <TASK>
  __schedule+0xc34/0x4df0
  ? __free_insn_slot+0x370/0x3d0
  ? __pfx___schedule+0x10/0x10
  ? lock_release+0x206/0x660
  ? schedule+0x283/0x340
  ? __pfx_lock_release+0x10/0x10
  ? schedule+0x1e8/0x340
  schedule+0xdc/0x340
  schedule_preempt_disabled+0xa/0x10
  rwsem_down_read_slowpath+0x6ba/0xd00
  ? __pfx_rwsem_down_read_slowpath+0x10/0x10
  ? kasan_save_stack+0x1c/0x40
  ? kasan_save_track+0x10/0x30
  ? lock_acquire+0x11f/0x290
  ? ceph_start_io_read+0x19/0x80
  ? find_held_lock+0x2d/0x110
  down_read+0xcd/0x220
  ? __ceph_caps_issued_mask+0x416/0xa10
  ? __pfx_down_read+0x10/0x10
  ceph_start_io_read+0x19/0x80
  ceph_read_iter+0x2e2/0xe70
  ? _copy_to_user+0x50/0x70
  ? __pfx_ceph_read_iter+0x10/0x10
  ? fdget_raw+0x53/0x390
  vfs_read+0x6e1/0xc40
  ? __do_sys_newfstatat+0x86/0xd0
  ? __pfx___do_sys_newfstatat+0x10/0x10
  ? __pfx_vfs_read+0x10/0x10
  ? fdget_pos+0x1b3/0x540
  ? __pfx___seccomp_filter+0x10/0x10
  ksys_read+0xee/0x1c0
  ? __pfx_ksys_read+0x10/0x10
  do_syscall_64+0x64/0x100
  ? do_user_addr_fault+0x401/0x8f0
  ? find_held_lock+0x59/0x110
  ? find_held_lock+0x2d/0x110
  ? lock_release+0x206/0x660
  ? do_user_addr_fault+0x45e/0x8f0
  ? __pfx_lock_release+0x10/0x10
  ? do_user_addr_fault+0x401/0x8f0
  ? do_user_addr_fault+0x463/0x8f0
  ? irqentry_exit_to_user_mode+0x3d/0x100
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7fa84a43a21d
 RSP: 002b:00007ffec8720278 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fa84a43a21d
 RDX: 0000000000003003 RSI: 0000565007f224d0 RDI: 0000000000000003
 RBP: 0000000000003003 R08: 00007fa84a515020 R09: 00007fa84a515020
 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000001f4
 R13: 0000565007e4a340 R14: 0000565007f224d0 R15: 0000565005ba8524
  </TASK>

 Showing all locks held in the system:
 1 lock held by khungtaskd/163:
  #0: ffffffffae629b80 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x64/0x280
 2 locks held by bash/3365:
  #0: ffff8881661803e0 (sb_writers#19){....}-{0:0}, at: ksys_write+0xee/0x1c0
  #1: ffff888192604b18 (&sb->s_type->i_mutex_key#19){....}-{3:3}, at: ceph_start_io_write+0x15/0x30
 1 lock held by bash/6599:
  #0: ffff888192604b18 (&sb->s_type->i_mutex_key#19){....}-{3:3}, at: ceph_start_io_read+0x19/0x80
 1 lock held by bash/6614:
  #0: ffff888192604b18 (&sb->s_type->i_mutex_key#19){....}-{3:3}, at: ceph_start_io_read+0x19/0x80

 =============================================
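
The lock dump above can be cross-checked mechanically. Below is a minimal
Python sketch, assuming the "Showing all locks held" text format seen in
this message. The regular expressions and the group_by_lock() helper are
illustrative names of my own, not part of any kernel tooling. It groups
tasks by the lock address they are listed against, so a lock shared by
several tasks stands out:

```python
import re
from collections import defaultdict

# Header line, e.g. " 1 lock held by bash/6599:"
HOLDER_RE = re.compile(r"^\s*\d+ locks? held by (\S+):")
# Lock line, e.g. "  #0: ffff888192604b18 (name){....}-{3:3}, at: ..."
LOCK_RE = re.compile(r"^\s*#\d+: ([0-9a-f]+) \((.+?)\)\{")


def group_by_lock(dump):
    """Map (lock address, lock class name) -> list of tasks listed for it."""
    holders = defaultdict(list)
    task = None
    for line in dump.splitlines():
        m = HOLDER_RE.match(line)
        if m:
            task = m.group(1)
            continue
        m = LOCK_RE.match(line)
        if m and task is not None:
            holders[(m.group(1), m.group(2))].append(task)
    return dict(holders)


# Lock dump copied from the report above.
SAMPLE = """\
 1 lock held by khungtaskd/163:
  #0: ffffffffae629b80 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x64/0x280
 2 locks held by bash/3365:
  #0: ffff8881661803e0 (sb_writers#19){....}-{0:0}, at: ksys_write+0xee/0x1c0
  #1: ffff888192604b18 (&sb->s_type->i_mutex_key#19){....}-{3:3}, at: ceph_start_io_write+0x15/0x30
 1 lock held by bash/6599:
  #0: ffff888192604b18 (&sb->s_type->i_mutex_key#19){....}-{3:3}, at: ceph_start_io_read+0x19/0x80
 1 lock held by bash/6614:
  #0: ffff888192604b18 (&sb->s_type->i_mutex_key#19){....}-{3:3}, at: ceph_start_io_read+0x19/0x80
"""

if __name__ == "__main__":
    # Print only the locks that more than one task is listed against.
    for (addr, name), tasks in group_by_lock(SAMPLE).items():
        if len(tasks) > 1:
            print(f"{addr} {name}: {', '.join(tasks)}")
```

Run on the dump above, this singles out ffff888192604b18
(&sb->s_type->i_mutex_key#19), listed for bash/3365 via
ceph_start_io_write() and for both hung readers, bash/6599 and bash/6614,
via ceph_start_io_read().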