linux-kernel - Re: INFO: task hung in blkdev

Message-ID: <CAAehj2my7rWCNRBNg=8WoADNQFDo7rDcVUhaJutoX73HU55HZw@mail.gmail.com>
Date:   Fri, 19 May 2023 15:12:39 +0800
From:   yang lan <lanyang0908@...il.com>
To:     Yu Kuai <yukuai1@...weicloud.com>
Cc:     Matthew Wilcox <willy@...radead.org>, viro@...iv.linux.org.uk,
        linux-kernel@...r.kernel.org, josef@...icpanda.com,
        linux-block@...r.kernel.org, nbd@...er.debian.org,
        syzkaller-bugs@...glegroups.com, linux-fsdevel@...r.kernel.org,
        axboe@...nel.dk, haris.iqbal@...os.com, jinpu.wang@...os.com,
        brauner@...nel.org, "yukuai (C)" <yukuai3@...wei.com>
Subject: Re: INFO: task hung in blkdev_open bug

Hi,

./rqos/wbt/wb_background:4
./rqos/wbt/wb_normal:8
./rqos/wbt/unknown_cnt:0
./rqos/wbt/min_lat_nsec:2000000
./rqos/wbt/inflight:0: inflight 0
./rqos/wbt/inflight:1: inflight 0
./rqos/wbt/inflight:2: inflight 0
./rqos/wbt/id:0
./rqos/wbt/enabled:1
./rqos/wbt/curr_win_nsec:0
./hctx0/type:default
./hctx0/dispatch_busy:0
./hctx0/active:0
./hctx0/run:1
./hctx0/sched_tags_bitmap:00000000: 0100 0000 0000 0000 0000 0000 0000 0000
./hctx0/sched_tags_bitmap:00000010: 0000 0000 0000 0000 0000 0000 0000 0000
./hctx0/sched_tags:nr_tags=256
./hctx0/sched_tags:nr_reserved_tags=0
./hctx0/sched_tags:active_queues=0
./hctx0/sched_tags:bitmap_tags:
./hctx0/sched_tags:depth=256
./hctx0/sched_tags:busy=1
./hctx0/sched_tags:cleared=0
./hctx0/sched_tags:bits_per_word=64
./hctx0/sched_tags:map_nr=4
./hctx0/sched_tags:alloc_hint={245, 45}
./hctx0/sched_tags:wake_batch=8
./hctx0/sched_tags:wake_index=0
./hctx0/sched_tags:ws_active=0
./hctx0/sched_tags:ws={
./hctx0/sched_tags:     {.wait=inactive},
./hctx0/sched_tags:     {.wait=inactive},
./hctx0/sched_tags:     {.wait=inactive},
./hctx0/sched_tags:     {.wait=inactive},
./hctx0/sched_tags:     {.wait=inactive},
./hctx0/sched_tags:     {.wait=inactive},
./hctx0/sched_tags:     {.wait=inactive},
./hctx0/sched_tags:     {.wait=inactive},
./hctx0/sched_tags:}
./hctx0/sched_tags:round_robin=0
./hctx0/sched_tags:min_shallow_depth=192
./hctx0/tags_bitmap:00000000: 0000 0000 0100 0000 0000 0000 0000 0000
./hctx0/tags:nr_tags=128
./hctx0/tags:nr_reserved_tags=0
./hctx0/tags:active_queues=0
./hctx0/tags:bitmap_tags:
./hctx0/tags:depth=128
./hctx0/tags:busy=1
./hctx0/tags:cleared=0
./hctx0/tags:bits_per_word=32
./hctx0/tags:map_nr=4
./hctx0/tags:alloc_hint={123, 51}
./hctx0/tags:wake_batch=8
./hctx0/tags:wake_index=0
./hctx0/tags:ws_active=0
./hctx0/tags:ws={
./hctx0/tags:   {.wait=inactive},
./hctx0/tags:   {.wait=inactive},
./hctx0/tags:   {.wait=inactive},
./hctx0/tags:   {.wait=inactive},
./hctx0/tags:   {.wait=inactive},
./hctx0/tags:   {.wait=inactive},
./hctx0/tags:   {.wait=inactive},
./hctx0/tags:   {.wait=inactive},
./hctx0/tags:}
./hctx0/tags:round_robin=0
./hctx0/tags:min_shallow_depth=4294967295
./hctx0/ctx_map:00000000: 00
./hctx0/busy:ffff888016860000 {.op=READ, .cmd_flags=, .rq_flags=STARTED|ELVPRIV|IO_STAT|STATS|ELV, .state=in_flight, .tag=32, .internal_tag=0}
./hctx0/flags:alloc_policy=FIFO SHOULD_MERGE|BLOCKING
./sched/queued:0 1 0
./sched/owned_by_driver:0 1 0
./sched/async_depth:192
./sched/starved:0
./sched/batching:1
./state:SAME_COMP|NONROT|IO_STAT|INIT_DONE|STATS|REGISTERED|NOWAIT|30
./pm_only:0
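
As a cross-check on the dump above, the two bitmap rows can be decoded into tag numbers. The Python sketch below assumes the hex dump lists bytes in ascending order, with bit 0 of each byte being the lowest tag. That layout is not stated in this thread, but it agrees with the .tag=32 and .internal_tag=0 values in the hctx0/busy entry. The function name set_tags is made up for illustration.

```python
def set_tags(hexdump_bytes: str, base: int = 0) -> list[int]:
    """Return the tag numbers whose bits are set in one hex-dump row.

    hexdump_bytes is the part of the row after the "00000000: " offset.
    base is the tag number of the first bit in the row (row offset * 8).
    """
    data = bytes.fromhex(hexdump_bytes.replace(" ", ""))
    tags = []
    for byte_index, byte in enumerate(data):
        for bit in range(8):
            # Assumption: least-significant bit of each byte comes first.
            if byte & (1 << bit):
                tags.append(base + byte_index * 8 + bit)
    return tags


# Rows copied from hctx0/tags_bitmap and hctx0/sched_tags_bitmap above.
tags_row = "0000 0000 0100 0000 0000 0000 0000 0000"
sched_row = "0100 0000 0000 0000 0000 0000 0000 0000"
print(set_tags(tags_row))   # driver tag bitmap
print(set_tags(sched_row))  # scheduler tag bitmap
```

Under that assumption, one driver tag and one scheduler tag are allocated, which matches busy=1 in both tag sets.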

So how can we tell where the I/O is now?
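
One way to read a partial answer out of the dump is to look at the hctx0/busy entry, which describes the single outstanding request. The Python sketch below assumes the entry is on one line (joined if the mail client wrapped it) and that fields have the .name=value form seen above. The helper name parse_busy is hypothetical.

```python
import re


def parse_busy(entry: str) -> dict[str, str]:
    """Split a blk-mq debugfs 'busy' entry into its .name=value fields."""
    # Only look between the braces so the "./hctx0/busy" path is ignored.
    body = entry[entry.index("{") + 1 : entry.rindex("}")]
    fields = {}
    for match in re.finditer(r"\.(\w+)=([^,]*)", body):
        fields[match.group(1)] = match.group(2).strip()
    return fields


entry = ("./hctx0/busy:ffff888016860000 {.op=READ, .cmd_flags=, "
         ".rq_flags=STARTED|ELVPRIV|IO_STAT|STATS|ELV, .state=in_flight, "
         ".tag=32, .internal_tag=0}")
req = parse_busy(entry)
print(req["op"], req["state"], req["tag"], req["internal_tag"])
```

Here the request is a READ in state in_flight with STARTED set, which suggests it has already been handed to the driver and is waiting for completion rather than sitting in the scheduler. That reading is an interpretation of the dump, not something confirmed in this message.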

Regards,

Yang

Yu Kuai <yukuai1@...weicloud.com> wrote on Thu, May 18, 2023 at 11:30:
>
> Hi,
>
> On 2023/05/18 0:27, yang lan wrote:
> > Hi,
> >
> > Thank you for your response.
> >
> >> Does this reproduce on current kernels, eg 6.4-rc2?
> >
> > Yeah, it can be reproduced on kernel 6.4-rc2.
> >
>
> The log below shows that the I/O hangs. Can you collect the following
> debugfs output so that we can see where the I/O is now?
>
> cd /sys/kernel/debug/block/[test_device] && find . -type f -exec grep -aH . {} \;
>
> Thanks,
> Kuai
> > root@...kaller:~# uname -a
> > Linux syzkaller 6.4.0-rc2 #1 SMP PREEMPT_DYNAMIC Wed May 17 22:58:52
> > CST 2023 x86_64 GNU/Linux
> > root@...kaller:~# gcc poc_blkdev.c -o poc_blkdev
> > root@...kaller:~# ./poc_blkdev
> > [  128.718051][ T7121] nbd0: detected capacity change from 0 to 4
> > [  158.917678][  T998] block nbd0: Possible stuck request
> > ffff888016f08000: control (read@0,2048B). Runtime 30 seconds
> > [  188.997677][  T998] block nbd0: Possible stuck request
> > ffff888016f08000: control (read@0,2048B). Runtime 60 seconds
> > [  219.077191][  T998] block nbd0: Possible stuck request
> > ffff888016f08000: control (read@0,2048B). Runtime 90 seconds
> > [  249.157312][  T998] block nbd0: Possible stuck request
> > ffff888016f08000: control (read@0,2048B). Runtime 120 seconds
> > [  279.237409][  T998] block nbd0: Possible stuck request
> > ffff888016f08000: control (read@0,2048B). Runtime 150 seconds
> > [  309.317843][  T998] block nbd0: Possible stuck request
> > ffff888016f08000: control (read@0,2048B). Runtime 180 seconds
> > [  339.397950][  T998] block nbd0: Possible stuck request
> > ffff888016f08000: control (read@0,2048B). Runtime 210 seconds
> > [  369.478031][  T998] block nbd0: Possible stuck request
> > ffff888016f08000: control (read@0,2048B). Runtime 240 seconds
> > [  399.558253][  T998] block nbd0: Possible stuck request
> > ffff888016f08000: control (read@0,2048B). Runtime 270 seconds
> > [  429.638372][  T998] block nbd0: Possible stuck request
> > ffff888016f08000: control (read@0,2048B). Runtime 300 seconds
> > [  459.718454][  T998] block nbd0: Possible stuck request
> > ffff888016f08000: control (read@0,2048B). Runtime 330 seconds
> > [  489.798571][  T998] block nbd0: Possible stuck request
> > ffff888016f08000: control (read@0,2048B). Runtime 360 seconds
> > [  519.878643][  T998] block nbd0: Possible stuck request
> > ffff888016f08000: control (read@0,2048B). Runtime 390 seconds
> > [  549.958966][  T998] block nbd0: Possible stuck request
> > ffff888016f08000: control (read@0,2048B). Runtime 420 seconds
> > [  571.719145][   T30] INFO: task systemd-udevd:7123 blocked for more
> > than 143 seconds.
> > [  571.719652][   T30]       Not tainted 6.4.0-rc2 #1
> > [  571.719900][   T30] "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [  571.720307][   T30] task:systemd-udevd   state:D stack:26224
> > pid:7123  ppid:3998   flags:0x00004004
> > [  571.720756][   T30] Call Trace:
> > [  571.720923][   T30]  <TASK>
> > [  571.721073][   T30]  __schedule+0x9ca/0x2630
> > [  571.721348][   T30]  ? firmware_map_remove+0x1e0/0x1e0
> > [  571.721618][   T30]  ? find_held_lock+0x33/0x1c0
> > [  571.721866][   T30]  ? lock_release+0x3b9/0x690
> > [  571.722108][   T30]  ? do_read_cache_folio+0x4ff/0xb20
> > [  571.722447][   T30]  ? lock_downgrade+0x6b0/0x6b0
> > [  571.722785][   T30]  ? mark_held_locks+0xb0/0x110
> > [  571.723044][   T30]  schedule+0xd3/0x1b0
> > [  571.723264][   T30]  io_schedule+0x1b/0x70
> > [  571.723489][   T30]  ? do_read_cache_folio+0x58c/0xb20
> > [  571.723760][   T30]  do_read_cache_folio+0x58c/0xb20
> > [  571.724036][   T30]  ? blkdev_readahead+0x20/0x20
> > [  571.724319][   T30]  ? __filemap_get_folio+0x8e0/0x8e0
> > [  571.724588][   T30]  ? __sanitizer_cov_trace_switch+0x53/0x90
> > [  571.724885][   T30]  ? __sanitizer_cov_trace_pc+0x1e/0x50
> > [  571.725246][   T30]  ? format_decode+0x1cf/0xb50
> > [  571.725547][   T30]  ? __sanitizer_cov_trace_pc+0x1e/0x50
> > [  571.725837][   T30]  ? fill_ptr_key+0x30/0x30
> > [  571.726072][   T30]  ? default_pointer+0x4a0/0x4a0
> > [  571.726335][   T30]  ? __isolate_free_page+0x220/0x220
> > [  571.726608][   T30]  ? filemap_fdatawrite_wbc+0x1c0/0x1c0
> > [  571.726888][   T30]  ? __sanitizer_cov_trace_pc+0x1e/0x50
> > [  571.727172][   T30]  ? read_part_sector+0x229/0x420
> > [  571.727434][   T30]  ? adfspart_check_ADFS+0x560/0x560
> > [  571.727707][   T30]  read_part_sector+0xfa/0x420
> > [  571.727963][   T30]  adfspart_check_POWERTEC+0x90/0x690
> > [  571.728244][   T30]  ? adfspart_check_ADFS+0x560/0x560
> > [  571.728520][   T30]  ? __kasan_slab_alloc+0x33/0x70
> > [  571.728780][   T30]  ? adfspart_check_ICS+0x8f0/0x8f0
> > [  571.729889][   T30]  ? snprintf+0xb2/0xe0
> > [  571.730145][   T30]  ? vsprintf+0x30/0x30
> > [  571.730374][   T30]  ? __sanitizer_cov_trace_pc+0x1e/0x50
> > [  571.730659][   T30]  ? adfspart_check_ICS+0x8f0/0x8f0
> > [  571.730928][   T30]  bdev_disk_changed+0x674/0x1260
> > [  571.731189][   T30]  ? write_comp_data+0x1f/0x70
> > [  571.731439][   T30]  ? iput+0xd0/0x780
> > [  571.731646][   T30]  blkdev_get_whole+0x186/0x260
> > [  571.731886][   T30]  blkdev_get_by_dev+0x4ce/0xae0
> > [  571.732139][   T30]  blkdev_open+0x140/0x2c0
> > [  571.732366][   T30]  do_dentry_open+0x6de/0x1450
> > [  571.732612][   T30]  ? blkdev_close+0x80/0x80
> > [  571.732848][   T30]  path_openat+0xd6d/0x26d0
> > [  571.733084][   T30]  ? lock_downgrade+0x6b0/0x6b0
> > [  571.733336][   T30]  ? vfs_path_lookup+0x110/0x110
> > [  571.733591][   T30]  do_filp_open+0x1bb/0x290
> > [  571.733824][   T30]  ? may_open_dev+0xf0/0xf0
> > [  571.734061][   T30]  ? __phys_addr_symbol+0x30/0x70
> > [  571.734324][   T30]  ? do_raw_spin_unlock+0x176/0x260
> > [  571.734595][   T30]  do_sys_openat2+0x5fd/0x980
> > [  571.734837][   T30]  ? file_open_root+0x3f0/0x3f0
> > [  571.735087][   T30]  ? seccomp_notify_ioctl+0xff0/0xff0
> > [  571.735368][   T30]  do_sys_open+0xce/0x140
> > [  571.735596][   T30]  ? filp_open+0x80/0x80
> > [  571.735820][   T30]  ? __secure_computing+0x1e3/0x340
> > [  571.736090][   T30]  do_syscall_64+0x38/0x80
> > [  571.736325][   T30]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > [  571.736626][   T30] RIP: 0033:0x7fb212210840
> > [  571.736857][   T30] RSP: 002b:00007fffb37bbbe8 EFLAGS: 00000246
> > ORIG_RAX: 0000000000000002
> > [  571.737269][   T30] RAX: ffffffffffffffda RBX: 0000560e09072e10
> > RCX: 00007fb212210840
> > [  571.737651][   T30] RDX: 0000560e08e39fe3 RSI: 00000000000a0800
> > RDI: 0000560e090813b0
> > [  571.738037][   T30] RBP: 00007fffb37bbd60 R08: 0000560e08e39670
> > R09: 0000000000000010
> > [  571.738432][   T30] R10: 0000560e08e39d0c R11: 0000000000000246
> > R12: 00007fffb37bbcb0
> > [  571.739563][   T30] R13: 0000560e09087a70 R14: 0000000000000003
> > R15: 000000000000000e
> > [  571.739973][   T30]  </TASK>
> > [  571.740133][   T30]
> > [  571.740133][   T30] Showing all locks held in the system:
> > [  571.740495][   T30] 1 lock held by rcu_tasks_kthre/13:
> > [  571.740758][   T30]  #0: ffffffff8b6badd0
> > (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at:
> > rcu_tasks_one_gp+0x2b/0xdb0
> > [  571.741301][   T30] 1 lock held by rcu_tasks_trace/14:
> > [  571.741571][   T30]  #0: ffffffff8b6baad0
> > (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at:
> > rcu_tasks_one_gp+0x2b/0xdb0
> > [  571.742134][   T30] 1 lock held by khungtaskd/30:
> > [  571.742385][   T30]  #0: ffffffff8b6bb960
> > (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x5b/0x300
> > [  571.742947][   T30] 2 locks held by kworker/u8:0/50:
> > [  571.743198][   T30]  #0: ffff888016e7b138
> > ((wq_completion)nbd0-recv){+.+.}-{0:0}, at:
> > process_one_work+0x94b/0x17b0
> > [  571.743809][   T30]  #1: ffff888011e4fdd0
> > ((work_completion)(&args->work)){+.+.}-{0:0}, at:
> > process_one_work+0x984/0x17b0
> > [  571.744393][   T30] 1 lock held by in:imklog/6784:
> > [  571.744643][   T30]  #0: ffff88801106e368
> > (&f->f_pos_lock){+.+.}-{3:3}, at: __fdget_pos+0xe9/0x100
> > [  571.745122][   T30] 1 lock held by systemd-udevd/7123:
> > [  571.745381][   T30]  #0: ffff8880431854c8
> > (&disk->open_mutex){+.+.}-{3:3}, at: blkdev_get_by_dev+0x24b/0xae0
> > [  571.745885][   T30]
> > [  571.746008][   T30] =============================================
> > [  571.746008][   T30]
> > [  571.746424][   T30] NMI backtrace for cpu 1
> > [  571.746642][   T30] CPU: 1 PID: 30 Comm: khungtaskd Not tainted 6.4.0-rc2 #1
> > [  571.746989][   T30] Hardware name: QEMU Standard PC (i440FX + PIIX,
> > 1996), BIOS 1.12.0-1 04/01/2014
> > [  571.747440][   T30] Call Trace:
> > [  571.747606][   T30]  <TASK>
> > [  571.747764][   T30]  dump_stack_lvl+0x91/0xf0
> > [  571.747997][   T30]  nmi_cpu_backtrace+0x21a/0x2b0
> > [  571.748257][   T30]  ? lapic_can_unplug_cpu+0xa0/0xa0
> > [  571.748525][   T30]  nmi_trigger_cpumask_backtrace+0x28c/0x2f0
> > [  571.748830][   T30]  watchdog+0xe4b/0x10c0
> > [  571.749057][   T30]  ? proc_dohung_task_timeout_secs+0x90/0x90
> > [  571.749366][   T30]  kthread+0x33b/0x430
> > [  571.749596][   T30]  ? kthread_complete_and_exit+0x40/0x40
> > [  571.749891][   T30]  ret_from_fork+0x1f/0x30
> > [  571.750126][   T30]  </TASK>
> > [  571.750347][   T30] Sending NMI from CPU 1 to CPUs 0:
> > [  571.750620][    C0] NMI backtrace for cpu 0
> > [  571.750626][    C0] CPU: 0 PID: 3987 Comm: systemd-journal Not
> > tainted 6.4.0-rc2 #1
> > [  571.750637][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX,
> > 1996), BIOS 1.12.0-1 04/01/2014
> > [  571.750643][    C0] RIP: 0033:0x7fb1d8c34bd1
> > [  571.750652][    C0] Code: ed 4d 89 cf 75 a3 0f 1f 00 48 85 ed 75 4b
> > 48 8b 54 24 28 48 8b 44 24 18 48 8b 7c 24 20 48 29 da 48 8b 70 20 48
> > 0f af 54 24 08 <48> 83 c4 38 5b 5d 41 5c 41 5d 41 5e 41 5f e9 ac f2 04
> > 00 0f 1f 40
> > [  571.750662][    C0] RSP: 002b:00007ffff9686c30 EFLAGS: 00000202
> > [  571.750670][    C0] RAX: 00007ffff9686e50 RBX: 0000000000000002
> > RCX: 0000000000000010
> > [  571.750677][    C0] RDX: 0000000000000010 RSI: 00007ffff9686d80
> > RDI: 00007ffff9686f20
> > [  571.750683][    C0] RBP: 0000000000000000 R08: 0000000000000010
> > R09: 00007ffff9686d90
> > [  571.750689][    C0] R10: 00007ffff9686fb0 R11: 00007fb1d8d6a060
> > R12: 00007ffff9686f30
> > [  571.750696][    C0] R13: 00007fb1d9d20ee0 R14: 00007ffff9686f30
> > R15: 00007ffff9686d90
> > [  571.750703][    C0] FS:  00007fb1da33d8c0 GS:  0000000000000000
> > [  571.752358][   T30] Kernel panic - not syncing: hung_task: blocked tasks
> > [  571.757337][   T30] CPU: 1 PID: 30 Comm: khungtaskd Not tainted 6.4.0-rc2 #1
> > [  571.757686][   T30] Hardware name: QEMU Standard PC (i440FX + PIIX,
> > 1996), BIOS 1.12.0-1 04/01/2014
> > [  571.758131][   T30] Call Trace:
> > [  571.758302][   T30]  <TASK>
> > [  571.758462][   T30]  dump_stack_lvl+0x91/0xf0
> > [  571.758714][   T30]  panic+0x62d/0x6a0
> > [  571.758926][   T30]  ? panic_smp_self_stop+0x90/0x90
> > [  571.759188][   T30]  ? preempt_schedule_common+0x1a/0xc0
> > [  571.759486][   T30]  ? preempt_schedule_thunk+0x1a/0x20
> > [  571.759785][   T30]  ? watchdog+0xc21/0x10c0
> > [  571.760020][   T30]  watchdog+0xc32/0x10c0
> > [  571.760240][   T30]  ? proc_dohung_task_timeout_secs+0x90/0x90
> > [  571.760541][   T30]  kthread+0x33b/0x430
> > [  571.760753][   T30]  ? kthread_complete_and_exit+0x40/0x40
> > [  571.761052][   T30]  ret_from_fork+0x1f/0x30
> > [  571.761286][   T30]  </TASK>
> > [  571.761814][   T30] Kernel Offset: disabled
> > [  571.762047][   T30] Rebooting in 86400 seconds..
> >
> >> You need to include poc_blkdev.c as part of your report.
> >
> > Sorry for the confusion. poc_blkdev.c is the C reproducer
> > (https://pastebin.com/raw/6mg7uF8W).
> >
> >> I suspect you've done something that is known to not work (as root,
> >> so we won't necessarily care).  But I can't really say without seeing
> >> what you've done.  Running syzkaller is an art, and most people aren't
> >> good at it.  It takes a lot of work to submit good quality bug reports,
> >> see this article:
> >>
> >> https://blog.regehr.org/archives/2037
> >
> > I have read this article; thank you for the recommendation.
> > I'm not familiar with this module and I haven't figured out the root
> > cause of this bug yet.
> >
> > Regards,
> >
> > Yang
> >
> > Matthew Wilcox <willy@...radead.org> wrote on Wed, May 17, 2023 at 20:20:
> >>
> >> On Wed, May 17, 2023 at 07:12:23PM +0800, yang lan wrote:
> >>> root@...kaller:~# uname -a
> >>> Linux syzkaller 5.10.179 #1 SMP PREEMPT Thu Apr 27 16:22:48 CST 2023
> >>
> >> Does this reproduce on current kernels, eg 6.4-rc2?
> >>
> >>> root@...kaller:~# gcc poc_blkdev.c -o poc_blkdev
> >>
> >> You need to include poc_blkdev.c as part of your report.
> >>
> >>> Please let me know if I can provide any more information, and I hope I
> >>> didn't mess up this bug report.
> >>
> >> I suspect you've done something that is known to not work (as root,
> >> so we won't necessarily care).  But I can't really say without seeing
> >> what you've done.  Running syzkaller is an art, and most people aren't
> >> good at it.  It takes a lot of work to submit good quality bug reports,
> >> see this article:
> >>
> >> https://blog.regehr.org/archives/2037
> >
> >
>