Message-ID: <27280d59-88ff-7eeb-1e43-eb9bd23df761@gmail.com>
Date: Fri, 22 Oct 2021 14:57:04 +0100
From: Pavel Begunkov <asml.silence@...il.com>
To: syzbot <syzbot+27d62ee6f256b186883e@...kaller.appspotmail.com>,
axboe@...nel.dk, io-uring@...r.kernel.org,
linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] INFO: task hung in io_wqe_worker
On 10/22/21 14:49, Pavel Begunkov wrote:
> On 10/22/21 05:38, syzbot wrote:
>> Hello,
>>
>> syzbot has tested the proposed patch but the reproducer is still triggering an issue:
>> INFO: task hung in io_wqe_worker
>>
>> INFO: task iou-wrk-9392:9401 blocked for more than 143 seconds.
>> Not tainted 5.15.0-rc2-syzkaller #0
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:iou-wrk-9392 state:D stack:27952 pid: 9401 ppid: 7038 flags:0x00004004
>> Call Trace:
>> context_switch kernel/sched/core.c:4940 [inline]
>> __schedule+0xb44/0x5960 kernel/sched/core.c:6287
>> schedule+0xd3/0x270 kernel/sched/core.c:6366
>> schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
>> do_wait_for_common kernel/sched/completion.c:85 [inline]
>> __wait_for_common kernel/sched/completion.c:106 [inline]
>> wait_for_common kernel/sched/completion.c:117 [inline]
>> wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
>> io_worker_exit fs/io-wq.c:183 [inline]
>> io_wqe_worker+0x66d/0xc40 fs/io-wq.c:597
>> ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
>
> Easily reproducible, it's stuck in
>
> static void io_worker_exit(struct io_worker *worker)
> {
> ...
> wait_for_completion(&worker->ref_done);
> ...
> }
>
> The reference belongs to a create_worker_cb() task_work item. It's expected
> to either be executed or cancelled by io_wq_exit_workers(), but the owner
> task never goes through __io_uring_cancel() (called from do_exit()) and so
> never reaches io_wq_exit_workers().
>
> Following the owner task, cat /proc/<pid>/stack:
>
> [<0>] do_coredump+0x1d0/0x10e0
> [<0>] get_signal+0x4a3/0x960
> [<0>] arch_do_signal_or_restart+0xc3/0x6d0
> [<0>] exit_to_user_mode_prepare+0x10e/0x190
> [<0>] irqentry_exit_to_user_mode+0x9/0x20
> [<0>] irqentry_exit+0x36/0x40
> [<0>] exc_page_fault+0x95/0x190
> [<0>] asm_exc_page_fault+0x1e/0x30
>
> (gdb) l *(do_coredump+0x1d0-5)
> 0xffffffff81343ccb is in do_coredump (fs/coredump.c:469).
> 464
> 465 if (core_waiters > 0) {
> 466 struct core_thread *ptr;
> 467
> 468 freezer_do_not_count();
> 469 wait_for_completion(&core_state->startup);
> 470 freezer_count();
>
> Can't say anything more at the moment, as I'm not familiar with the
> coredump code.

A simple hack that allows task_work to be executed from there works
around the problem:
diff --git a/fs/coredump.c b/fs/coredump.c
index 3224dee44d30..f6f9dfb02296 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -466,7 +466,8 @@ static int coredump_wait(int exit_code, struct core_state *core_state)
 		struct core_thread *ptr;
 
 		freezer_do_not_count();
-		wait_for_completion(&core_state->startup);
+		while (wait_for_completion_interruptible(&core_state->startup))
+			tracehook_notify_signal();
 		freezer_count();
 		/*
 		 * Wait for all the threads to become inactive, so that
--
Pavel Begunkov