Message-ID: <f8e85fe1-87e6-4b31-9e87-f48fd7b8e3f6@amd.com>
Date: Wed, 20 Aug 2025 11:59:20 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Oleg Nesterov <oleg@...hat.com>, Dominique Martinet
<asmadeus@...ewreck.org>, syzbot
<syzbot+d1b5dace43896bc386c3@...kaller.appspotmail.com>
CC: <akpm@...ux-foundation.org>, <brauner@...nel.org>, <dvyukov@...gle.com>,
<elver@...gle.com>, <glider@...gle.com>, <jack@...e.cz>,
<kasan-dev@...glegroups.com>, <linux-fsdevel@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
<syzkaller-bugs@...glegroups.com>, <viro@...iv.linux.org.uk>,
<willy@...radead.org>, <v9fs@...ts.linux.dev>, David Howells
<dhowells@...hat.com>
Subject: Re: [PATCH] 9p/trans_fd: p9_fd_request: kick rx thread if EPOLLIN

Hello Oleg,

On 8/19/2025 9:40 PM, Oleg Nesterov wrote:
> p9_read_work() doesn't set Rworksched and doesn't do schedule_work(m->rq)
> if list_empty(&m->req_list).
>
> However, if the pipe is full, we need to read more data and this used to
> work prior to commit aaec5a95d59615 ("pipe_read: don't wake up the writer
> if the pipe is still full").
>
> p9_read_work() does p9_fd_read() -> ... -> anon_pipe_read() which (before
> the commit above) triggered the unnecessary wakeup. This wakeup calls
> p9_pollwake() which kicks p9_poll_workfn() -> p9_poll_mux(), p9_poll_mux()
> will notice EPOLLIN and schedule_work(&m->rq).
>
> This no longer happens after the optimization above, change p9_fd_request()
> to use p9_poll_mux() instead of only checking for EPOLLOUT.
>
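
For anyone following along, the "pipe is full" state described above can be seen from userspace. This is only a sketch and not part of the patch: the helper names fill_pipe(), poll_now() and full_pipe_events() are made up for illustration, and it assumes Linux pipe semantics. A full pipe reports POLLIN on the read end and no POLLOUT on the write end, so a caller that only checks for EPOLLOUT has nothing to act on.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Write to wfd (non-blocking) until the pipe is full; returns 0 on success. */
static int fill_pipe(int wfd)
{
	char buf[4096] = {0};

	for (;;) {
		ssize_t n = write(wfd, buf, sizeof(buf));

		if (n < 0)
			return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
	}
}

/* Return the events currently pending on fd without blocking. */
static short poll_now(int fd, short events)
{
	struct pollfd pfd = { .fd = fd, .events = events };

	if (poll(&pfd, 1, 0) < 0)
		return 0;
	return pfd.revents;
}

/*
 * Create a pipe, fill it, and record what poll() reports for each end.
 * Returns 0 on success, -1 on error.
 */
static int full_pipe_events(short *rd_events, short *wr_events)
{
	int fds[2];

	if (pipe(fds) < 0)
		return -1;
	if (fcntl(fds[1], F_SETFL, O_NONBLOCK) < 0 || fill_pipe(fds[1]) < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}

	*rd_events = poll_now(fds[0], POLLIN);
	*wr_events = poll_now(fds[1], POLLOUT);

	close(fds[0]);
	close(fds[1]);
	return 0;
}
```

Calling full_pipe_events() and inspecting the two results should show POLLIN set for the read end and POLLOUT clear for the write end, which is the situation where only the rx side can make progress.
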
> Reported-by: syzbot+d1b5dace43896bc386c3@...kaller.appspotmail.com
> Tested-by: syzbot+d1b5dace43896bc386c3@...kaller.appspotmail.com
> Closes: https://lore.kernel.org/all/68a2de8f.050a0220.e29e5.0097.GAE@google.com/
> Link: https://lore.kernel.org/all/67dedd2f.050a0220.31a16b.003f.GAE@google.com/
> Co-developed-by: K Prateek Nayak <kprateek.nayak@....com>
> Signed-off-by: K Prateek Nayak <kprateek.nayak@....com>

A "Debugged-by:" or equivalent would have been fine too, since you did
most of the heavy lifting by finding p9_poll_mux(), but I don't mind
standing behind this since it is doing the right thing :)

I tested this on top of v6.17-rc2. Unpatched upstream runs into a hang
instantly with syzbot's reproducer. The dmesg log:

    INFO: task repro:4150 blocked for more than 120 seconds.
          Not tainted 6.17.0-rc2-upstream #34
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    task:repro state:D stack:0 pid:4150 tgid:4150 ppid:1 task_flags:0x400140 flags:0x00004006
    Call Trace:
     <TASK>
     __schedule+0x474/0x1620
     ? __wb_update_bandwidth+0x37/0x1d0
     schedule+0x27/0xd0
     io_schedule+0x46/0x70
     folio_wait_bit_common+0x112/0x300
     ? filemap_get_folios_tag+0x232/0x2a0
     ? __pfx_wake_page_function+0x10/0x10
     folio_wait_writeback+0x2b/0x80
     __filemap_fdatawait_range+0x7c/0xe0
     file_write_and_wait_range+0x89/0xb0
     v9fs_file_fsync+0x2d/0x90 [9p]
     netfs_file_write_iter+0xec/0x120 [netfs]
     vfs_write+0x305/0x420
     ksys_write+0x65/0xe0
     do_syscall_64+0x85/0xb30
     ? do_syscall_64+0x223/0xb30
     ? count_memcg_events+0xd9/0x1c0
     ? handle_mm_fault+0x1af/0x290
     ? do_user_addr_fault+0x2d0/0x8c0
     entry_SYSCALL_64_after_hwframe+0x76/0x7e
    RIP: 0033:0x7f3b26d1e88d
    RSP: 002b:00007ffe581fa348 EFLAGS: 00000213 ORIG_RAX: 0000000000000001
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3b26d1e88d
    RDX: 0000000000007fec RSI: 0000200000000300 RDI: 0000000000000007
    RBP: 00007ffe581fa360 R08: 00007ffe581fa360 R09: 00007ffe581fa360
    R10: 00007ffe581fa360 R11: 0000000000000213 R12: 00007ffe581fa4b8
    R13: 0000558168a6de12 R14: 0000558168a6fd10 R15: 00007f3b26f03040
     </TASK>

With this patch applied on top, I haven't seen a hang after running the
reproducer for 30 minutes, so feel free to also include:

Tested-by: K Prateek Nayak <kprateek.nayak@....com>
> Signed-off-by: Oleg Nesterov <oleg@...hat.com>
> ---
> net/9p/trans_fd.c | 9 +--------
> 1 file changed, 1 insertion(+), 8 deletions(-)
>
> diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
> index 339ec4e54778..474fe67f72ac 100644
> --- a/net/9p/trans_fd.c
> +++ b/net/9p/trans_fd.c
> @@ -666,7 +666,6 @@ static void p9_poll_mux(struct p9_conn *m)
>
>  static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
>  {
> -	__poll_t n;
>  	int err;
>  	struct p9_trans_fd *ts = client->trans;
>  	struct p9_conn *m = &ts->conn;
> @@ -686,13 +685,7 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
>  	list_add_tail(&req->req_list, &m->unsent_req_list);
>  	spin_unlock(&m->req_lock);
>
> -	if (test_and_clear_bit(Wpending, &m->wsched))
> -		n = EPOLLOUT;
> -	else
> -		n = p9_fd_poll(m->client, NULL, NULL);
> -
> -	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
> -		schedule_work(&m->wq);
> +	p9_poll_mux(m);
>
>  	return 0;
>  }
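
To spell out the behavioural difference, here is a simplified userspace model of the two kick paths. It is a sketch under assumptions, not the kernel code: struct model_conn, kick_old(), kick_new() and the MODEL_* constants are hypothetical names, the `ready` argument stands in for whatever p9_fd_poll() would return, and the error handling, locking and m->wsize check in p9_poll_mux() are left out. The EPOLLIN branch of kick_new() reflects my reading of p9_poll_mux() and should be checked against net/9p/trans_fd.c.

```c
#include <stdbool.h>

/* Local stand-ins for EPOLLIN / EPOLLOUT; the values are arbitrary. */
enum { MODEL_EPOLLIN = 1 << 0, MODEL_EPOLLOUT = 1 << 1 };

/* Hypothetical reduction of struct p9_conn to its scheduling state. */
struct model_conn {
	bool rpending;    /* models the Rpending bit */
	bool rworksched;  /* models Rworksched: read work is queued */
	bool wpending;    /* models the Wpending bit */
	bool wworksched;  /* models Wworksched: write work is queued */
	bool have_unsent; /* unsent_req_list is non-empty */
};

/* Before the patch: p9_fd_request() only ever queues the write work. */
static void kick_old(struct model_conn *m, unsigned int ready)
{
	unsigned int n = ready;

	if (m->wpending) {
		m->wpending = false;
		n = MODEL_EPOLLOUT;
	}

	if ((n & MODEL_EPOLLOUT) && !m->wworksched)
		m->wworksched = true; /* schedule_work(&m->wq) */
}

/* After the patch: modelled on p9_poll_mux(), which also handles EPOLLIN. */
static void kick_new(struct model_conn *m, unsigned int ready)
{
	if (ready & MODEL_EPOLLIN) {
		m->rpending = true;
		if (!m->rworksched)
			m->rworksched = true; /* schedule_work(&m->rq) */
	}

	if (ready & MODEL_EPOLLOUT) {
		m->wpending = true;
		if (m->have_unsent && !m->wworksched)
			m->wworksched = true; /* schedule_work(&m->wq) */
	}
}
```

With a request queued and only MODEL_EPOLLIN ready (the full-pipe case), kick_old() queues nothing while kick_new() queues the read work, which matches the hang going away with the patch.
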
--
Thanks and Regards,
Prateek