Message-ID: <377fbe51-2e56-4538-89c5-eb91c13a2559@amd.com>
Date: Thu, 27 Mar 2025 23:16:27 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Dominique Martinet <asmadeus@...ewreck.org>, Oleg Nesterov
	<oleg@...hat.com>
CC: Eric Van Hensbergen <ericvh@...nel.org>, Latchesar Ionkov
	<lucho@...kov.net>, Christian Schoenebeck <linux_oss@...debyte.com>, "Mateusz
 Guzik" <mjguzik@...il.com>, syzbot
	<syzbot+62262fdc0e01d99573fc@...kaller.appspotmail.com>,
	<brauner@...nel.org>, <dhowells@...hat.com>, <jack@...e.cz>,
	<jlayton@...nel.org>, <linux-fsdevel@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>, <netfs@...ts.linux.dev>,
	<swapnil.sapkal@....com>, <syzkaller-bugs@...glegroups.com>,
	<viro@...iv.linux.org.uk>, <v9fs@...ts.linux.dev>
Subject: Re: [syzbot] [netfs?] INFO: task hung in netfs_unbuffered_write_iter

Hello all,

On 3/25/2025 8:28 PM, Dominique Martinet wrote:
> Thanks for the traces.
> 
> w/ revert
> K Prateek Nayak wrote on Tue, Mar 25, 2025 at 08:19:26PM +0530:
>>     kworker/100:1-1803    [100] .....   286.618822: p9_fd_poll: p9_fd_poll rd poll
>>     kworker/100:1-1803    [100] .....   286.618822: p9_fd_poll: p9_fd_request wr poll
>>     kworker/100:1-1803    [100] .....   286.618823: p9_read_work: Data read wait 7
> 
> new behavior
>>             repro-4076    [031] .....    95.011394: p9_fd_poll: p9_fd_poll rd poll
>>             repro-4076    [031] .....    95.011394: p9_fd_poll: p9_fd_request wr poll
>>             repro-4076    [031] .....    99.731970: p9_client_rpc: Wait event killable (-512)
> 
> For me the problem isn't so much that this gets ERESTARTSYS but that it
> never gets to read the 7 bytes that are available?

I'm back after getting distracted for a bit, so here goes:

On mainline, the signal is actually a SIGKILL from the parent thread,
sent because of a timeout in the reproducer:

            repro-4084    [112] d..2.   233.654264: signal_generate: sig=9 errno=0 code=0 comm=repro pid=4085 grp=1 res=0
            repro-4084    [112] d..1.   233.654272: signal_generate: sig=9 errno=0 code=0 comm=repro pid=4085 grp=1 res=1
            repro-4085    [125] .....   233.654304: p9_client_rpc: Wait event killable (-512) (1) (0) (118)
            repro-4085    [125] .....   233.654308: p9_client_rpc: Flushed (-512)
            repro-4085    [125] .....   233.654312: p9_client_rpc: Final error (-512)
            repro-4085    [125] .....   233.654313: p9_client_write: p9_client_rpc done
            repro-4085    [125] .....   233.654313: p9_client_write: p9_client_rpc error (-512)
            repro-4085    [125] .....   233.654315: v9fs_issue_write: Issue write done 2 err(-512)

Specifically, this part:

     for (;;) {
       sleep_ms(10);
       if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
         break;
       if (current_time_ms() - start < 5000)
         continue;
       kill_and_wait(pid, &status); /* <------- here */
       break;
     }
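
For completeness, kill_and_wait() boils down to roughly the following (a
paraphrase of the syzkaller helper from memory, so the details may be off);
the two kill() calls are what produce the back-to-back signal_generate
sig=9 events in the trace above:

     static void kill_and_wait(int pid, int *status)
     {
       /* SIGKILL the child's process group, then the child itself */
       kill(-pid, SIGKILL);
       kill(pid, SIGKILL);
       /* reap it; the real helper also cleans up any leftover tasks */
       while (waitpid(-1, status, WNOHANG | WAIT_FLAGS) != pid)
         sleep_ms(10);
     }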

Now for the 7 bytes that were not read: if you look at the traces, every
time there is a valid trans_mod->request(), there is a kworker wakeup:

            repro-4102    [100] .....   286.618772: p9_client_write: p9_client_rpc
            repro-4102    [100] .....   286.618780: p9_fd_request: p9_fd_request
            repro-4102    [100] .....   286.618781: p9_fd_request: p9_fd_request EPOLL
            repro-4102    [100] .....   286.618781: p9_fd_request: p9_fd_request schedule work  <----- this
    kworker/100:1-1803    [100] .....   286.618784: p9_write_work: Data write wait 32770

However, for that last 7-byte read, there is in fact no wakeup:

            repro-4102    [100] .....   286.618805: p9_client_write: p9_client_rpc
            repro-4102    [100] .....   286.618817: p9_fd_request: p9_fd_request
            repro-4102    [100] .....   286.618818: p9_fd_request: p9_fd_request EPOLL
            repro-4102    [100] .....   286.618818: p9_fd_poll: p9_fd_poll rd poll
            repro-4102    [100] .....   286.618818: p9_fd_poll: p9_fd_request wr poll  <--- No wakeup after
    kworker/100:1-1803    [100] .....   286.618822: p9_fd_poll: p9_fd_poll rd poll
    kworker/100:1-1803    [100] .....   286.618822: p9_fd_poll: p9_fd_request wr poll
    kworker/100:1-1803    [100] .....   286.618823: p9_read_work: Data read wait 7
    kworker/100:1-1803    [100] .....   286.618825: p9_read_work: Data read 7
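
For reference, the (slightly trimmed) tail of p9_fd_request() on mainline,
i.e. the same lines the diff at the end of this mail touches, only
schedules the write work when the poll reports EPOLLOUT and otherwise
returns 0 as if the request had been handed off:

        n = p9_fd_poll(m->client, NULL, NULL);

        if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
                schedule_work(&m->wq);  /* "p9_fd_request schedule work" in the traces */

        return 0;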

The kworker would spuriously wake up, do something, realize it wasn't
supposed to read that data, and then disconnect the connection, waking up
all the waiters.

But if we haven't sent a request, why are we waiting at all?
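
For context, p9_client_rpc() treats a zero return from
c->trans_mod->request() as "request handed off" and goes straight into a
killable wait for the response. Paraphrasing net/9p/client.c from memory
(so the exact shape may differ slightly):

        err = c->trans_mod->request(c, req);
        if (err < 0)
                goto out;       /* no wait if the transport refused; label name is mine */

        /* this wait is what shows up as -512 (-ERESTARTSYS) in the traces */
        err = wait_event_killable(req->wq,
                                  READ_ONCE(req->status) >= REQ_STATUS_RCVD);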

> 
> If the repro has already written the bytes in both cases then there's no
> reason to wait 5 seconds here...
> 
> OTOH syzbot 9p code is silly and might have been depending on something
> that's not supposed to work e.g. they might be missing a flush or
> equivalent for all I know (I still haven't looked at the repro)
> 

I believe that if c->trans_mod->request(c, req) does not dispatch the
request to the worker for the RPC, the caller must not wait. So here is
something that got the benchmark going in my case; it might be totally
wrong, but it seems worth a look:

#syz test: upstream aaec5a95d59615523db03dd53c2052f0a87beea7

diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index 196060dc6138..148533c993f1 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -687,7 +687,11 @@ static int p9_fd_request(struct p9_client *client, struct p9_req_t *req)
  	else
  		n = p9_fd_poll(m->client, NULL, NULL);
  
-	if (n & EPOLLOUT && !test_and_set_bit(Wworksched, &m->wsched))
+	/* Request was not sent */
+	if (!(n & EPOLLOUT))
+		return -EIO;
+
+	if (!test_and_set_bit(Wworksched, &m->wsched))
  		schedule_work(&m->wq);
  
  	return 0;
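
The intent is that, when the poll does not report EPOLLOUT, p9_fd_request()
now reports a failure instead of pretending the request was handed off, so
p9_client_rpc() takes its error path rather than entering the killable
wait. Whether -EIO is the right error code here (or whether the connection
should be torn down instead) I'm genuinely not sure about.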

