Message-ID: <550503.1690588340@warthog.procyon.org.uk>
Date: Sat, 29 Jul 2023 00:52:20 +0100
From: David Howells <dhowells@...hat.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: dhowells@...hat.com,
syzbot <syzbot+f527b971b4bdc8e79f9e@...kaller.appspotmail.com>,
bpf@...r.kernel.org, brauner@...nel.org, davem@...emloft.net,
dsahern@...nel.org, edumazet@...gle.com,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org, pabeni@...hat.com,
syzkaller-bugs@...glegroups.com, viro@...iv.linux.org.uk
Subject: Re: [syzbot] [fs?] INFO: task hung in pipe_release (4)
Jakub Kicinski <kuba@...nel.org> wrote:
> Hi David, any ideas about this one? Looks like it triggers on fairly
> recent upstream?
I've managed to reproduce it finally. Instrumenting the pipe_lock/unlock
functions, splice_to_socket() and pipe_release() seems to show that
pipe_release() is being called whilst splice_to_socket() is still running.
I *think* syzbot is arranging things such that splice_to_socket() takes a
significant amount of time so that another thread can close the socket as it
exits.
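As an illustration only, the suspected interleaving can be modelled in userspace with a plain mutex. This is a rough sketch, not kernel code: `slow_splice`, `release` and `run_model` are hypothetical names standing in for splice_to_socket(), pipe_release() and the test harness, and the sleep stands in for whatever makes the splice take a long time.

```python
import threading
import time


def run_model(hold_seconds=0.5, release_timeout=0.1):
    """Userspace model: one thread holds the 'pipe' lock while another tries to release.

    Hypothetical stand-ins only; this does not reproduce the kernel bug.
    """
    pipe_mutex = threading.Lock()   # stands in for the pipe mutex
    holding = threading.Event()     # set once the splicer owns the lock
    events = []

    def slow_splice():
        # Stand-in for splice_to_socket(): takes the lock and keeps it a while.
        with pipe_mutex:
            events.append("splice: locked")
            holding.set()
            time.sleep(hold_seconds)
        events.append("splice: unlocked")

    def release():
        # Stand-in for pipe_release(): needs the same lock before it can proceed.
        events.append("release: lock")
        got = pipe_mutex.acquire(timeout=release_timeout)
        if got:
            events.append("release: locked")
            pipe_mutex.release()
        else:
            events.append("release: still waiting")
        return got

    splicer = threading.Thread(target=slow_splice)
    splicer.start()
    holding.wait()          # make sure the splicer already holds the lock
    acquired = release()
    splicer.join()
    return acquired, events


if __name__ == "__main__":
    acquired, events = run_model()
    print(acquired)
    for line in events:
        print(line)
```

With the default arguments the release attempt gives up before the splicer drops the lock, which mirrors the "lock" line with no following "locked" line in the trace. In the real hang the holder never lets go, so the waiter blocks indefinitely rather than timing out.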
In this sample logging, the pipe is created by pid 7101:
[ 66.205719] --pipe 7101
[ 66.209942] lock
[ 66.212526] locked
[ 66.215344] unlock
[ 66.218103] unlocked
splice begins in 7101 also and locks the pipe:
[ 66.221057] ==>splice_to_socket() 7101
[ 66.225596] lock
[ 66.228177] locked
but for some reason, pid 7100 then tries to release it:
[ 66.377781] release 7100
and hangs on the __pipe_lock() call in pipe_release():
[ 66.381059] lock
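The trace above can be read mechanically. The following is a small sketch that replays the logged lines and reports who holds the pipe lock and who is waiting for it. It assumes, since the lock lines carry no pid of their own, that each "lock"/"locked"/"unlock"/"unlocked" line belongs to the most recent line that ends in a pid; `analyse` and `TRACE` are names invented for this example.

```python
import re

# The log lines quoted in this message, joined into one trace.
TRACE = """\
[ 66.205719] --pipe 7101
[ 66.209942] lock
[ 66.212526] locked
[ 66.215344] unlock
[ 66.218103] unlocked
[ 66.221057] ==>splice_to_socket() 7101
[ 66.225596] lock
[ 66.228177] locked
[ 66.377781] release 7100
[ 66.381059] lock
"""

LINE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")
CONTEXT = re.compile(r"^(\S+)\s+(\d+)$")


def analyse(trace):
    """Return (holder, waiter) as (label, pid) tuples, or None where nobody applies."""
    holder = None    # context that has logged "locked" and not yet "unlocked"
    waiter = None    # context that has logged "lock" but not yet "locked"
    current = None   # most recent context line (ASSUMPTION: lock lines belong to it)
    for raw in trace.splitlines():
        m = LINE.match(raw)
        if not m:
            continue
        msg = m.group(2).strip()
        if msg == "lock":
            waiter = current
        elif msg == "locked":
            holder, waiter = current, None
        elif msg == "unlocked":
            holder = None
        elif msg == "unlock":
            pass  # release in progress; the holder is cleared on "unlocked"
        else:
            ctx = CONTEXT.match(msg)
            if ctx:
                # Strip the "--" / "==>" decoration from the label.
                current = (ctx.group(1).lstrip("-=>"), int(ctx.group(2)))
    return holder, waiter


if __name__ == "__main__":
    holder, waiter = analyse(TRACE)
    print("holder:", holder)
    print("waiter:", waiter)
```

Under that assumption the replay ends with the splice_to_socket() call in pid 7101 still holding the lock and the release in pid 7100 waiting on it, which matches the description of the hang.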
The syz reproducer does weird things with threading - and I'm wondering if
there's a file struct refcount bug here. Note that splice_to_socket() can't
access the pipe file structs to alter the refcount, and the involved pipe
isn't communicated to udp_sendmsg() in any way - so if there is a refcount
bug, it must be somewhere in the VFS, the pipe driver or the splice
infrastructure:-/.
I'm also not sure what's going on inside udp_sendmsg() as yet. It doesn't
show a stack in /proc/7101/stack, which means it doesn't hit a schedule().
David