linux-kernel - Re: [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <E101BD08-4779-4945-84AE-F3660B7A159D@oracle.com>
Date: Fri, 11 Oct 2024 21:13:22 +0000
From: Chuck Lever III <chuck.lever@...cle.com>
To: Neil Brown <neilb@...e.de>
CC: Jeff Layton <jlayton@...nel.org>,
        syzbot
	<syzbot+d1e76d963f757db40f91@...kaller.appspotmail.com>,
        Dai Ngo
	<dai.ngo@...cle.com>, Olga Kornievskaia <kolga@...app.com>,
        Linux Kernel
 Mailing List <linux-kernel@...r.kernel.org>,
        Linux NFS Mailing List
	<linux-nfs@...r.kernel.org>,
        Lorenzo Bianconi <lorenzo@...nel.org>,
        netdev
	<netdev@...r.kernel.org>,
        Olga Kornievskaia <okorniev@...hat.com>,
        "syzkaller-bugs@...glegroups.com" <syzkaller-bugs@...glegroups.com>,
        Tom
 Talpey <tom@...pey.com>
Subject: Re: [syzbot] [nfs?] INFO: task hung in nfsd_nl_listener_set_doit



> On Oct 11, 2024, at 5:08 PM, NeilBrown <neilb@...e.de> wrote:
> 
> On Sat, 12 Oct 2024, Chuck Lever III wrote:
>> 
>> 
>>> On Oct 9, 2024, at 4:26 PM, Jeff Layton <jlayton@...nel.org> wrote:
>>> 
>>> On Wed, 2024-09-04 at 10:23 -0400, Chuck Lever wrote:
>>>> On Mon, Sep 02, 2024 at 11:57:55AM +1000, NeilBrown wrote:
>>>>> On Sun, 01 Sep 2024, syzbot wrote:
>>>>>> syzbot has found a reproducer for the following issue on:
>>>>> 
>>>>> I had a poke around using the provided disk image and kernel for
>>>>> exploring.
>>>>> 
>>>>> I think the problem is demonstrated by this stack :
>>>>> 
>>>>> [<0>] rpc_wait_bit_killable+0x1b/0x160
>>>>> [<0>] __rpc_execute+0x723/0x1460
>>>>> [<0>] rpc_execute+0x1ec/0x3f0
>>>>> [<0>] rpc_run_task+0x562/0x6c0
>>>>> [<0>] rpc_call_sync+0x197/0x2e0
>>>>> [<0>] rpcb_register+0x36b/0x670
>>>>> [<0>] svc_unregister+0x208/0x730
>>>>> [<0>] svc_bind+0x1bb/0x1e0
>>>>> [<0>] nfsd_create_serv+0x3f0/0x760
>>>>> [<0>] nfsd_nl_listener_set_doit+0x135/0x1a90
>>>>> [<0>] genl_rcv_msg+0xb16/0xec0
>>>>> [<0>] netlink_rcv_skb+0x1e5/0x430
>>>>> 
>>>>> No rpcbind is running on this host so that "svc_unregister" takes a
>>>>> long time.  Maybe not forever but if a few of these get queued up all
>>>>> blocking some other thread, then maybe that pushed it over the limit.
>>>>> 
>>>>> The fact that rpcbind is not running might not be relevant as the test
>>>>> messes up the network.  "ping 127.0.0.1" stops working.
>>>>> 
>>>>> So this bug comes down to "we try to contact rpcbind while holding a
>>>>> mutex and if that gets no response and no error, then we can hold the
>>>>> mutex for a long time".
>>>>> 
>>>>> Are we surprised? Do we want to fix this?  Any suggestions how?
>>>> 
>>>> In the past, we've tried to address "hanging upcall" issues where
>>>> the kernel part of an administrative command needs a user space
>>>> service that isn't working or present. (eg mount needing a running
>>>> gssd)
>>>> 
>>>> If NFSD is using the kernel RPC client for the upcall, then maybe
>>>> adding the RPC_TASK_SOFTCONN flag might turn the hang into an
>>>> immediate failure.
>>>> 
>>>> IMO this should be addressed.
>>>> 
>>>> 
>>> 
>>> I sent a patch that does the above, but now I'm wondering if we ought
>>> to take another approach. The listener array can be pretty long. What
>>> if we instead were to just drop and reacquire the mutex in the loop at
>>> strategic points? Then we wouldn't squat on the mutex for so long. 
>>> 
>>> Something like this maybe? It's ugly but it might prevent hung task
>>> warnings, and listener setup isn't a fastpath anyway.
>>> 
>>> 
>>> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
>>> index 3adbc05ebaac..5de01fb4c557 100644
>>> --- a/fs/nfsd/nfsctl.c
>>> +++ b/fs/nfsd/nfsctl.c
>>> @@ -2042,7 +2042,9 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
>>> 
>>>               set_bit(XPT_CLOSE, &xprt->xpt_flags);
>>>               spin_unlock_bh(&serv->sv_lock);
>>> 
>>>               svc_xprt_close(xprt);
>>> +
>>> +               /* ensure we don't squat on the mutex for too long */
>>> +               mutex_unlock(&nfsd_mutex);
>>> +               mutex_lock(&nfsd_mutex);
>>>               spin_lock_bh(&serv->sv_lock);
>>>       }
>>> 
>>> @@ -2082,6 +2084,10 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
>>>               /* always save the latest error */
>>>               if (ret < 0)
>>>                       err = ret;
>>> +
>>> +               /* ensure we don't squat on the mutex for too long */
>>> +               mutex_unlock(&nfsd_mutex);
>>> +               mutex_lock(&nfsd_mutex);
>>>       }
>>> 
>>>       if (!serv->sv_nrthreads && list_empty(&nn->nfsd_serv->sv_permsocks))
>> 
>> I had a look at the rpcb upcall code a couple of weeks ago.
>> I'm not convinced that setting SOFTCONN in all cases will
>> help but unfortunately the reasons for my skepticism have
>> all but leaked out of my head.
>> 
>> Releasing and re-acquiring the mutex is often a sign of
>> a deeper problem. I think you're in the right vicinity
>> but I'd like to better understand the actual cause of
>> the delay. The listener list shouldn't be all that long,
>> but maybe it has a unintentional loop in it?
> 
> I think it is wrong to register with rpcbind while holding a mutex.
> Registering with rpcbind doesn't need to by synchronous does it?  Could
> we punt that to a workqueue?
> Do we need to get a failure status back somehow??
> wait_for_completion_killable() somewhere??

I think kernel RPC service start-up needs to fail immediately
if rpcbind registration doesn't work.


--
Chuck Lever