Message-ID: <CAN0SSYwzsVEvopiuJuQTbJkOeGhDtLLFMsetVM2m5zOa0JEwDA@mail.gmail.com>
Date: Mon, 25 Nov 2024 18:32:00 +0100
From: Mark Liam Brown <brownmarkliam@...il.com>
To: linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [bug report] deploying both NFS client and server on the same
 machine triggle hungtask

On Mon, Nov 25, 2024 at 1:48 PM Li Lingfeng <lilingfeng3@...wei.com> wrote:
>
> Hi, we have found a hungtask issue recently.
>
> Commit 7746b32f467b ("NFSD: add shrinker to reap courtesy clients on low
> memory condition") adds a shrinker to NFSD, which causes NFSD to try to
> obtain shrinker_rwsem when starting and stopping services.
>
> Deploying both NFS client and server on the same machine may lead to the
> following issue, since they will share the global shrinker_rwsem.
>
>      nfsd                            nfs
>                              drop_cache // hold shrinker_rwsem
>                              write back, wait for rpc_task to exit
> // stop nfsd threads
> svc_set_num_threads
> // clean up xprts
> svc_xprt_destroy_all
>                              rpc_check_timeout
>                               rpc_check_connected
>                               // wait for the connection to be disconnected
> unregister_shrinker
> // wait for shrinker_rwsem
>
> Normally, the client's rpc_task will exit after the server's nfsd thread
> has processed the request.
> When all the server's nfsd threads exit, the client’s rpc_task is expected
> to detect the network connection being disconnected and exit.
> However, although the server has executed svc_xprt_destroy_all before
> waiting for shrinker_rwsem, the network connection is not actually
> disconnected. Instead, the operation to close the socket is simply added
> to the task_works queue.
>
> svc_xprt_destroy_all
>   ...
>   svc_sock_free
>    sockfd_put
>     fput_many
>      init_task_work // ____fput
>      task_work_add // add to task->task_works
>
> The actual disconnection of the network connection will only occur after
> the current process finishes.
> do_exit
>   exit_task_work
>    task_work_run
>    ...
>     ____fput // close sock
>
> Although it is not a common practice to deploy NFS client and server on
> the same machine, I think this issue still needs to be addressed,
> otherwise it will cause all processes trying to acquire the shrinker_rwsem
> to hang.

I disagree that this setup is uncommon. Most small companies run the NFS
client and NFS server on the same machine, with the client used to allow
user logins or to support schroot or containers.
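
For anyone following the quoted report, the cycle can be modelled in user
space. This is only a rough sketch, not kernel code: a plain lock stands in
for shrinker_rwsem, an event stands in for the socket being closed, and the
names simulate, defer_close and task_works are made up for illustration.
Timeouts replace the real indefinite waits so that the script terminates.

```python
import threading


def simulate(defer_close, timeout=0.2):
    """Toy model of the quoted hang; returns what each side observed."""
    shrinker_rwsem = threading.Lock()      # stand-in for the global rwsem
    disconnected = threading.Event()       # set once the socket is "closed"
    client_holds_lock = threading.Event()  # orders the two threads
    result = {}

    def client():
        # drop_cache path: take the lock, then wait for the rpc_task,
        # which only finishes once the connection is seen as closed.
        with shrinker_rwsem:
            client_holds_lock.set()
            # Wait longer than the server so the lock is still held
            # when the server gives up.
            result["client_saw_disconnect"] = disconnected.wait(2 * timeout)

    def server():
        task_works = []                    # deferred work, run at "exit"
        client_holds_lock.wait()
        # svc_xprt_destroy_all: the close is either queued or done now.
        if defer_close:
            task_works.append(disconnected.set)
        else:
            disconnected.set()
        # unregister_shrinker: needs the lock the client is holding.
        got = shrinker_rwsem.acquire(timeout=timeout)
        result["server_got_rwsem"] = got
        if not got:
            # Modelled hang: exit is never reached, so the queued
            # close never runs.
            return
        shrinker_rwsem.release()
        for work in task_works:            # "do_exit": run deferred work
            work()

    threads = [threading.Thread(target=client),
               threading.Thread(target=server)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    print("deferred close: ", simulate(defer_close=True))
    print("immediate close:", simulate(defer_close=False))
```

With defer_close=True neither side makes progress within its timeout, which
mirrors the reported hang. With defer_close=False the client sees the
disconnect, releases the lock, and the server can then take it.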

Mark
-- 
IT Infrastructure Consultant
Windows, Linux
