[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAO8a2SioQ+iUgoC8+NiMkWS6Wj=2YwV7k9LWp5bfx2ZxpWJU+w@mail.gmail.com>
Date: Wed, 30 Jul 2025 16:45:36 +0400
From: Alex Markuze <amarkuze@...hat.com>
To: Zhao Sun <sunzhao03@...ishou.com>
Cc: xiubli@...hat.com, idryomov@...il.com, ceph-devel@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ceph: fix deadlock in ceph_readdir_prepopulate
Hi,
The patch correctly identifies and addresses the deadlock by releasing
snap_rwsem before the blocking ceph_get_inode call.
My Recommendation:
Use down_read_killable: The current patch uses
down_read(&mdsc->snap_rwsem) after calling ceph_get_inode.
This is problematic because down_read is uninterruptible. If a signal
(like SIGKILL or Ctrl+C) is sent to the process while it's waiting to
re-acquire the snap_rwsem read lock (e.g., if another thread holds the
write lock), the process will hang indefinitely and cannot be killed.
Using down_read_killable(&mdsc->snap_rwsem) instead allows the lock
acquisition to be interrupted by signals. If it fails (returns
-EINTR), the error must be handled properly (e.g., release the inode
reference obtained from ceph_get_inode and propagate the error) to
ensure the process remains killable and doesn't hang.
This change is essential to prevent potential system hangs.
Best regards,
On Wed, Jul 30, 2025 at 1:49 PM Zhao Sun <sunzhao03@...ishou.com> wrote:
>
> When ceph_readdir_prepopulate calls ceph_get_inode while holding
> mdsc->snap_rwsem, a deadlock may occur, blocking all subsequent
> requests of the current session.
>
> Fix by release the mds->snap_rwsem read lock before calling the
> ceph_get_inode function.
>
> Link: https://tracker.ceph.com/issues/72307
> Signed-off-by: Zhao Sun <sunzhao03@...ishou.com>
> ---
> fs/ceph/inode.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> index 06cd2963e41e..3d7fb045ba76 100644
> --- a/fs/ceph/inode.c
> +++ b/fs/ceph/inode.c
> @@ -1900,6 +1900,7 @@ static int fill_readdir_cache(struct inode *dir, struct dentry *dn,
> int ceph_readdir_prepopulate(struct ceph_mds_request *req,
> struct ceph_mds_session *session)
> {
> + struct ceph_mds_client *mdsc = session->s_mdsc;
> struct dentry *parent = req->r_dentry;
> struct inode *inode = d_inode(parent);
> struct ceph_inode_info *ci = ceph_inode(inode);
> @@ -2029,7 +2030,10 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
> if (d_really_is_positive(dn)) {
> in = d_inode(dn);
> } else {
> + /* Release mdsc->snap_rwsem in advance to avoid deadlock */
> + up_read(&mdsc->snap_rwsem);
> in = ceph_get_inode(parent->d_sb, tvino, NULL);
> + down_read(&mdsc->snap_rwsem);
> if (IS_ERR(in)) {
> doutc(cl, "new_inode badness\n");
> d_drop(dn);
> --
> 2.39.2 (Apple Git-143)
>
>
Powered by blists - more mailing lists