Message-ID: <26ab430040f0406087f4f6a2241525ce@kuaishou.com>
Date: Wed, 21 Jan 2026 07:33:02 +0000
From: 李磊 <lilei24@...ishou.com>
To: Viacheslav Dubeyko <Slava.Dubeyko@....com>
CC: Alex Markuze <amarkuze@...hat.com>, "idryomov@...il.com"
<idryomov@...il.com>, 孙朝 <sunzhao03@...ishou.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>
Subject: Re: Re: Re: [External Mail!] Re: [PATCH v2] ceph: fix deadlock in ceph_readdir_prepopulate
Hi Slava,
Zhao and I have found a way to reproduce this issue:
1. Find two different directories (DIR_a and DIR_b) in a CephFS cluster and make sure they have different auth MDS nodes. This
way, a client may get the chance to run handle_reply() on different CPUs for our test (see steps 5 and 6).
2. In DIR_b, create a hard link to DIR_a/FILE_a, named FILE_b. DIR_a/FILE_a and DIR_b/FILE_b then have the same ino (e.g. 123456).
3. Hard-code that ino in the hack below, so that handle_reply() sleeps when the reply for the stat command arrives.
```
@@ -3950,6 +3951,11 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg)
 			goto out_err;
 		}
 		req->r_target_inode = in;
+		/* debug hack: stall this reply so a concurrent readdir can race */
+		if (in->i_ino == 123456) {	/* placeholder; use the real ino here */
+			pr_err("inode %lu found, ready to wait 10 seconds.\n", in->i_ino);
+			msleep(10000);	/* msleep() may need #include <linux/delay.h> */
+		}
```
4. Run `echo 3 > /proc/sys/vm/drop_caches` (as root) to drop cached dentries and inodes, so the next lookups go to the MDS again.
5. In a shell, run `stat DIR_a/FILE_a`; it should get stuck because of the msleep() in handle_reply().
6. In another shell, run `ls DIR_b/` to trigger ceph_readdir_prepopulate().
Repeat steps 4 to 6 several times (5 times should be enough, I guess), and we'll see the deadlock; a script sketch automating this is shown below.
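For convenience, here is a minimal script sketch of the whole procedure. The mount point /mnt/cephfs and the use of ceph.dir.pin to force different auth MDS nodes (which needs at least two active MDS daemons) are assumptions for illustration; adjust them to your cluster.
```
#!/bin/bash
# Sketch of the repro; run as root on a CephFS client mount (assumed /mnt/cephfs).
MNT=/mnt/cephfs

# Step 1: pin DIR_a and DIR_b to different MDS ranks so that they have
# different auth MDS nodes (assumes max_mds >= 2 on the cluster).
mkdir -p "$MNT/DIR_a" "$MNT/DIR_b"
setfattr -n ceph.dir.pin -v 0 "$MNT/DIR_a"
setfattr -n ceph.dir.pin -v 1 "$MNT/DIR_b"

# Step 2: create the file and its hard link; both share one ino.
touch "$MNT/DIR_a/FILE_a"
ln "$MNT/DIR_a/FILE_a" "$MNT/DIR_b/FILE_b"

# Step 3 is the kernel-side hack above; hard-code the ino printed here.
stat -c %i "$MNT/DIR_a/FILE_a"

for i in 1 2 3 4 5; do
	# Step 4: drop caches so both paths are looked up from the MDS again.
	echo 3 > /proc/sys/vm/drop_caches

	# Step 5: this stat stalls ~10s in handle_reply() due to the msleep().
	stat "$MNT/DIR_a/FILE_a" &

	# Step 6: readdir the other directory while the stat reply is stalled.
	sleep 1
	ls "$MNT/DIR_b/" &

	wait	# with the bug present, this eventually never returns
done
```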
________________________________________
From: Viacheslav Dubeyko <Slava.Dubeyko@....com>
Sent: January 8, 2026 3:59
To: 李磊
Cc: Alex Markuze; idryomov@...il.com; 孙朝; linux-kernel@...r.kernel.org; ceph-devel@...r.kernel.org
Subject: Re: Re: Re: [External Mail!] Re: [PATCH v2] ceph: fix deadlock in ceph_readdir_prepopulate
On Wed, 2026-01-07 at 16:01 +0000, 李磊 wrote:
> Hi Slava,
>
> This issue is very rare on our internal cephfs clusters. We have only encountered it about three times.
> But we are working on some hacking methods to speed up the reproduction. I think it will take me one week
> if everything goes smoothly, and I will share the methods here.
>
> To be honest, this patch should be a revert of this one:
>
> commit bca9fc14c70fcbbebc84954cc39994e463fb9468
> ceph: when filling trace, call ceph_get_inode outside of mutexes
>
> I'll resend this patch later.
Sounds good. If I remember correctly, the main issue with the initial patch was
that the commit message didn't have a good explanation of the issue and of why
this revert fixes it. So, if we have all of these details in the commit
message, then the patch should be in good shape.
Thanks,
Slava.