Message-ID: <26ab430040f0406087f4f6a2241525ce@kuaishou.com>
Date: Wed, 21 Jan 2026 07:33:02 +0000
From: 李磊 <lilei24@...ishou.com>
To: Viacheslav Dubeyko <Slava.Dubeyko@....com>
CC: Alex Markuze <amarkuze@...hat.com>, "idryomov@...il.com"
	<idryomov@...il.com>, 孙朝 <sunzhao03@...ishou.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>
Subject: Re: Re: Re: [External Mail!] Re: [PATCH v2] ceph: fix deadlock in ceph_readdir_prepopulate

Hi Slava,

Zhao and I have found a way to reproduce this issue. 

1. Find two different directories (DIR_a and DIR_b) in a CephFS cluster and make sure they have different auth MDS nodes. This
    way the client has a chance to run handle_reply() on different CPUs during the test (see steps 4 and 6).
2. In DIR_b, create a hard link to DIR_a/FILE_a, namely FILE_b. DIR_a/FILE_a and DIR_b/FILE_b then share the same ino (e.g. 123456).
3. Hard-code that ino in the hunk below and make handle_reply() sleep while the stat command from step 5 is being handled.
```
@@ -3950,6 +3951,10 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg)
                        goto out_err;
                }
                req->r_target_inode = in;
+               if (in->i_ino == 123456) {
+                       pr_err("inode %lu found, ready to wait 10 seconds.\n", in->i_ino);
+                       msleep(10000);
+               }
```
4. Run `echo 3 > /proc/sys/vm/drop_caches` to drop the cached dentries and inodes.
5. In one shell, run `stat DIR_a/FILE_a`. This shell is expected to get stuck because of the msleep() in handle_reply().
6. In another shell, run `ls DIR_b/` to trigger ceph_readdir_prepopulate().

Repeat steps 4 to 6 several times (5 iterations should be enough) and the deadlock will show up. A rough shell sketch of the whole procedure follows below.
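For convenience, here is a rough, untested sketch of the procedure as a script. It makes a few assumptions beyond what we actually did: the mount point /mnt/cephfs, the directory and file names, and the use of the ceph.dir.pin export-pin xattr to force the two directories onto different auth MDS ranks (which needs a filesystem with multiple active MDS). Note that the ino has to be known and hard-coded into the handle_reply() hunk before the instrumented kernel is built, so in practice the setup part runs first.

```
#!/bin/sh
# Rough, untested sketch of the repro loop above.
# Assumptions: cephfs mounted at /mnt/cephfs, multiple active MDS ranks,
# and root privileges (needed for drop_caches).

MNT=/mnt/cephfs
DIR_A=$MNT/DIR_a
DIR_B=$MNT/DIR_b

# Step 1: two directories pinned to different auth MDS ranks.
mkdir -p "$DIR_A" "$DIR_B"
setfattr -n ceph.dir.pin -v 0 "$DIR_A"
setfattr -n ceph.dir.pin -v 1 "$DIR_B"

# Step 2: hard link so both paths share one inode.
touch "$DIR_A/FILE_a"
ln "$DIR_A/FILE_a" "$DIR_B/FILE_b"

# Step 3 happens at build time: hard-code this ino in the handle_reply() hunk.
echo "ino to hard-code in handle_reply(): $(stat -c %i "$DIR_A/FILE_a")"

# Steps 4-6, repeated a few times. stat is backgrounded because it blocks
# in the msleep() added to handle_reply(); if the bug triggers, ls (and
# therefore the loop) hangs in ceph_readdir_prepopulate().
for i in 1 2 3 4 5; do
        echo 3 > /proc/sys/vm/drop_caches       # step 4
        stat "$DIR_A/FILE_a" >/dev/null &       # step 5
        sleep 1                                 # let stat reach the msleep() window
        ls "$DIR_B/" >/dev/null                 # step 6
        wait
done
```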


________________________________________
From: Viacheslav Dubeyko <Slava.Dubeyko@....com>
Sent: January 8, 2026, 3:59
To: 李磊
CC: Alex Markuze; idryomov@...il.com; 孙朝; linux-kernel@...r.kernel.org; ceph-devel@...r.kernel.org
Subject: Re: Re: Re: [External Mail!] Re: [PATCH v2] ceph: fix deadlock in ceph_readdir_prepopulate

On Wed, 2026-01-07 at 16:01 +0000, 李磊 wrote:
> Hi Slava,
>
> This issue is very rare on our internal CephFS clusters. We have only encountered it about three times.
> But we are working on some hacking methods to speed up the reproduction. I think it will take me about one week
> if everything goes smoothly, and I will share the methods here.
>
> To be honest, this patch is essentially a revert of this one:
>
> commit : bca9fc14c70fcbbebc84954cc39994e463fb9468
> ceph: when filling trace, call ceph_get_inode outside of mutexes
>
> I'll resend this patch later.

Sounds good. If I remember correctly, the main issue with the initial patch was
that the commit message didn't have a good explanation of the problem and of why
the revert fixes it. So, if we have all of these details in the commit
message, then the patch should be in good shape.

Thanks,
Slava.
