Message-ID: <e93388c77ebad3d32a7a851031e3aea6a6991d31.camel@ibm.com>
Date: Wed, 21 Jan 2026 20:24:17 +0000
From: Viacheslav Dubeyko <Slava.Dubeyko@....com>
To: "lilei24@...ishou.com" <lilei24@...ishou.com>
CC: "ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>,
        Alex Markuze
	<amarkuze@...hat.com>,
        "idryomov@...il.com" <idryomov@...il.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "sunzhao03@...ishou.com" <sunzhao03@...ishou.com>
Subject: Re: Re: Re: Re: [External Mail!] Re: [PATCH v2] ceph: fix deadlock in ceph_readdir_prepopulate

On Wed, 2026-01-21 at 07:33 +0000, 李磊 wrote:
> Hi Slava,
> 
> Zhao and I have found a way to reproduce this issue. 

Sounds great!

> 
> 1. Find two directories (DIR_a and DIR_b) in a CephFS cluster and make sure they have different auth MDS nodes. This
>    way, the client has a chance to run handle_reply() on different CPUs during the test (see steps 4 and 6).
> 2. In DIR_b, create a hard link to DIR_a/FILE_a, named FILE_b. DIR_a/FILE_a and DIR_b/FILE_b then share the same ino (e.g. 123456).
> 3. Hard-code that ino in the debug hunk below so that handle_reply() sleeps while processing the reply for the stat command (msleep() needs <linux/delay.h> if it is not already included).
> ```
> @@ -3950,6 +3951,10 @@ static void handle_reply(struct ceph_mds_session *session, struct ceph_msg *msg)
>                         goto out_err;
>                 }
>                 req->r_target_inode = in;
> +               if (in->i_ino == 123456) {
> +                       pr_err("inode %lu found, ready to wait 10 seconds.\n", in->i_ino);
> +                       msleep(10000);
> +               }
> ```
> 4. Run `echo 3 > /proc/sys/vm/drop_caches`.
> 5. In one shell, run `stat DIR_a/FILE_a`; it should hang in this shell because of the msleep() in handle_reply().
> 6. In another shell, run `ls DIR_b/` to trigger ceph_readdir_prepopulate().
> 
> Repeat steps 4 to 6 several times (5 times should be enough, I guess), and the deadlock shows up.
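> 
> The whole loop could be scripted roughly like this (the mount point, the
> sleep interval, and the iteration count are placeholders):
> 
> ```
> #!/bin/bash
> # Reproducer sketch: assumes the debug msleep() hunk above is applied
> # and that DIR_a and DIR_b have different auth MDS nodes.
> MNT=/mnt/cephfs                            # hypothetical mount point
> ln "$MNT/DIR_a/FILE_a" "$MNT/DIR_b/FILE_b" # step 2: shared ino
> for i in 1 2 3 4 5; do
>         echo 3 > /proc/sys/vm/drop_caches  # step 4
>         stat "$MNT/DIR_a/FILE_a" &         # step 5: parks in msleep()
>         sleep 1                            # let the stat reply arrive
>         ls "$MNT/DIR_b/" > /dev/null &     # step 6: readdir path
>         wait                               # hangs here when the deadlock hits
> done
> ```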
> 

I am guessing... Is it possible to create some Ceph-specific test case in the
xfstests suite? It would be great to have a test case or unit test for catching
this issue in the future.
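
Just to illustrate the shape such a test could take (the test number, the
ceph.dir.pin pinning used to force different auth MDS ranks, and the loop
count are all assumptions on my side), a rough skeleton:

```
#! /bin/bash
# SPDX-License-Identifier: GPL-2.0
# FS QA Test NNN (number hypothetical)
#
# Regression test for the deadlock between handle_reply() and
# ceph_readdir_prepopulate() on an inode hard-linked into two
# directories with different auth MDS ranks.
#
. ./common/preamble
_begin_fstest auto quick dir

_require_scratch
_scratch_mount

mkdir "$SCRATCH_MNT/DIR_a" "$SCRATCH_MNT/DIR_b"
# Pin the directories to different MDS ranks (needs >= 2 active MDS).
setfattr -n ceph.dir.pin -v 0 "$SCRATCH_MNT/DIR_a"
setfattr -n ceph.dir.pin -v 1 "$SCRATCH_MNT/DIR_b"

touch "$SCRATCH_MNT/DIR_a/FILE_a"
ln "$SCRATCH_MNT/DIR_a/FILE_a" "$SCRATCH_MNT/DIR_b/FILE_b"

# Without a debug delay the race window is tiny, so hammer it.
for i in $(seq 1 100); do
	echo 3 > /proc/sys/vm/drop_caches
	stat "$SCRATCH_MNT/DIR_a/FILE_a" > /dev/null &
	ls "$SCRATCH_MNT/DIR_b/" > /dev/null &
	wait
done

echo "Silence is golden"
status=0
exit
```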

OK. I suggest adding these reproduction steps and the already shared
explanation/analysis to the commit message and re-sending the patch. Could you
please send the new version of the patch?

Thanks,
Slava.  

> 
> ________________________________________
> From: Viacheslav Dubeyko <Slava.Dubeyko@....com>
> Sent: January 8, 2026, 3:59
> To: 李磊
> Cc: Alex Markuze; idryomov@...il.com; 孙朝; linux-kernel@...r.kernel.org; ceph-devel@...r.kernel.org
> Subject: Re: Re: Re: [External Mail!] Re: [PATCH v2] ceph: fix deadlock in ceph_readdir_prepopulate
> 
> On Wed, 2026-01-07 at 16:01 +0000, 李磊 wrote:
> > Hi Slava,
> > 
> > This issue is very rare on our internal CephFS clusters. We have only encountered it about three times.
> > But we are working on some hacking methods to speed up the reproduction. I think it will take me one week
> > if everything goes smoothly, and I will share the methods here.
> > 
> > To be honest, this patch should be a revert of this one:
> > 
> > commit bca9fc14c70fcbbebc84954cc39994e463fb9468
> > ceph: when filling trace, call ceph_get_inode outside of mutexes
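> > 
> > For reference, the revert itself can be regenerated with:
> > 
> > ```
> > git revert bca9fc14c70fcbbebc84954cc39994e463fb9468
> > ```
> > 
> > and its commit message extended with the deadlock analysis.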
> > 
> > I'll resend this patch later.
> 
> Sounds good. If I remember correctly, the main issue with the initial patch was
> that the commit message didn't have a good explanation of the issue and of why
> this revert fixes it. So, if we have all of these details in the commit
> message, then the patch should be in good shape.
> 
> Thanks,
> Slava.
