lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAO8a2Si4B3AqgfmD4-ngsSv0W-7fn_-Qm46SR4uQQ2ciUX3q2A@mail.gmail.com>
Date: Wed, 30 Jul 2025 18:01:56 +0400
From: Alex Markuze <amarkuze@...hat.com>
To: Zhao Sun <sunzhao03@...ishou.com>
Cc: xiubli@...hat.com, idryomov@...il.com, ceph-devel@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] ceph: fix deadlock in ceph_readdir_prepopulate

I've taken another look, there is a much bigger issue here I'm not
entirely convinced its safe to release and re-acquire the lock:

The original design ensures that all snap context operations, both
reading and writing, happen under the same lock, guaranteeing
consistency. The patch breaks this invariant by creating a window
where snap contexts can change between the inode creation and the
subsequent ceph_fill_inode call.
Bottom line: The patch trades deadlock prevention for potential snap
context inconsistency, which could lead to data corruption or
incorrect file visibility in snapshots.

On Wed, Jul 30, 2025 at 4:45 PM Alex Markuze <amarkuze@...hat.com> wrote:
>
> Hi,
>
> The patch correctly identifies and addresses the deadlock by releasing
> snap_rwsem before the blocking ceph_get_inode call.
>
> My Recommendation:
>
> Use down_read_killable: The current patch uses
> down_read(&mdsc->snap_rwsem) after calling ceph_get_inode.
> This is problematic because down_read is uninterruptible. If a signal
> (like SIGKILL or Ctrl+C) is sent to the process while it's waiting to
> re-acquire the snap_rwsem read lock (e.g., if another thread holds the
> write lock), the process will hang indefinitely and cannot be killed.
> Using down_read_killable(&mdsc->snap_rwsem) instead allows the lock
> acquisition to be interrupted by signals. If it fails (returns
> -EINTR), the error must be handled properly (e.g., release the inode
> reference obtained from ceph_get_inode and propagate the error) to
> ensure the process remains killable and doesn't hang.
>
> This change is essential to prevent potential system hangs.
>
> Best regards,
>
> On Wed, Jul 30, 2025 at 1:49 PM Zhao Sun <sunzhao03@...ishou.com> wrote:
> >
> > When ceph_readdir_prepopulate calls ceph_get_inode while holding
> > mdsc->snap_rwsem, a deadlock may occur, blocking all subsequent
> > requests of the current session.
> >
> > Fix by release the mds->snap_rwsem read lock before calling the
> > ceph_get_inode function.
> >
> > Link: https://tracker.ceph.com/issues/72307
> > Signed-off-by: Zhao Sun <sunzhao03@...ishou.com>
> > ---
> >  fs/ceph/inode.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/fs/ceph/inode.c b/fs/ceph/inode.c
> > index 06cd2963e41e..3d7fb045ba76 100644
> > --- a/fs/ceph/inode.c
> > +++ b/fs/ceph/inode.c
> > @@ -1900,6 +1900,7 @@ static int fill_readdir_cache(struct inode *dir, struct dentry *dn,
> >  int ceph_readdir_prepopulate(struct ceph_mds_request *req,
> >                              struct ceph_mds_session *session)
> >  {
> > +       struct ceph_mds_client *mdsc = session->s_mdsc;
> >         struct dentry *parent = req->r_dentry;
> >         struct inode *inode = d_inode(parent);
> >         struct ceph_inode_info *ci = ceph_inode(inode);
> > @@ -2029,7 +2030,10 @@ int ceph_readdir_prepopulate(struct ceph_mds_request *req,
> >                 if (d_really_is_positive(dn)) {
> >                         in = d_inode(dn);
> >                 } else {
> > +                       /* Release mdsc->snap_rwsem in advance to avoid deadlock */
> > +                       up_read(&mdsc->snap_rwsem);
> >                         in = ceph_get_inode(parent->d_sb, tvino, NULL);
> > +                       down_read(&mdsc->snap_rwsem);
> >                         if (IS_ERR(in)) {
> >                                 doutc(cl, "new_inode badness\n");
> >                                 d_drop(dn);
> > --
> > 2.39.2 (Apple Git-143)
> >
> >


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ