Message-ID: <20111120153241.GA23511@infradead.org>
Date:	Sun, 20 Nov 2011 10:32:41 -0500
From:	Christoph Hellwig <hch@...radead.org>
To:	Simon Kirby <sim@...tway.ca>
Cc:	Christoph Hellwig <hch@...radead.org>,
	linux-kernel@...r.kernel.org, xfs@....sgi.com
Subject: Re: XFS read hangs in 3.1-rc10

On Wed, Nov 16, 2011 at 11:56:43AM -0800, Simon Kirby wrote:
> Sorry for the delay in testing.
> 
> Yes, everything looks fine even with the xfs_log_force line from your
> patch commented out. So, the changes in xfs_reclaim_inode() are just the
> set_bit(XBT_FORCE_FLUSH) and wake_up_process(), relative to 3.1.

Dave pointed out that we can do better than the big hammer, and the
patch below should fix your issue, too.  Can you test it?

---
From: Christoph Hellwig <hch@....de>
Subject: xfs: force buffer writeback before blocking on the ilock in inode reclaim

If we are doing synchronous inode reclaim we block the VM from making
progress in memory reclaim.  So if we encounter a flush-locked inode,
promote its buffer in the delwri list and wake up xfsbufd to write it
out now.
Without this we can get hangs of up to 30 seconds during workloads hitting
synchronous inode reclaim.

The scheme is copied from what we do for dquot reclaims.

Reported-by: Simon Kirby <sim@...tway.ca>
Signed-off-by: Christoph Hellwig <hch@....de>
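
For reference, the dquot-side scheme mentioned above is what
xfs_qm_dqflock_pushbuf_wait() does; a rough sketch of it from memory
(so the exact signature, fields and helper names are approximations,
not verbatim code):

void
xfs_qm_dqflock_pushbuf_wait(
	struct xfs_dquot	*dqp)
{
	struct xfs_mount	*mp = dqp->q_mount;
	struct xfs_buf		*bp;

	/*
	 * Check to see if the dquot has been flushed delayed write.  If
	 * so, grab its buffer, promote it to the front of the delwri
	 * list and wake up xfsbufd so the write goes out now.  We'll be
	 * able to acquire the flush lock when the I/O completes.
	 */
	bp = xfs_incore(mp->m_ddev_targp, dqp->q_blkno,
			mp->m_quotainfo->qi_dqchunklen, XBF_TRYLOCK);
	if (!bp)
		goto out_lock;

	if (XFS_BUF_ISDELAYWRITE(bp)) {
		xfs_buf_delwri_promote(bp);
		wake_up_process(bp->b_target->bt_task);
	}
	xfs_buf_relse(bp);
out_lock:
	xfs_dqflock(dqp);
}

xfs_promote_inode() in the patch below does the equivalent for the
inode cluster buffer, looked up via ip->i_imap instead of the dquot's
block number.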

Index: xfs/fs/xfs/xfs_sync.c
===================================================================
--- xfs.orig/fs/xfs/xfs_sync.c	2011-11-20 12:48:36.664765032 +0100
+++ xfs/fs/xfs/xfs_sync.c	2011-11-20 13:51:55.594184465 +0100
@@ -770,6 +770,17 @@ restart:
 	if (!xfs_iflock_nowait(ip)) {
 		if (!(sync_mode & SYNC_WAIT))
 			goto out;
+
+		/*
+		 * If we only have a single dirty inode in a cluster there is
+		 * a fair chance that the AIL push may have pushed it into
+		 * the buffer, but xfsbufd won't touch it until 30 seconds
+		 * from now, and thus we will lock up here.
+		 *
+		 * Promote the inode buffer to the front of the delwri list
+		 * and wake up xfsbufd now.
+		 */
+		xfs_promote_inode(ip);
 		xfs_iflock(ip);
 	}
 
Index: xfs/fs/xfs/xfs_inode.c
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.c	2011-11-20 13:50:51.457865253 +0100
+++ xfs/fs/xfs/xfs_inode.c	2011-11-20 13:52:30.420662460 +0100
@@ -2835,6 +2835,27 @@ corrupt_out:
 	return XFS_ERROR(EFSCORRUPTED);
 }
 
+void
+xfs_promote_inode(
+	struct xfs_inode	*ip)
+{
+	struct xfs_buf		*bp;
+
+	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL|XFS_ILOCK_SHARED));
+
+	bp = xfs_incore(ip->i_mount->m_ddev_targp, ip->i_imap.im_blkno,
+			ip->i_imap.im_len, XBF_TRYLOCK);
+	if (!bp)
+		return;
+
+	if (XFS_BUF_ISDELAYWRITE(bp)) {
+		xfs_buf_delwri_promote(bp);
+		wake_up_process(ip->i_mount->m_ddev_targp->bt_task);
+	}
+
+	xfs_buf_relse(bp);
+}
+
 /*
  * Return a pointer to the extent record at file index idx.
  */
Index: xfs/fs/xfs/xfs_inode.h
===================================================================
--- xfs.orig/fs/xfs/xfs_inode.h	2011-11-20 13:50:51.487865091 +0100
+++ xfs/fs/xfs/xfs_inode.h	2011-11-20 13:51:39.224273148 +0100
@@ -498,6 +498,7 @@ int		xfs_iunlink(struct xfs_trans *, xfs
 void		xfs_iext_realloc(xfs_inode_t *, int, int);
 void		xfs_iunpin_wait(xfs_inode_t *);
 int		xfs_iflush(xfs_inode_t *, uint);
+void		xfs_promote_inode(struct xfs_inode *);
 void		xfs_lock_inodes(xfs_inode_t **, int, uint);
 void		xfs_lock_two_inodes(xfs_inode_t *, xfs_inode_t *, uint);
 
