linux-kernel - Re: XFS read hangs in 3.1-rc10


Message-ID: <20111025200748.GA25043@hostway.ca>
Date:	Tue, 25 Oct 2011 13:07:48 -0700
From:	Simon Kirby <sim@...tway.ca>
To:	Christoph Hellwig <hch@...radead.org>
Cc:	linux-kernel@...r.kernel.org, xfs@....sgi.com
Subject: Re: XFS read hangs in 3.1-rc10

On Mon, Oct 24, 2011 at 04:22:19AM -0400, Christoph Hellwig wrote:

> On Fri, Oct 21, 2011 at 01:28:57PM -0700, Simon Kirby wrote:
> > > So we're waiting for the inode to be flushed, aka I/O again.
> > 
> > But I don't seem to see any queued I/O, hmm.
> 
> Well, as far as XFS is concerned the inode is being flushed and
> the buffer is locked.  It could be stuck in the XFS internal delwri
> list because, for example, a buffer is pinned.
> 
> If that is the case, try the big hammer patch attached below. It is
> probably not the final fix, but it should clear the hang.
> 
> > > If this doesn't help I'll probably need to come up with some tracing
> > > patches for you.
> > 
> > It seems that 3.0.7 plus gregkh's stable-queue queue-3.0 patches
> > runs fine without blocking at all on this SSD box, so that should
> > narrow it down significantly.
> > 
> > Hmm, looking at git diff --stat v3.0.7..v3.1-rc10 fs/xfs , maybe not.. :)
> > 
> > Maybe 3.1 fs/xfs would transplant into 3.0 or vice-versa?
> 
> If the patch below doesn't work I'll prepare a backport for you.
> 
> Index: linux-2.6/fs/xfs/xfs_sync.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_sync.c	2011-10-24 10:02:27.361971264 +0200
> +++ linux-2.6/fs/xfs/xfs_sync.c	2011-10-24 10:11:03.301036954 +0200
> @@ -764,7 +764,8 @@ xfs_reclaim_inode(
>  	struct xfs_perag	*pag,
>  	int			sync_mode)
>  {
> -	int	error;
> +	struct xfs_mount	*mp = ip->i_mount;
> +	int			error;
>  
>  restart:
>  	error = 0;
> @@ -772,6 +773,18 @@ restart:
>  	if (!xfs_iflock_nowait(ip)) {
>  		if (!(sync_mode & SYNC_WAIT))
>  			goto out;
> +
> +		/*
> +		 * If the inode is flush locked we probably had someone else
> +		 * push it to the buffer and the buffer is now sitting in
> +		 * the delwri list.
> +		 *
> +		 * Use the big hammer to force it.
> +		 */
> +		xfs_log_force(mp, XFS_LOG_SYNC);
> +		set_bit(XBT_FORCE_FLUSH, &mp->m_ddev_targp->bt_flags);
> +		wake_up_process(mp->m_ddev_targp->bt_task);
> +
>  		xfs_iflock(ip);
>  	}
>  
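
[Editor's illustration, not part of the original message.] The pattern in the quoted patch (try the flush lock, and if it is held, force the delayed-write queue out before blocking on it) can be sketched as a user-space analogy. This is not kernel code. It assumes POSIX threads, and every name in it (`struct target`, `flusher`, `reclaim`, `demo_reclaim`) is hypothetical. A boolean stands in for the inode flush lock, a `pinned` flag for a pinned buffer, and a `force_flush` flag for `XBT_FORCE_FLUSH`; clearing `pinned` only loosely models what `xfs_log_force()` achieves.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the buffer target and one inode. */
struct target {
	pthread_mutex_t lock;
	pthread_cond_t  wake;         /* wakes the flusher thread */
	pthread_cond_t  unlocked;     /* signalled when the flush lock drops */
	bool            flush_locked; /* models the inode flush lock */
	bool            pinned;       /* models a pinned buffer on the delwri list */
	bool            force_flush;  /* models XBT_FORCE_FLUSH */
	bool            stop;
};

/* Flusher: sleeps until forced, then "writes" the item if it is unpinned. */
static void *flusher(void *arg)
{
	struct target *t = arg;

	pthread_mutex_lock(&t->lock);
	while (!t->stop) {
		while (!t->force_flush && !t->stop)
			pthread_cond_wait(&t->wake, &t->lock);
		if (t->force_flush && !t->pinned) {
			/* "I/O completion" releases the flush lock. */
			t->flush_locked = false;
			pthread_cond_broadcast(&t->unlocked);
		}
		t->force_flush = false;
	}
	pthread_mutex_unlock(&t->lock);
	return NULL;
}

/* Returns true once the caller owns the flush lock, false if it gave up. */
static bool reclaim(struct target *t, bool wait)
{
	pthread_mutex_lock(&t->lock);
	if (t->flush_locked) {          /* the trylock failed */
		if (!wait) {
			pthread_mutex_unlock(&t->lock);
			return false;
		}
		/* The "big hammer": unpin, request a flush, wake the flusher. */
		t->pinned = false;      /* loosely models xfs_log_force() */
		t->force_flush = true;
		pthread_cond_signal(&t->wake);
		/* Only now block: the flusher is guaranteed to release the lock. */
		while (t->flush_locked)
			pthread_cond_wait(&t->unlocked, &t->lock);
	}
	t->flush_locked = true;
	pthread_mutex_unlock(&t->lock);
	return true;
}

/* Returns 1 if the non-waiting call backed off and the waiting call succeeded. */
int demo_reclaim(void)
{
	struct target t = { .flush_locked = true, .pinned = true };
	pthread_t tid;
	bool nowait, waited;

	pthread_mutex_init(&t.lock, NULL);
	pthread_cond_init(&t.wake, NULL);
	pthread_cond_init(&t.unlocked, NULL);
	if (pthread_create(&tid, NULL, flusher, &t) != 0)
		return 0;

	nowait = reclaim(&t, false);    /* like a call without SYNC_WAIT */
	waited = reclaim(&t, true);     /* like a call with SYNC_WAIT */

	pthread_mutex_lock(&t.lock);
	t.stop = true;
	pthread_cond_signal(&t.wake);
	pthread_mutex_unlock(&t.lock);
	pthread_join(tid, NULL);

	assert(t.flush_locked);         /* the reclaimer now owns the lock */
	pthread_cond_destroy(&t.unlocked);
	pthread_cond_destroy(&t.wake);
	pthread_mutex_destroy(&t.lock);
	return !nowait && waited;
}
```

In this model, removing the three "big hammer" lines from `reclaim` leaves the waiting caller blocked forever, because nothing ever prompts the flusher to release the flush lock. That mirrors the hang discussed in the thread, under the assumption that the buffer really is stuck on the delwri list.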

This patch seems to work, at least on an SSD box. No more hung task
warnings, and everything appears normal.

Do we know what caused this regression and/or how to fix it without the
big hammer, or do we need to break it down further?

Thanks!

Simon-