linux-kernel - Re: spurious -ENOSPC on XFS

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20090120232422.GF10158@disturbed>
Date:	Wed, 21 Jan 2009 10:24:22 +1100
From:	Dave Chinner <david@...morbit.com>
To:	Mikulas Patocka <mpatocka@...hat.com>
Cc:	Christoph Hellwig <hch@...radead.org>, xfs@....sgi.com,
	linux-kernel@...r.kernel.org
Subject: Re: spurious -ENOSPC on XFS

On Tue, Jan 20, 2009 at 02:38:27PM -0500, Mikulas Patocka wrote:
> 
> 
> On Sun, 18 Jan 2009, Christoph Hellwig wrote:
> 
> > On Tue, Jan 13, 2009 at 11:28:58PM -0500, Mikulas Patocka wrote:
> > > The result must not depend on magic timer values. If it does, you end up 
> > > with undebbugable nondeterministic failures.
> > > 
> > > Why don't you change that 500ms wait to "wait until the flush finishes"? 
> > > That would be correct.
> > 
> > Yes, this probably would better.  Could I motivate you to come up with
> > a patch for that?
> > 
> 
> Hi
> 
> I looked at the source and found out that it uses sync_blockdev for 
> syncing --- but sync_blockdev writes only metadata buffers, it doesn't 
> touch inodes and pages and doesn't resolve delayed allocations. So it 
> really doesn't sync anything.

Ah, bugger. Thanks for finding this.

> I replaced it with correct syncing of all inodes. With this patch it 
> passes my testcase (no more spurious -ENOSPCs), but it still isn't 
> correct, there is that 500ms delay --- if the machine was so overloaded 
> that it couldn't sync withing 500ms, you still get spurious -ENOSPC.

That's VFS level data syncing - there may be other XFS level stuff
that can be dones as well (e.g. cleanup/truncate of unlinked inodes)
that will release space.

> There are notions about possible deadlocks (the syncer may lock against 
> the process that is waiting for the sync to finish), that's why removing 
> that 500ms delay isn't that easy as it seems. I don't have XFS knowledge 
> to check for the deadlocks, it should be done by XFS developers. Also, 
> when you resolve the deadlocks and drop the timeout, replace WB_SYNC_NONE 
> with WB_SYNC_ALL in this patch.

Right, so you need to use internal xfs sync functions that don't
have these problems. That is:

	error = xfs_sync_inodes(ip->i_mount, SYNC_DELWRI|SYNC_WAIT);

will do a blocking flush of all the inodes without deadlocks occurring.
Then you can remove the 500ms wait.

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/