Message-ID: <20240604071121.3981686-1-wozizhi@huawei.com>
Date: Tue, 4 Jun 2024 15:11:21 +0800
From: Zizhi Wo <wozizhi@...wei.com>
To: <chandan.babu@...cle.com>, <djwong@...nel.org>, <dchinner@...hat.com>,
	<wozizhi@...wei.com>
CC: <linux-xfs@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<yangerkun@...wei.com>
Subject: [PATCH] xfs: Fix file creation failure

We have an xfs image that contains only 2 AGs: the first AG is full and
the second AG is empty. A file creation running concurrently with a small
write can then unexpectedly fail with -ENOSPC, because there is a race
window in which the allocator reads a wrong agf->agf_longest.

Write file process steps:
1) Find the entry that best meets the conditions, then calculate the start
address and length of the remaining part of the entry after allocation.
2) Delete this entry. Because the second AG is empty, the btree in its agf
has only one record, and agf->agf_longest will be set to 0 after deletion.
3) Insert the remaining unused parts of this entry based on the
calculations in 1), and update the agf->agf_longest.

Create file process steps:
1) Check whether there are free inodes in the inode chunk.
2) If there is no free inode, check whether there is space for creating
inode chunks; the check is first performed without the agf lock.
3) If that check succeeds, it is performed again with the agf lock
held. Otherwise, an error is returned directly.

If the write process has finished step 2) but has not yet reached step 3)
when the create file process reaches its step 2), the create process
mistakenly concludes that there is no space, so file creation fails even
though the file system still has free space.

	Direct write				Create file
xfs_file_write_iter
 ...
 xfs_direct_write_iomap_begin
  xfs_iomap_write_direct
   ...
   xfs_alloc_ag_vextent_near
    xfs_alloc_cur_finish
     xfs_alloc_fixup_trees
      xfs_btree_delete
       xfs_btree_delrec
	xfs_allocbt_update_lastrec
	// longest = 0 because numrec == 0.
	 agf->agf_longest = len = 0
					   xfs_create
					    ...
					     xfs_dialloc
					      ...
					      xfs_alloc_fix_freelist
					       xfs_alloc_space_available
					-> as longest=0, it will return
					false, no space for inode alloc.
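
The interleaving above can be replayed in a small single-threaded model.
This is only a sketch under stated assumptions: struct toy_agf and all
toy_* helpers are hypothetical stand-ins written for illustration, not
the real XFS structures or functions, and the "tree" is reduced to a
record count plus the longest-extent summary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-AG summary, not the real struct xfs_agf. */
struct toy_agf {
	uint32_t longest;	/* models agf->agf_longest */
	uint32_t numrecs;	/* records left in the toy by-size tree */
};

/* Write step 2: delete the only record, so the summary drops to 0. */
static void toy_delete_last_record(struct toy_agf *agf)
{
	assert(agf->numrecs == 1);
	agf->numrecs = 0;
	agf->longest = 0;
}

/* Write step 3: reinsert the unused remainder and refresh the summary. */
static void toy_insert_remainder(struct toy_agf *agf, uint32_t len)
{
	agf->numrecs++;
	if (len > agf->longest)
		agf->longest = len;
}

/* Create step 2: the check made without the agf lock. */
static bool toy_space_available(const struct toy_agf *agf, uint32_t need)
{
	return agf->longest >= need;
}

/*
 * Replay the race: the create-side check runs between write steps 2 and 3.
 * Returns true when the check fails inside the window although the space
 * is there once the write has finished, i.e. a spurious failure.
 */
static bool toy_replay_race(uint32_t free_len, uint32_t write_len,
			    uint32_t need)
{
	struct toy_agf agf = { .longest = free_len, .numrecs = 1 };
	bool ok_in_window;

	toy_delete_last_record(&agf);
	ok_in_window = toy_space_available(&agf, need);
	toy_insert_remainder(&agf, free_len - write_len);

	return !ok_in_window && toy_space_available(&agf, need);
}
```

With a large free extent and a small write, toy_replay_race() reports a
spurious failure; when the remainder really is too small for the request,
it does not.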

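Fix this issue by adding a bc_free_longest field to struct xfs_btree_cur
(in bc_ag) to store the longest free extent length that is expected after
the pending update. The field is assigned in xfs_alloc_fixup_trees() and
xfs_free_ag_extent(), and xfs_allocbt_update_lastrec() uses it instead of
0 when the last record has been deleted.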
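
The idea can be sketched outside the kernel as follows. This is a minimal
model, not the patch itself: struct hint_cursor, struct hint_agf and the
hint_* helpers are hypothetical names, endianness conversion is left out,
and a length of 0 is assumed to mean "no remaining piece".

```c
#include <stdint.h>

/* Hypothetical cursor carrying the value modeled on bc_ag.bc_free_longest. */
struct hint_cursor {
	uint32_t free_longest;
};

/* Hypothetical stand-in for the AGF summary field agf_longest. */
struct hint_agf {
	uint32_t longest;
};

/*
 * Before the by-size record is deleted, remember the longer of the two
 * pieces that will be reinserted afterwards.
 */
static void hint_record(struct hint_cursor *cur, uint32_t nflen1,
			uint32_t nflen2)
{
	if (nflen1 || nflen2)
		cur->free_longest = nflen1 > nflen2 ? nflen1 : nflen2;
}

/*
 * After a deletion: if records remain, the last one gives the longest
 * length. If none remain, publish the recorded value instead of 0 so a
 * concurrent lockless reader does not see an empty AG.
 */
static void hint_update_lastrec(struct hint_agf *agf,
				const struct hint_cursor *cur,
				uint32_t numrecs, uint32_t last_len)
{
	agf->longest = numrecs ? last_len : cur->free_longest;
}
```

In this model the summary never drops to 0 between the delete and the
insert as long as a remainder was recorded beforehand.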

Reported-by: Ye Bin <yebin10@...wei.com>
Signed-off-by: Zizhi Wo <wozizhi@...wei.com>
---
 fs/xfs/libxfs/xfs_alloc.c       | 14 ++++++++++++++
 fs/xfs/libxfs/xfs_alloc_btree.c |  9 ++++++++-
 fs/xfs/libxfs/xfs_btree.h       |  1 +
 3 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 6c55a6e88eba..86ba873d57a8 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -577,6 +577,13 @@ xfs_alloc_fixup_trees(
 		nfbno2 = rbno + rlen;
 		nflen2 = (fbno + flen) - nfbno2;
 	}
+
+	/*
+	 * Record the potential maximum free length in advance.
+	 */
+	if (nfbno1 != NULLAGBLOCK || nfbno2 != NULLAGBLOCK)
+		cnt_cur->bc_ag.bc_free_longest = XFS_EXTLEN_MAX(nflen1, nflen2);
+
 	/*
 	 * Delete the entry from the by-size btree.
 	 */
@@ -2044,6 +2051,13 @@ xfs_free_ag_extent(
 	 * Now allocate and initialize a cursor for the by-size tree.
 	 */
 	cnt_cur = xfs_cntbt_init_cursor(mp, tp, agbp, pag);
+	/*
+	 * Record the potential maximum free length in advance.
+	 */
+	if (haveleft)
+		cnt_cur->bc_ag.bc_free_longest = ltlen;
+	if (haveright)
+		cnt_cur->bc_ag.bc_free_longest = gtlen;
 	/*
 	 * Have both left and right contiguous neighbors.
 	 * Merge all three into a single free block.
diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c
index 6ef5ddd89600..8e7d1e0f1a63 100644
--- a/fs/xfs/libxfs/xfs_alloc_btree.c
+++ b/fs/xfs/libxfs/xfs_alloc_btree.c
@@ -161,7 +161,14 @@ xfs_allocbt_update_lastrec(
 			rrp = XFS_ALLOC_REC_ADDR(cur->bc_mp, block, numrecs);
 			len = rrp->ar_blockcount;
 		} else {
-			len = 0;
+			/*
+			 * The last record was just deleted, so no records
+			 * remain. Use the value recorded in advance instead
+			 * of 0 so that concurrent file creation does not
+			 * fail. This is safe because a record no shorter
+			 * than bc_free_longest will be inserted later.
+			 */
+			len = cpu_to_be32(cur->bc_ag.bc_free_longest);
 		}
 
 		break;
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index f93374278aa1..985b1885a643 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -281,6 +281,7 @@ struct xfs_btree_cur
 			struct xfs_perag	*pag;
 			struct xfs_buf		*agbp;
 			struct xbtree_afakeroot	*afake;	/* for staging cursor */
+			xfs_extlen_t		bc_free_longest; /* potential longest free space */
 		} bc_ag;
 		struct {
 			struct xfbtree		*xfbtree;
-- 
2.39.2

