linux-kernel - [PATCH] xfs: avoid redundant AGFL buffer invalidation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-Id: <20240527061006.4045908-1-hsiangkao@linux.alibaba.com>
Date: Mon, 27 May 2024 14:10:06 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: linux-xfs@...r.kernel.org,
	Dave Chinner <david@...morbit.com>,
	"Darrick J . Wong" <djwong@...nel.org>,
	Chandan Babu R <chandanbabu@...nel.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
	Gao Xiang <hsiangkao@...ux.alibaba.com>
Subject: [PATCH] xfs: avoid redundant AGFL buffer invalidation

Currently AGFL blocks can be filled from the following three sources:
 - allocbt free blocks, as in xfs_allocbt_free_block();
 - rmapbt free blocks, as in xfs_rmapbt_free_block();
 - refilled from freespace btrees, as in xfs_alloc_fix_freelist().

Originally, allocbt free blocks would be marked as stale only when they
put back in the general free space pool as Dave mentioned on IRC, "we
don't stale AGF metadata btree blocks when they are returned to the
AGFL .. but once they get put back in the general free space pool, we
have to make sure the buffers are marked stale as the next user of
those blocks might be user data...."

However, after commit ca250b1b3d71 ("xfs: invalidate allocbt blocks
moved to the free list") and commit edfd9dd54921 ("xfs: move buffer
invalidation to xfs_btree_free_block"), even allocbt / bmapbt free
blocks will be invalidated immediately since they may fail to pass
V5 format validation on writeback even writeback to free space would be
safe.

IOWs, IMHO currently there is actually no difference of free blocks
between AGFL freespace pool and the general free space pool.  So let's
avoid extra redundant AGFL buffer invalidation, since otherwise we're
currently facing unnecessary xfs_log_force() due to xfs_trans_binval()
again on buffers already marked as stale before as below:

[  333.507469] Call Trace:
[  333.507862]  xfs_buf_find+0x371/0x6a0
[  333.508451]  xfs_buf_get_map+0x3f/0x230
[  333.509062]  xfs_trans_get_buf_map+0x11a/0x280
[  333.509751]  xfs_free_agfl_block+0xa1/0xd0
[  333.510403]  xfs_agfl_free_finish_item+0x16e/0x1d0
[  333.511157]  xfs_defer_finish_noroll+0x1ef/0x5c0
[  333.511871]  xfs_defer_finish+0xc/0xa0
[  333.512471]  xfs_itruncate_extents_flags+0x18a/0x5e0
[  333.513253]  xfs_inactive_truncate+0xb8/0x130
[  333.513930]  xfs_inactive+0x223/0x270

And xfs_log_force() will take tens of milliseconds with AGF buffer
locked.  It becomes an unnecessary long latency especially on our PMEM
devices with FSDAX enabled.  Also fstests are passed with this patch.

Signed-off-by: Gao Xiang <hsiangkao@...ux.alibaba.com>
---
 fs/xfs/libxfs/xfs_alloc.c | 18 ++----------------
 1 file changed, 2 insertions(+), 16 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index 6cb8b2ddc541..a80d2a31252a 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -2432,22 +2432,8 @@ xfs_free_agfl_block(
 	struct xfs_buf		*agbp,
 	struct xfs_owner_info	*oinfo)
 {
-	int			error;
-	struct xfs_buf		*bp;
-
-	error = xfs_free_ag_extent(tp, agbp, agno, agbno, 1, oinfo,
-				   XFS_AG_RESV_AGFL);
-	if (error)
-		return error;
-
-	error = xfs_trans_get_buf(tp, tp->t_mountp->m_ddev_targp,
-			XFS_AGB_TO_DADDR(tp->t_mountp, agno, agbno),
-			tp->t_mountp->m_bsize, 0, &bp);
-	if (error)
-		return error;
-	xfs_trans_binval(tp, bp);
-
-	return 0;
+	return xfs_free_ag_extent(tp, agbp, agno, agbno, 1, oinfo,
+				  XFS_AG_RESV_AGFL);
 }

 /*
-- 
2.39.3