[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20240618142112.1315279-2-yi.zhang@huaweicloud.com>
Date: Tue, 18 Jun 2024 22:21:11 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: linux-xfs@...r.kernel.org,
linux-fsdevel@...r.kernel.org
Cc: linux-kernel@...r.kernel.org,
djwong@...nel.org,
hch@...radead.org,
brauner@...nel.org,
david@...morbit.com,
chandanbabu@...nel.org,
jack@...e.cz,
yi.zhang@...wei.com,
yi.zhang@...weicloud.com,
chengzhihao1@...wei.com,
yukuai3@...wei.com
Subject: [PATCH -next v6 1/2] xfs: reserve blocks for truncating large realtime inode
From: Zhang Yi <yi.zhang@...wei.com>
When unaligned truncate down a big realtime file, xfs_truncate_page()
only zeros out the tail EOF block, __xfs_bunmapi() should split the tail
written extent and convert the later one that beyond EOF block to
unwritten, but it couldn't work as expected now since the reserved block
is zero in xfs_setattr_size(), this could expose stale data just after
commit '943bc0882ceb ("iomap: don't increase i_size if it's not a write
operation")'.
If we truncate file that contains a large enough written extent:
|< rxext >|< rtext >|
...WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
^ (new EOF) ^ old EOF
Since we only zeros out the tail of the EOF block, and
xfs_itruncate_extents()->..->__xfs_bunmapi() unmap the whole ailgned
extents, it becomes this state:
|< rxext >|
...WWWzWWWWWWWWWWWWW
^ new EOF
Then if we do an extending write like this, the blocks in the previous
tail extent becomes stale:
|< rxext >|
...WWWzSSSSSSSSSSSSS..........WWWWWWWWWWWWWWWWW
^ old EOF ^ append start ^ new EOF
Fix this by reserving XFS_DIOSTRAT_SPACE_RES blocks for big realtime
inode.
Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
Reviewed-by: Christoph Hellwig <hch@....de>
---
fs/xfs/xfs_iops.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index ff222827e550..a00dcbc77e12 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -17,6 +17,8 @@
#include "xfs_da_btree.h"
#include "xfs_attr.h"
#include "xfs_trans.h"
+#include "xfs_trans_space.h"
+#include "xfs_bmap_btree.h"
#include "xfs_trace.h"
#include "xfs_icache.h"
#include "xfs_symlink.h"
@@ -811,6 +813,7 @@ xfs_setattr_size(
struct xfs_trans *tp;
int error;
uint lock_flags = 0;
+ uint resblks = 0;
bool did_zeroing = false;
xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL);
@@ -917,7 +920,17 @@ xfs_setattr_size(
return error;
}
- error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
+ /*
+ * For realtime inode with more than one block rtextsize, we need the
+ * block reservation for bmap btree block allocations/splits that can
+ * happen since it could split the tail written extent and convert the
+ * right beyond EOF one to unwritten.
+ */
+ if (xfs_inode_has_bigrtalloc(ip))
+ resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0);
+
+ error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, resblks,
+ 0, 0, &tp);
if (error)
return error;
--
2.39.2
Powered by blists - more mailing lists