[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZoIDVHaS8xjha1mA@dread.disaster.area>
Date: Mon, 1 Jul 2024 11:16:05 +1000
From: Dave Chinner <david@...morbit.com>
To: Zhang Yi <yi.zhang@...weicloud.com>
Cc: linux-xfs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, djwong@...nel.org, hch@...radead.org,
brauner@...nel.org, chandanbabu@...nel.org, jack@...e.cz,
yi.zhang@...wei.com, chengzhihao1@...wei.com, yukuai3@...wei.com
Subject: Re: [PATCH -next v6 1/2] xfs: reserve blocks for truncating large
realtime inode
On Tue, Jun 18, 2024 at 10:21:11PM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@...wei.com>
>
> When unaligned truncate down a big realtime file, xfs_truncate_page()
> only zeros out the tail EOF block, __xfs_bunmapi() should split the tail
> written extent and convert the later one that beyond EOF block to
> unwritten, but it couldn't work as expected now since the reserved block
> is zero in xfs_setattr_size(), this could expose stale data just after
> commit '943bc0882ceb ("iomap: don't increase i_size if it's not a write
> operation")'.
>
> If we truncate file that contains a large enough written extent:
>
> |< rxext >|< rtext >|
> ...WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
> ^ (new EOF) ^ old EOF
>
> Since we only zeros out the tail of the EOF block, and
> xfs_itruncate_extents()->..->__xfs_bunmapi() unmap the whole ailgned
> extents, it becomes this state:
>
> |< rxext >|
> ...WWWzWWWWWWWWWWWWW
> ^ new EOF
>
> Then if we do an extending write like this, the blocks in the previous
> tail extent becomes stale:
>
> |< rxext >|
> ...WWWzSSSSSSSSSSSSS..........WWWWWWWWWWWWWWWWW
> ^ old EOF ^ append start ^ new EOF
>
> Fix this by reserving XFS_DIOSTRAT_SPACE_RES blocks for big realtime
> inode.
This same problem is going to happen with force aligned allocations,
right? i.e. it is a result of having a allocation block size larger
than one filesystem block?
If so, then....
> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
> Reviewed-by: Christoph Hellwig <hch@....de>
> ---
> fs/xfs/xfs_iops.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index ff222827e550..a00dcbc77e12 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -17,6 +17,8 @@
> #include "xfs_da_btree.h"
> #include "xfs_attr.h"
> #include "xfs_trans.h"
> +#include "xfs_trans_space.h"
> +#include "xfs_bmap_btree.h"
> #include "xfs_trace.h"
> #include "xfs_icache.h"
> #include "xfs_symlink.h"
> @@ -811,6 +813,7 @@ xfs_setattr_size(
> struct xfs_trans *tp;
> int error;
> uint lock_flags = 0;
> + uint resblks = 0;
> bool did_zeroing = false;
>
> xfs_assert_ilocked(ip, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL);
> @@ -917,7 +920,17 @@ xfs_setattr_size(
> return error;
> }
>
> - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
> + /*
> + * For realtime inode with more than one block rtextsize, we need the
> + * block reservation for bmap btree block allocations/splits that can
> + * happen since it could split the tail written extent and convert the
> + * right beyond EOF one to unwritten.
> + */
> + if (xfs_inode_has_bigrtalloc(ip))
> + resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0);
.... should this be doing this generic check instead:
if (xfs_inode_alloc_unitsize(ip) > 1)
resblks = XFS_DIOSTRAT_SPACE_RES(mp, 0);
-Dave.
--
Dave Chinner
david@...morbit.com
Powered by blists - more mailing lists