[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200106215755.GB472651@magnolia>
Date: Mon, 6 Jan 2020 13:57:55 -0800
From: "Darrick J. Wong" <darrick.wong@...cle.com>
To: yu kuai <yukuai3@...wei.com>
Cc: bfoster@...hat.com, dchinner@...hat.com, sandeen@...deen.net,
cmaiolino@...hat.com, hch@....de, linux-xfs@...r.kernel.org,
linux-kernel@...r.kernel.org, zhengbin13@...wei.com,
yi.zhang@...wei.com, houtao1@...wei.com
Subject: Re: [PATCH 2/2] xfs: fix stale data exposure problem when punch
hole, collapse range or zero range across a delalloc extent
On Thu, Dec 26, 2019 at 09:47:21PM +0800, yu kuai wrote:
> In xfs_file_fallocate, when punch hole, zero range or collapse range is
> performed, xfs_fulsh_unmap_range() need to be called first. However,
> xfs_map_blocks will convert the whole extent to real, even if there are
> some blocks not related. Furthermore, the unrelated blocks will hold stale
> data since xfs_fulsh_unmap_range didn't flush the correspond dirty pages
> to disk.
>
> In this case, if user shutdown file system through xfsioctl with cmd
> 'XFS_IOC_GOINGDOWN' and arg 'XFS_FSOP_GOING_FLAGS_LOGFLUSH'. All the
> completed transactions will be flushed to disk, while dirty pages will
> never be flushed to disk. And after remount, the file will hold stale
> data.
Waitaminute, what problem are you trying to solve?
You have a file with a huge delalloc extent because we just wrote a
bunch of 'X' characters to part of a file:
---dddddddddddddddd
Then you want to fallocate or something in the middle of that:
---dddddddddddddddd
^^^^------ collapse range these blocks
So we xfs_flush_unmap_range to kill the pagecache on that range:
---dddddddddddddddd
^^^^------ xfs_flush_unmap_range()
This triggers writeback, which can convert the entire delalloc range to
a single extent:
---rrrrrrrrrrrrrrrr
^^^^^^^^-- This is the range we are writing back
^^^^^^^^---------- This range doesn't undergo writeback, but we wrote
the extent tree anyway
After committing that update to the log, the fs goes down, which leaves
us with the following after we reboot, mount, and recover the fs:
---rrrrrrrrrrrrrrrr
^^^^^^^^-- This part contains 'X'
^^^^^^^^---------- This range never underwent writeback, so it's full
of junk from the previous owner of the space
So your solution is to split the delalloc reservation to constrain the
allocation to the range that's being operated on?
If so, I think a better solution (at least from the perspective of
reducing fragmentation) would be to map the extent unwritten and force a
post-writeback conversion[1] but I got shot down for performance reasons
the last time I suggested that.
--D
[1] https://lore.kernel.org/linux-xfs/155259894630.30230.10064390935593758177.stgit@magnolia/
> Fix the problem by spliting delalloc extent before xfs_flush_unmap_range
> is called.
>
> Signed-off-by: yu kuai <yukuai3@...wei.com>
> ---
> fs/xfs/xfs_file.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 47 insertions(+)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index c93250108952..5398102feec9 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -786,6 +786,50 @@ xfs_break_layouts(
>
> return error;
> }
> +int
> +try_split_da_extent(
> + struct xfs_inode *ip,
> + loff_t offset,
> + loff_t len)
> +{
> + struct xfs_mount *mp = ip->i_mount;
> + xfs_fileoff_t start = XFS_B_TO_FSBT(mp, offset);
> + xfs_fileoff_t end = XFS_B_TO_FSBT(mp, offset + len - 1);
> + struct xfs_ifork *ifp = XFS_IFORK_PTR(ip, XFS_DATA_FORK);
> + struct xfs_iext_cursor cur;
> + struct xfs_bmbt_irec imap;
> + int error;
> +
> + /*
> + * if start belong to a delalloc extent and it's not the first block,
> + * split the extent at start.
> + */
> + if (xfs_iext_lookup_extent(ip, ifp, start, &cur, &imap) &&
> + imap.br_startblock != HOLESTARTBLOCK &&
> + isnullstartblock(imap.br_startblock) &&
> + start > imap.br_startoff) {
> + error = xfs_bmap_split_da_extent(ip, start);
> + if (error)
> + return error;
> + ip->i_d.di_nextents--;
> + }
> +
> + /*
> + * if end + 1 belong to a delalloc extent and it's not the first block,
> + * split the extent at end + 1.
> + */
> + if (xfs_iext_lookup_extent(ip, ifp, end + 1, &cur, &imap) &&
> + imap.br_startblock != HOLESTARTBLOCK &&
> + isnullstartblock(imap.br_startblock) &&
> + end + 1 > imap.br_startoff) {
> + error = xfs_bmap_split_da_extent(ip, end + 1);
> + if (error)
> + return error;
> + ip->i_d.di_nextents--;
> + }
> +
> + return 0;
> +}
>
> #define XFS_FALLOC_FL_SUPPORTED \
> (FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE | \
> @@ -842,6 +886,9 @@ xfs_file_fallocate(
> */
> if (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE |
> FALLOC_FL_COLLAPSE_RANGE)) {
> + error = try_split_da_extent(ip, offset, len);
> + if (error)
> + goto out_unlock;
> error = xfs_flush_unmap_range(ip, offset, len);
> if (error)
> goto out_unlock;
> --
> 2.17.2
>
Powered by blists - more mailing lists