Message-Id: <20240529095206.2568162-9-yi.zhang@huaweicloud.com>
Date: Wed, 29 May 2024 17:52:06 +0800
From: Zhang Yi <yi.zhang@...weicloud.com>
To: linux-xfs@...r.kernel.org,
	linux-fsdevel@...r.kernel.org
Cc: linux-kernel@...r.kernel.org,
	djwong@...nel.org,
	hch@...radead.org,
	brauner@...nel.org,
	david@...morbit.com,
	chandanbabu@...nel.org,
	jack@...e.cz,
	willy@...radead.org,
	yi.zhang@...wei.com,
	yi.zhang@...weicloud.com,
	chengzhihao1@...wei.com,
	yukuai3@...wei.com
Subject: [RFC PATCH v4 8/8] xfs: improve truncate on a realtime inode with huge extsize

From: Zhang Yi <yi.zhang@...wei.com>

If we truncate down a realtime inode whose extsize is very large, zeroing
out the entire aligned EOF extent can be very slow. Fortunately,
__xfs_bunmapi() already aligns the unmapped range to rtextsize, splitting
the extent and converting the extra blocks to the unwritten state. So, if
the rtextsize is large enough, adjust the blocksize to the filesystem
blocksize and let __xfs_bunmapi() convert the tail blocks to unwritten.
This improves truncate performance significantly.

 # mkfs.xfs -f -rrtdev=/dev/pmem1s -m reflink=0,rmapbt=0 \
            -d rtinherit=1 -r extsize=1G /dev/pmem2s
 # for i in {1..5}; \
   do dd if=/dev/zero of=/mnt/scratch/$i bs=1M count=1024; done
 # sync
 # time for i in {1..5}; \
   do xfs_io -c "truncate 4k" /mnt/scratch/$i; done

Before:
 real    0m16.762s
 user    0m0.008s
 sys     0m16.750s

After:
 real    0m0.076s
 user    0m0.010s
 sys     0m0.069s

Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
---
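Note for reviewers (not part of the commit message): the timing difference
above follows from how many bytes a downward truncate has to zero past the
new EOF. Below is a minimal userspace sketch of that arithmetic. It is not
kernel code; the helper names are hypothetical, and the 64k value assumed
for XFS_DFL_RTEXTSIZE should be checked against xfs_format.h.

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte offset up to the next multiple of unit (unit must be nonzero). */
static uint64_t roundup_bytes(uint64_t off, uint64_t unit)
{
	assert(unit != 0);
	return (off + unit - 1) / unit * unit;
}

/*
 * Bytes between the new EOF and the end of the allocation unit that
 * contains it, i.e. the range a downward truncate would zero when it
 * uses "unit" as its blocksize.
 */
static uint64_t tail_bytes_to_zero(uint64_t new_size, uint64_t unit)
{
	return roundup_bytes(new_size, unit) - new_size;
}

/*
 * Hypothetical model of the "huge" check: the rt extent size, counted in
 * filesystem blocks, exceeds the default rt extent size in blocks.
 * Assumption: the default is 64k (XFS_DFL_RTEXTSIZE).
 */
static int is_huge_rtextsize(uint64_t rextsize_bytes, uint64_t fs_blocksize)
{
	const uint64_t dfl_rtextsize = 64 * 1024;	/* assumed default */

	return rextsize_bytes / fs_blocksize >
	       roundup_bytes(dfl_rtextsize, fs_blocksize) / fs_blocksize;
}
```

With new_size = 4096 and a 1G rtextsize, tail_bytes_to_zero() gives
1073737728 bytes per file, which matches the roughly 5G zeroed across the
five test files. With the 4k filesystem blocksize it gives 0, leaving the
rest of the rt extent for __xfs_bunmapi() to convert to unwritten.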
 fs/xfs/xfs_inode.c |  2 +-
 fs/xfs/xfs_inode.h | 12 ++++++++++++
 fs/xfs/xfs_iops.c  |  9 +++++++++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index db35167acef6..c0c1ab310aae 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1513,7 +1513,7 @@ xfs_itruncate_extents_flags(
 	 * the page cache can't scale that far.
 	 */
 	first_unmap_block = XFS_B_TO_FSB(mp, (xfs_ufsize_t)new_size);
-	if (xfs_inode_has_bigrtalloc(ip))
+	if (xfs_inode_has_bigrtalloc(ip) && !xfs_inode_has_hugertalloc(ip))
 		first_unmap_block = xfs_rtb_roundup_rtx(mp, first_unmap_block);
 	if (!xfs_verify_fileoff(mp, first_unmap_block)) {
 		WARN_ON_ONCE(first_unmap_block > XFS_MAX_FILEOFF);
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 292b90b5f2ac..4eed5b0c57c0 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -320,6 +320,18 @@ static inline bool xfs_inode_has_bigrtalloc(struct xfs_inode *ip)
 	return XFS_IS_REALTIME_INODE(ip) && ip->i_mount->m_sb.sb_rextsize > 1;
 }
 
+/*
+ * Decide if this file is a realtime file whose data allocation unit is larger
+ * than the default realtime extent size (XFS_DFL_RTEXTSIZE).
+ */
+static inline bool xfs_inode_has_hugertalloc(struct xfs_inode *ip)
+{
+	struct xfs_mount *mp = ip->i_mount;
+
+	return XFS_IS_REALTIME_INODE(ip) &&
+	       mp->m_sb.sb_rextsize > XFS_B_TO_FSB(mp, XFS_DFL_RTEXTSIZE);
+}
+
 /*
  * Return the buftarg used for data allocations on a given inode.
  */
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index c53de5e6ef66..d5fc84e5a37c 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -870,6 +870,15 @@ xfs_setattr_size(
 	if (newsize < oldsize) {
 		unsigned int blocksize = xfs_inode_alloc_unitsize(ip);
 
+		/*
+		 * If the extsize is too large on a realtime inode, zeroing
+		 * out the entire aligned EOF extent could be slow. Adjust the
+		 * blocksize to the filesystem blocksize and let
+		 * __xfs_bunmapi() convert the tail blocks to unwritten.
+		 */
+		if (xfs_inode_has_hugertalloc(ip))
+			blocksize = i_blocksize(inode);
+
 		/*
 		 * Zeroing out the partial EOF block and the rest of the extra
 		 * aligned blocks on a downward truncate.
-- 
2.39.2

