Message-Id: <20230824092619.1327976-7-yi.zhang@huaweicloud.com>
Date:   Thu, 24 Aug 2023 17:26:09 +0800
From:   Zhang Yi <yi.zhang@...weicloud.com>
To:     linux-ext4@...r.kernel.org
Cc:     tytso@....edu, adilger.kernel@...ger.ca, jack@...e.cz,
        yi.zhang@...wei.com, yi.zhang@...weicloud.com,
        chengzhihao1@...wei.com, yukuai3@...wei.com
Subject: [RFC PATCH 06/16] ext4: move delalloc data reserve space updating into ext4_es_insert_extent()

From: Zhang Yi <yi.zhang@...wei.com>

We update the data reserved space for delalloc after allocating new blocks
in ext4_{ind|ext}_map_blocks(). If the bigalloc feature is enabled, we also
need to query the extents_status tree to calculate the exact number of
reserved clusters. If we move this updating into ext4_es_insert_extent(),
just after dropping the delalloc extents_status entry, it becomes much
simpler because __es_remove_extent() has already done most of the work,
and we can remove ext4_es_delayed_clu() entirely.

One important thing to take care of is that if bigalloc is enabled, we
should update the data reserved count when first converting some of the
delayed-only es entries of a cluster which has many other delayed-only
entries left over.

  |                   one cluster                        |
  --------------------------------------------------------
  | da es 0 | .. | da es 1 | .. | da es 2 | .. | da es 3 |
  --------------------------------------------------------
  ^         ^
  |         | <- first allocating this delayed extent

The later allocations in that cluster will not be counted again. We can
do this by counting the newly inserted pending clusters.

Another important thing is quota claiming and the i_blocks count. If the
delayed allocation has been raced by another non-delayed allocation (from
fallocate, filemap, DIO...), we cannot claim quota as usual because the
racer has already done it. We can distinguish this case easily by checking
EXTENT_STATUS_DELAYED and the reserved-only blocks counted by
__es_remove_extent(). If EXTENT_STATUS_DELAYED is set, it always means
that the allocation is not from the delayed allocation path. But we can
only draw the opposite conclusion if bigalloc is not enabled. If bigalloc
is enabled, the allocation could be raced by another fallocate which is
writing to other non-delayed areas of the same cluster. In this case,
EXTENT_STATUS_DELAYED is not set but we still cannot claim quota again.

  |             one cluster                 |
  -------------------------------------------
  |                            | delayed es |
  -------------------------------------------
  ^           ^
  | fallocate |

So we also need to check the counted reserved-only blocks: if it is zero,
it means that the allocation is not from the delayed allocation path, and
we should release the reserved quota instead of claiming it.

Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
---
 fs/ext4/extents.c        |  37 -------------
 fs/ext4/extents_status.c | 115 +++++++++------------------------------
 fs/ext4/extents_status.h |   2 -
 fs/ext4/indirect.c       |   7 ---
 fs/ext4/inode.c          |   5 +-
 5 files changed, 30 insertions(+), 136 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index e4115d338f10..592383effe80 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4323,43 +4323,6 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
 		goto out;
 	}
 
-	/*
-	 * Reduce the reserved cluster count to reflect successful deferred
-	 * allocation of delayed allocated clusters or direct allocation of
-	 * clusters discovered to be delayed allocated.  Once allocated, a
-	 * cluster is not included in the reserved count.
-	 */
-	if (test_opt(inode->i_sb, DELALLOC) && allocated_clusters) {
-		if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) {
-			/*
-			 * When allocating delayed allocated clusters, simply
-			 * reduce the reserved cluster count and claim quota
-			 */
-			ext4_da_update_reserve_space(inode, allocated_clusters,
-							1);
-		} else {
-			ext4_lblk_t lblk, len;
-			unsigned int n;
-
-			/*
-			 * When allocating non-delayed allocated clusters
-			 * (from fallocate, filemap, DIO, or clusters
-			 * allocated when delalloc has been disabled by
-			 * ext4_nonda_switch), reduce the reserved cluster
-			 * count by the number of allocated clusters that
-			 * have previously been delayed allocated.  Quota
-			 * has been claimed by ext4_mb_new_blocks() above,
-			 * so release the quota reservations made for any
-			 * previously delayed allocated clusters.
-			 */
-			lblk = EXT4_LBLK_CMASK(sbi, map->m_lblk);
-			len = allocated_clusters << sbi->s_cluster_bits;
-			n = ext4_es_delayed_clu(inode, lblk, len);
-			if (n > 0)
-				ext4_da_update_reserve_space(inode, (int) n, 0);
-		}
-	}
-
 	/*
 	 * Cache the extent and update transaction to commit on fdatasync only
 	 * when it is _not_ an unwritten extent.
diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 62191c772b82..34164c2827f2 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -856,11 +856,14 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
 	struct extent_status newes;
 	ext4_lblk_t end = lblk + len - 1;
 	int err1 = 0, err2 = 0, err3 = 0;
+	struct rsvd_info rinfo;
+	int pending = 0;
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	struct extent_status *es1 = NULL;
 	struct extent_status *es2 = NULL;
 	struct pending_reservation *pr = NULL;
 	bool revise_pending = false;
+	bool delayed = false;
 
 	if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY)
 		return;
@@ -878,6 +881,7 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
 	 * data lose, and the extent has been written, it's safe to remove
 	 * the delayed flag even it's still delayed.
 	 */
+	delayed = status & EXTENT_STATUS_DELAYED;
 	if ((status & EXTENT_STATUS_DELAYED) &&
 	    (status & EXTENT_STATUS_WRITTEN))
 		status &= ~EXTENT_STATUS_DELAYED;
@@ -902,7 +906,7 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
 		pr = __alloc_pending(true);
 	write_lock(&EXT4_I(inode)->i_es_lock);
 
-	err1 = __es_remove_extent(inode, lblk, end, NULL, es1);
+	err1 = __es_remove_extent(inode, lblk, end, &rinfo, es1);
 	if (err1 != 0)
 		goto error;
 	/* Free preallocated extent if it didn't get used. */
@@ -932,9 +936,30 @@ void ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
 			__free_pending(pr);
 			pr = NULL;
 		}
+		/*
+		 * In the first partial allocating some delayed extents of
+		 * one cluster, we also need to count the data cluster when
+		 * allocating delay only extent entries.
+		 */
+		pending = err3;
 	}
 error:
 	write_unlock(&EXT4_I(inode)->i_es_lock);
+	/*
+	 * If EXTENT_STATUS_DELAYED is not set and delayed only blocks is
+	 * not zero, we are allocating delayed allocated clusters, simply
+	 * reduce the reserved cluster count and claim quota.
+	 *
+	 * Otherwise, we aren't allocating delayed allocated clusters
+	 * (from fallocate, filemap, DIO, or clusters allocated when
+	 * delalloc has been disabled by ext4_nonda_switch()), reduce the
+	 * reserved cluster count by the number of allocated clusters that
+	 * have previously been delayed allocated. Quota has been claimed
+	 * by ext4_mb_new_blocks(), so release the quota reservations made
+	 * for any previously delayed allocated clusters.
+	 */
+	ext4_da_update_reserve_space(inode, rinfo.ndelonly_clu + pending,
+				     !delayed && rinfo.ndelonly_blk);
 	if (err1 || err2 || err3 < 0)
 		goto retry;
 
@@ -2146,94 +2171,6 @@ void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
 	return;
 }
 
-/*
- * __es_delayed_clu - count number of clusters containing blocks that
- *                    are delayed only
- *
- * @inode - file containing block range
- * @start - logical block defining start of range
- * @end - logical block defining end of range
- *
- * Returns the number of clusters containing only delayed (not delayed
- * and unwritten) blocks in the range specified by @start and @end.  Any
- * cluster or part of a cluster within the range and containing a delayed
- * and not unwritten block within the range is counted as a whole cluster.
- */
-static unsigned int __es_delayed_clu(struct inode *inode, ext4_lblk_t start,
-				     ext4_lblk_t end)
-{
-	struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
-	struct extent_status *es;
-	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
-	struct rb_node *node;
-	ext4_lblk_t first_lclu, last_lclu;
-	unsigned long long last_counted_lclu;
-	unsigned int n = 0;
-
-	/* guaranteed to be unequal to any ext4_lblk_t value */
-	last_counted_lclu = ~0ULL;
-
-	es = __es_tree_search(&tree->root, start);
-
-	while (es && (es->es_lblk <= end)) {
-		if (ext4_es_is_delonly(es)) {
-			if (es->es_lblk <= start)
-				first_lclu = EXT4_B2C(sbi, start);
-			else
-				first_lclu = EXT4_B2C(sbi, es->es_lblk);
-
-			if (ext4_es_end(es) >= end)
-				last_lclu = EXT4_B2C(sbi, end);
-			else
-				last_lclu = EXT4_B2C(sbi, ext4_es_end(es));
-
-			if (first_lclu == last_counted_lclu)
-				n += last_lclu - first_lclu;
-			else
-				n += last_lclu - first_lclu + 1;
-			last_counted_lclu = last_lclu;
-		}
-		node = rb_next(&es->rb_node);
-		if (!node)
-			break;
-		es = rb_entry(node, struct extent_status, rb_node);
-	}
-
-	return n;
-}
-
-/*
- * ext4_es_delayed_clu - count number of clusters containing blocks that
- *                       are both delayed and unwritten
- *
- * @inode - file containing block range
- * @lblk - logical block defining start of range
- * @len - number of blocks in range
- *
- * Locking for external use of __es_delayed_clu().
- */
-unsigned int ext4_es_delayed_clu(struct inode *inode, ext4_lblk_t lblk,
-				 ext4_lblk_t len)
-{
-	struct ext4_inode_info *ei = EXT4_I(inode);
-	ext4_lblk_t end;
-	unsigned int n;
-
-	if (len == 0)
-		return 0;
-
-	end = lblk + len - 1;
-	WARN_ON(end < lblk);
-
-	read_lock(&ei->i_es_lock);
-
-	n = __es_delayed_clu(inode, lblk, end);
-
-	read_unlock(&ei->i_es_lock);
-
-	return n;
-}
-
 /*
  * __revise_pending - makes, cancels, or leaves unchanged pending cluster
  *                    reservations for a specified block range depending
diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
index d9847a4a25db..7344667eb2cd 100644
--- a/fs/ext4/extents_status.h
+++ b/fs/ext4/extents_status.h
@@ -251,8 +251,6 @@ extern void ext4_remove_pending(struct inode *inode, ext4_lblk_t lblk);
 extern bool ext4_is_pending(struct inode *inode, ext4_lblk_t lblk);
 extern void ext4_es_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk,
 					 bool allocated);
-extern unsigned int ext4_es_delayed_clu(struct inode *inode, ext4_lblk_t lblk,
-					ext4_lblk_t len);
 extern void ext4_clear_inode_es(struct inode *inode);
 
 #endif /* _EXT4_EXTENTS_STATUS_H */
diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index a9f3716119d3..448401e02c55 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -652,13 +652,6 @@ int ext4_ind_map_blocks(handle_t *handle, struct inode *inode,
 	ext4_update_inode_fsync_trans(handle, inode, 1);
 	count = ar.len;
 
-	/*
-	 * Update reserved blocks/metadata blocks after successful block
-	 * allocation which had been deferred till now.
-	 */
-	if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
-		ext4_da_update_reserve_space(inode, count, 1);
-
 got_it:
 	map->m_flags |= EXT4_MAP_MAPPED;
 	map->m_pblk = le32_to_cpu(chain[depth-1].key);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 82115d6656d3..546a3b09fd0a 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -330,11 +330,14 @@ qsize_t *ext4_get_reserved_space(struct inode *inode)
  * ext4_discard_preallocations() from here.
  */
 void ext4_da_update_reserve_space(struct inode *inode,
-					int used, int quota_claim)
+				  int used, int quota_claim)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	struct ext4_inode_info *ei = EXT4_I(inode);
 
+	if (!used)
+		return;
+
 	spin_lock(&ei->i_block_reservation_lock);
 	trace_ext4_da_update_reserve_space(inode, used, quota_claim);
 	if (unlikely(used > ei->i_reserved_data_blocks)) {
-- 
2.39.2
