linux-ext4 - [PATCH 1/2] Ext4: bigalloc: fix overreservation on quota accounting

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <1525702692-65658-2-git-send-email-bo.liu@linux.alibaba.com>
Date:   Mon,  7 May 2018 22:18:11 +0800
From:   Liu Bo <bo.liu@...ux.alibaba.com>
To:     <linux-ext4@...r.kernel.org>
Subject: [PATCH 1/2] Ext4: bigalloc: fix overreservation on quota accounting

This can be reproduced by the following scripts,

$ mkfs.ext4 -Obigalloc /dev/vdd
$ mount /dev/vdd /mnt/
$ xfs_io -f -c "pwrite 0 2" -c "pwrite 16K 2" -c "pwrite 32K 2" -c "fsync" /mnt/dir/foobar

As the bigalloc cluster size is 64K by default, the above writes should be
in one whole cluster(64K), but currently it reports 128K instead, there is
a 64K leak, and if quota is enabled, it also leads to incorrect quota
accounting.

During the writeback phrase, what was happening is,

a) mapping block [0,4k) which covers [0,2) allocates a cluster and
compensates both quota reserved blocks and @i_reserved_data_blocks by one
reserved cluster count,

b) mapping block [16k,20k) which covers [16k,16k+2) finds that there's
already one available cluster, then it will map from it directly instead
of updating reservation.

c) no one will decrement @i_reserved_data_blocks and quota.

In fact, get_implied_cluster_alloc() has figured out if the cluster has
already been allocated, there is no need to do the compensation stuff.

Signed-off-by: Liu Bo <bo.liu@...ux.alibaba.com>
---
 fs/ext4/extents.c | 68 +++++--------------------------------------------------
 1 file changed, 6 insertions(+), 62 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index c969275ce3ee..1b3b24a53e1f 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4511,70 +4511,14 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
 	 * block allocation which had been deferred till now.
 	 */
 	if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) {
-		unsigned int reserved_clusters;
-		/*
-		 * Check how many clusters we had reserved this allocated range
-		 */
-		reserved_clusters = get_reserved_cluster_alloc(inode,
-						map->m_lblk, allocated);
 		if (!map_from_cluster) {
-			BUG_ON(allocated_clusters < reserved_clusters);
-			if (reserved_clusters < allocated_clusters) {
-				struct ext4_inode_info *ei = EXT4_I(inode);
-				int reservation = allocated_clusters -
-						  reserved_clusters;
-				/*
-				 * It seems we claimed few clusters outside of
-				 * the range of this allocation. We should give
-				 * it back to the reservation pool. This can
-				 * happen in the following case:
-				 *
-				 * * Suppose s_cluster_ratio is 4 (i.e., each
-				 *   cluster has 4 blocks. Thus, the clusters
-				 *   are [0-3],[4-7],[8-11]...
-				 * * First comes delayed allocation write for
-				 *   logical blocks 10 & 11. Since there were no
-				 *   previous delayed allocated blocks in the
-				 *   range [8-11], we would reserve 1 cluster
-				 *   for this write.
-				 * * Next comes write for logical blocks 3 to 8.
-				 *   In this case, we will reserve 2 clusters
-				 *   (for [0-3] and [4-7]; and not for [8-11] as
-				 *   that range has a delayed allocated blocks.
-				 *   Thus total reserved clusters now becomes 3.
-				 * * Now, during the delayed allocation writeout
-				 *   time, we will first write blocks [3-8] and
-				 *   allocate 3 clusters for writing these
-				 *   blocks. Also, we would claim all these
-				 *   three clusters above.
-				 * * Now when we come here to writeout the
-				 *   blocks [10-11], we would expect to claim
-				 *   the reservation of 1 cluster we had made
-				 *   (and we would claim it since there are no
-				 *   more delayed allocated blocks in the range
-				 *   [8-11]. But our reserved cluster count had
-				 *   already gone to 0.
-				 *
-				 *   Thus, at the step 4 above when we determine
-				 *   that there are still some unwritten delayed
-				 *   allocated blocks outside of our current
-				 *   block range, we should increment the
-				 *   reserved clusters count so that when the
-				 *   remaining blocks finally gets written, we
-				 *   could claim them.
-				 */
-				dquot_reserve_block(inode,
-						EXT4_C2B(sbi, reservation));
-				spin_lock(&ei->i_block_reservation_lock);
-				ei->i_reserved_data_blocks += reservation;
-				spin_unlock(&ei->i_block_reservation_lock);
-			}
 			/*
-			 * We will claim quota for all newly allocated blocks.
-			 * We're updating the reserved space *after* the
-			 * correction above so we do not accidentally free
-			 * all the metadata reservation because we might
-			 * actually need it later on.
+			 * We don't have to compensate reserved clusters for
+			 * sibling delayed allocated extents if they belong to
+			 * the same cluster since they'll be mapped from the
+			 * clusters directly, i.e. @map_from_cluser is true,
+			 * otherwise there would be reservation/quota accounting
+			 * leaking.
 			 */
 			ext4_da_update_reserve_space(inode, allocated_clusters,
 							1);
-- 
1.8.3.1