lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20180513175624.12887-6-enwlinux@gmail.com>
Date:   Sun, 13 May 2018 13:56:24 -0400
From:   Eric Whitney <enwlinux@...il.com>
To:     linux-ext4@...r.kernel.org
Cc:     tytso@....edu, Eric Whitney <enwlinux@...il.com>
Subject: [RFC PATCH 5/5] ext4: don't release delalloc clusters when invalidating page

With the preceding two patches, it's now possible to accurately
determine the number of delayed allocated clusters to be released
when removing a block range from the extent tree and extents status
tree in ext4_ext_truncate(), ext4_punch_hole(),
and ext4_collapse_range().  Since it's not possible to do this when
invalidating pages in ext4_da_page_release_reservation(), remove
those operations from it.  It's still appropriate to clear the
delayed bits for invalidated buffers there.

Removal of block ranges in ext4_ext_truncate() and other functions
appears redundant with removal at page invalidate time, and removal
of a block range as a unit should generally incur less CPU overhead
than page by page (block by block) removal.

Note: this change does result in a regression for generic/036 when
running at least the 4k, bigalloc, and bigalloc_1k test cases using
the xfstests-bld test appliance.  The test passes, but new kernel
error messages of the form "Page cache invalidation failure on
direct I/O.  Possible data corruption due to collision with buffered
I/O!" appear in dmesg.  It's likely this patch violates a direct I/O
implementation requirement, perhaps making DIO vulnerable to read
races with buffered I/O.  To be addressed.

Signed-off-by: Eric Whitney <enwlinux@...il.com>
---
 fs/ext4/inode.c | 51 +++++++++++++--------------------------------------
 1 file changed, 13 insertions(+), 38 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index a55b4db4a29c..6a903a850d95 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1628,62 +1628,37 @@ void ext4_da_release_space(struct inode *inode, int to_free)
 	dquot_release_reservation_block(inode, EXT4_C2B(sbi, to_free));
 }
 
+/*
+ * This code doesn't work and needs to be replaced.  Not deleting delayed
+ * blocks from the extents status tree here and deferring until blocks
+ * are deleted from the extent tree causes problems for DIO.  One avenue
+ * to explore is a partial reversion of the code here, altering the
+ * calls to delete blocks from the extents status tree to calls to
+ * invalidate those blocks, hopefully avoiding problems with the DIO code.
+ * They would then be deleted and accounted for when the extent tree is
+ * modified.
+ */
 static void ext4_da_page_release_reservation(struct page *page,
 					     unsigned int offset,
 					     unsigned int length)
 {
-	int to_release = 0, contiguous_blks = 0;
 	struct buffer_head *head, *bh;
 	unsigned int curr_off = 0;
-	struct inode *inode = page->mapping->host;
-	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	unsigned int stop = offset + length;
-	int num_clusters;
-	ext4_fsblk_t lblk;
+	unsigned int next_off;
 
 	BUG_ON(stop > PAGE_SIZE || stop < length);
 
 	head = page_buffers(page);
 	bh = head;
 	do {
-		unsigned int next_off = curr_off + bh->b_size;
-
+		next_off = curr_off + bh->b_size;
 		if (next_off > stop)
 			break;
-
-		if ((offset <= curr_off) && (buffer_delay(bh))) {
-			to_release++;
-			contiguous_blks++;
+		if ((offset <= curr_off) && (buffer_delay(bh)))
 			clear_buffer_delay(bh);
-		} else if (contiguous_blks) {
-			lblk = page->index <<
-			       (PAGE_SHIFT - inode->i_blkbits);
-			lblk += (curr_off >> inode->i_blkbits) -
-				contiguous_blks;
-			ext4_es_remove_extent(inode, lblk, contiguous_blks);
-			contiguous_blks = 0;
-		}
 		curr_off = next_off;
 	} while ((bh = bh->b_this_page) != head);
-
-	if (contiguous_blks) {
-		lblk = page->index << (PAGE_SHIFT - inode->i_blkbits);
-		lblk += (curr_off >> inode->i_blkbits) - contiguous_blks;
-		ext4_es_remove_extent(inode, lblk, contiguous_blks);
-	}
-
-	/* If we have released all the blocks belonging to a cluster, then we
-	 * need to release the reserved space for that cluster. */
-	num_clusters = EXT4_NUM_B2C(sbi, to_release);
-	while (num_clusters > 0) {
-		lblk = (page->index << (PAGE_SHIFT - inode->i_blkbits)) +
-			((num_clusters - 1) << sbi->s_cluster_bits);
-		if (sbi->s_cluster_ratio == 1 ||
-		    !ext4_find_delalloc_cluster(inode, lblk))
-			ext4_da_release_space(inode, 1);
-
-		num_clusters--;
-	}
 }
 
 /*
-- 
2.11.0

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ