Message-Id: <1328784394-12977-1-git-send-email-hao.bigrat@gmail.com>
Date:	Thu,  9 Feb 2012 18:46:34 +0800
From:	Robin Dong <hao.bigrat@...il.com>
To:	linux-ext4@...r.kernel.org
Cc:	Robin Dong <sanbai@...bao.com>
Subject: [PATCH] ext4: fix wrong counting of s_dirtyclusters_counter for bigalloc in race condition

From: Robin Dong <sanbai@...bao.com>

When I run the shell script below for about 10 minutes on a 16-core server (upstream kernel):



DEV=/dev/sdc
FILE=/test/hello


do_write()
{
	while [ 1 ]
	do
		dd if=/dev/zero of=$FILE bs=1k count=$1 conv=notrunc &> /dev/null
	done
}

do_truncate()
{
	while [ 1 ]
	do
		truncate -s $1 $FILE
	done
}

mke2fs -m 0 -C 1048576 -O ^has_journal,^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,bigalloc $DEV
mount -t ext4 $DEV /test/

do_write 1 &
do_write 3 &
do_write 5 &
do_write 7 &
do_truncate 0 &
do_truncate 0 &
do_truncate 0 &



The "Used" ratio of the ext4 filesystem (as reported by the "df" command) grows very quickly until it reaches 100%, even though the file in /test/ is never larger than 7k.
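
For anyone who wants to watch this figure from a program rather than from "df", the following is a minimal user-space sketch. It assumes the usual df convention of used / (used + available), rounded up; the function names use_percent and fs_use_percent are hypothetical and not part of any existing tool.

```c
#include <sys/statvfs.h>

/*
 * Percentage of blocks in use, rounded up, computed from raw block counts.
 * Assumption: "used" is total minus free, and the ratio is taken against
 * used plus the blocks available to unprivileged users.
 * Returns 0 when there is nothing to measure.
 */
unsigned long use_percent(unsigned long long blocks,
                          unsigned long long bfree,
                          unsigned long long bavail)
{
    unsigned long long used = blocks - bfree;
    unsigned long long total = used + bavail;

    if (total == 0)
        return 0;
    return (unsigned long)((used * 100 + total - 1) / total);
}

/*
 * Usage percentage of the filesystem containing 'path',
 * or -1 if statvfs() fails.
 */
int fs_use_percent(const char *path)
{
    struct statvfs st;

    if (statvfs(path, &st) != 0)
        return -1;
    return (int)use_percent(st.f_blocks, st.f_bfree, st.f_bavail);
}
```

Calling fs_use_percent("/test") in a loop while the script runs should show the same climb that "df" reports.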

Imagine a file that has only one page (0~4k), which is delayed and not yet written back (so i_reserved_data_blocks is 1).
Two processes then arrive: process0 truncates page0 (bh0) while process1 writes page1 (bh1). The race condition looks like this:


             process0                                                   process1

  -->truncate
    -->ext4_da_invalidatepage
      -->ext4_da_page_release_reservation
        -->clear_buffer_delay(bh0)                          
                                                                        -->ext4_da_map_blocks
                                                                          -->ext4_ext_map_blocks
                                                                            -->map->m_flags |= EXT4_MAP_FROM_CLUSTER
                                                                              (because bh0 is not delay now)
                                                                          -->ext4_da_reserve_space
                                                                            (i_reserved_data_blocks is 2 now)

        (the bh1 is delay, so ext4_da_release_space
         will not be called)


After bh1 is written back, i_reserved_data_blocks is 1, but there is no dirty cluster left in the filesystem.
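
The interleaving in the diagram can be replayed in a small user-space toy model. This is only a sketch, not kernel code: the struct, the function names and the two simplified rules (a write reserves only when no other buffer in the cluster is delayed; truncate releases only when no other buffer in the cluster is delayed) are assumptions made for illustration.

```c
#include <stdbool.h>

/* Toy model of one bigalloc cluster shared by two buffers, bh0 and bh1. */
struct cluster_model {
    bool delay[2];  /* stands in for buffer_delay(bh0), buffer_delay(bh1) */
    int reserved;   /* stands in for i_reserved_data_blocks */
};

/* True if a buffer in the cluster other than 'skip' is still delayed. */
static bool other_delayed(const struct cluster_model *m, int skip)
{
    for (int i = 0; i < 2; i++)
        if (i != skip && m->delay[i])
            return true;
    return false;
}

/* Write path: reserve only if the cluster has no delayed buffer yet. */
static void write_block(struct cluster_model *m, int bh)
{
    if (!other_delayed(m, bh))
        m->reserved++;
    m->delay[bh] = true;
}

/* Truncate path, first half: clear the delay flag of the buffer. */
static void truncate_clear_delay(struct cluster_model *m, int bh)
{
    m->delay[bh] = false;
}

/* Truncate path, second half: release unless another buffer is delayed. */
static void truncate_release(struct cluster_model *m, int bh)
{
    if (!other_delayed(m, bh))
        m->reserved--;
}

/* Writeback: the cluster is allocated and consumes one reservation. */
static void writeback(struct cluster_model *m, int bh)
{
    m->delay[bh] = false;
    m->reserved--;
}

enum order { TRUNCATE_FIRST, WRITE_FIRST, RACY };

/* Reservations left over after bh1 is written back, for a given ordering. */
int leaked_reservations(enum order o)
{
    /* Initial state from the message: bh0 delayed, one reservation. */
    struct cluster_model m = { .delay = { true, false }, .reserved = 1 };

    switch (o) {
    case TRUNCATE_FIRST:
        truncate_clear_delay(&m, 0);
        truncate_release(&m, 0);
        write_block(&m, 1);
        break;
    case WRITE_FIRST:
        write_block(&m, 1);
        truncate_clear_delay(&m, 0);
        truncate_release(&m, 0);
        break;
    case RACY:
        /* The write lands between the two halves of truncate. */
        truncate_clear_delay(&m, 0);
        write_block(&m, 1);
        truncate_release(&m, 0);
        break;
    }
    writeback(&m, 1);
    return m.reserved;
}
```

Under these assumptions, leaked_reservations(TRUNCATE_FIRST) and leaked_reservations(WRITE_FIRST) both return 0, while leaked_reservations(RACY) returns 1, which is the stale reservation described above.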

Subsequent write operations call ext4_da_update_reserve_space, but sbi->s_dirtyclusters_counter is not decreased because i_reserved_data_blocks never returns to zero. As a result, s_dirtyclusters_counter grows quickly.

This patch holds i_data_sem for writing in ext4_da_invalidatepage while the reservation is dropped, and ext4_da_page_release_reservation no longer releases the reservation of, or clears the da_mapped flag on, buffers that are already da_mapped.

Signed-off-by: Robin Dong <sanbai@...bao.com>
---
 fs/ext4/inode.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index feaa82f..9b3ceac 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1209,10 +1209,10 @@ static void ext4_da_page_release_reservation(struct page *page,
 	do {
 		unsigned int next_off = curr_off + bh->b_size;
 
-		if ((offset <= curr_off) && (buffer_delay(bh))) {
+		if ((offset <= curr_off) && buffer_delay(bh) &&
+				!buffer_da_mapped(bh)) {
 			to_release++;
 			clear_buffer_delay(bh);
-			clear_buffer_da_mapped(bh);
 		}
 		curr_off = next_off;
 	} while ((bh = bh->b_this_page) != head);
@@ -2544,6 +2544,7 @@ static void ext4_da_invalidatepage(struct page *page, unsigned long offset)
 	 * Drop reserved blocks
 	 */
 	BUG_ON(!PageLocked(page));
+	down_write(&EXT4_I(page->mapping->host)->i_data_sem);
 	if (!page_has_buffers(page))
 		goto out;
 
@@ -2552,6 +2553,7 @@ static void ext4_da_invalidatepage(struct page *page, unsigned long offset)
 out:
 	ext4_invalidatepage(page, offset);
 
+	up_write(&EXT4_I(page->mapping->host)->i_data_sem);
 	return;
 }
 
-- 
1.7.3.2
