Message-Id: <20190607131025.31996-18-naohiro.aota@wdc.com>
Date: Fri, 7 Jun 2019 22:10:23 +0900
From: Naohiro Aota <naohiro.aota@....com>
To: linux-btrfs@...r.kernel.org, David Sterba <dsterba@...e.com>
Cc: Chris Mason <clm@...com>, Josef Bacik <josef@...icpanda.com>,
Qu Wenruo <wqu@...e.com>, Nikolay Borisov <nborisov@...e.com>,
linux-kernel@...r.kernel.org, Hannes Reinecke <hare@...e.com>,
linux-fsdevel@...r.kernel.org,
Damien Le Moal <damien.lemoal@....com>,
Matias Bjørling <mb@...htnvm.io>,
Johannes Thumshirn <jthumshirn@...e.de>,
Bart Van Assche <bvanassche@....org>,
Naohiro Aota <naohiro.aota@....com>
Subject: [PATCH 17/19] btrfs: shrink delayed allocation size in HMZONED mode
In a write-heavy workload, the following scenario can occur:
1. Pages #0 to #2 (and their corresponding extent region) are marked dirty
   and become candidates for delayed allocation
     pages          0 1 2 3 4
     dirty          o o o - -
     towrite        - - - - -
     delayed alloc  o o o - -
2. extent_write_cache_pages() marks the dirty pages as TOWRITE
     pages          0 1 2 3 4
     dirty          o o o - -
     towrite        o o o - -
     delayed alloc  o o o - -
3. Meanwhile, another write dirties page #3 and page #4
     pages          0 1 2 3 4
     dirty          o o o o o
     towrite        o o o - -
     delayed alloc  o o o o o
4. find_lock_delalloc_range() decides to allocate a region covering page #0
   to page #4
5. However, extent_write_cache_pages() only initiates writes for the
   TOWRITE-tagged pages (#0 to #2)
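The five steps can be replayed in a small userspace model. This is only a sketch of the scenario, not kernel code: struct page_state and the helpers dirty_pages(), tag_for_writeback(), write_tagged() and run_scenario() are hypothetical names invented for the illustration, and each page is reduced to three flags matching the rows of the tables above.

```c
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 5

/* Hypothetical per-page model; one flag per row of the tables above. */
struct page_state {
	bool dirty;	/* page holds data not yet written */
	bool towrite;	/* page was tagged for the current writeback pass */
	bool delalloc;	/* page belongs to a delayed allocation region */
};

/* Steps 1 and 3: a buffered write dirties pages [first, last]. */
static void dirty_pages(struct page_state *p, size_t first, size_t last)
{
	for (size_t i = first; i <= last; i++) {
		p[i].dirty = true;
		p[i].delalloc = true;
	}
}

/* Step 2: writeback tags every page that is dirty right now as TOWRITE. */
static void tag_for_writeback(struct page_state *p, size_t n)
{
	for (size_t i = 0; i < n; i++)
		p[i].towrite = p[i].dirty;
}

/*
 * Steps 4 and 5: the whole delalloc region gets allocated, but only the
 * TOWRITE pages are written. Returns the number of pages that were
 * allocated yet left unwritten.
 */
static size_t write_tagged(struct page_state *p, size_t n)
{
	size_t left_behind = 0;

	for (size_t i = 0; i < n; i++) {
		if (!p[i].delalloc)
			continue;
		p[i].delalloc = false;	/* region allocated */
		if (p[i].towrite) {
			p[i].dirty = false;
			p[i].towrite = false;
		} else {
			left_behind++;
		}
	}
	return left_behind;
}

/* Replays steps 1 to 5 in order. */
static size_t run_scenario(void)
{
	struct page_state pages[NR_PAGES] = {0};

	dirty_pages(pages, 0, 2);		/* step 1 */
	tag_for_writeback(pages, NR_PAGES);	/* step 2 */
	dirty_pages(pages, 3, 4);		/* step 3 */
	return write_tagged(pages, NR_PAGES);	/* steps 4 and 5 */
}
```

In this model run_scenario() reports two pages left behind, which correspond to page #3 and page #4.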
So the above process leaves page #3 and page #4 behind. Usually, the
periodic dirty flush kicks off write IOs for page #3 and #4. However, if we
try to mount a subvolume at this point, the mount process takes the s_umount
write lock, which blocks the periodic flush from running.
To deal with the problem, shrink the delayed allocation region so that it
contains only the pages that are expected to be written.
Signed-off-by: Naohiro Aota <naohiro.aota@....com>
---
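For review purposes, the shrinking logic can be sketched as a standalone userspace function. This is a minimal approximation under stated assumptions, not the kernel code itself: shrink_delalloc_end() is a hypothetical name, the towrite[] array stands in for xa_get_mark(..., PAGECACHE_TAG_TOWRITE), SKETCH_PAGE_SHIFT assumes 4 KiB pages, and the HMZONED and sync_mode checks as well as the extent and page unlocking are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE ((uint64_t)1 << SKETCH_PAGE_SHIFT)

/*
 * Return the new (inclusive) end offset of the delalloc range
 * [delalloc_start, delalloc_end] after cutting it at the first page
 * that lacks the TOWRITE tag. towrite[i] models the tag of page i.
 */
static uint64_t shrink_delalloc_end(const bool *towrite, size_t nr_pages,
				    uint64_t delalloc_start,
				    uint64_t delalloc_end)
{
	uint64_t start_index = delalloc_start >> SKETCH_PAGE_SHIFT;
	uint64_t end_index = delalloc_end >> SKETCH_PAGE_SHIFT;
	uint64_t i;

	assert(delalloc_start <= delalloc_end);
	assert(end_index < nr_pages);

	/* As in the patch, a range within a single page is never shrunk. */
	if (start_index >= end_index)
		return delalloc_end;

	/* Find the first page in the range without the TOWRITE tag. */
	for (i = start_index; i <= end_index; i++) {
		if (!towrite[i])
			break;
	}

	if (i <= end_index) {
		uint64_t unlock_start = i << SKETCH_PAGE_SHIFT;

		/* Always keep the first page, even if it is untagged. */
		if (i == start_index)
			unlock_start += SKETCH_PAGE_SIZE;
		return unlock_start - 1;
	}

	/* Every page is tagged: keep the range as is. */
	return delalloc_end;
}
```

With the tags from step 3 of the commit message (pages #0 to #2 tagged, #3 and #4 not) and a range covering all five pages, the sketch returns the last byte of page #2, so only the tagged pages remain in the range.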
fs/btrfs/extent_io.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c73c69e2bef4..ea582ff85c73 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3310,6 +3310,33 @@ static noinline_for_stack int writepage_delalloc(struct inode *inode,
 			delalloc_start = delalloc_end + 1;
 			continue;
 		}
+
+		if (btrfs_fs_incompat(btrfs_sb(inode->i_sb), HMZONED) &&
+		    (wbc->sync_mode == WB_SYNC_ALL || wbc->tagged_writepages) &&
+		    ((delalloc_start >> PAGE_SHIFT) <
+		     (delalloc_end >> PAGE_SHIFT))) {
+			unsigned long i;
+			unsigned long end_index = delalloc_end >> PAGE_SHIFT;
+
+			for (i = delalloc_start >> PAGE_SHIFT;
+			     i <= end_index; i++)
+				if (!xa_get_mark(&inode->i_mapping->i_pages, i,
+						 PAGECACHE_TAG_TOWRITE))
+					break;
+
+			if (i <= end_index) {
+				u64 unlock_start = (u64)i << PAGE_SHIFT;
+
+				if (i == delalloc_start >> PAGE_SHIFT)
+					unlock_start += PAGE_SIZE;
+
+				unlock_extent(tree, unlock_start, delalloc_end);
+				__unlock_for_delalloc(inode, page, unlock_start,
+						      delalloc_end);
+				delalloc_end = unlock_start - 1;
+			}
+		}
+
 		ret = btrfs_run_delalloc_range(inode, page, delalloc_start,
 				delalloc_end, &page_started, nr_written, wbc);
 		/* File system has been set read-only */
--
2.21.0