Message-ID: <875f3b55-4fe1-e2c3-5bee-ca79e4668e72@yandex-team.ru>
Date: Fri, 20 Sep 2019 10:39:33 +0300
From: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
To: linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Cc: Jens Axboe <axboe@...nel.dk>, Michal Hocko <mhocko@...e.com>,
Dave Chinner <david@...morbit.com>,
Mel Gorman <mgorman@...e.de>,
Johannes Weiner <hannes@...xchg.org>,
Tejun Heo <tj@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file
writes
A script for a trivial demo is attached.
$ bash test_writebehind.sh
SIZE
3,2G dummy
vm.dirty_write_behind = 0
COPY
real 0m3.629s
user 0m0.016s
sys 0m3.613s
Dirty: 3254552 kB
SYNC
real 0m31.953s
user 0m0.002s
sys 0m0.000s
vm.dirty_write_behind = 1
COPY
real 0m32.738s
user 0m0.008s
sys 0m4.047s
Dirty: 2900 kB
SYNC
real 0m0.427s
user 0m0.000s
sys 0m0.004s
vm.dirty_write_behind = 2
COPY
real 0m32.168s
user 0m0.000s
sys 0m4.066s
Dirty: 3088 kB
SYNC
real 0m0.421s
user 0m0.004s
sys 0m0.001s
With vm.dirty_write_behind set to 1 or 2, copy plus sync completes slightly
faster overall (about 33 s versus 35.6 s), and during copying the amount of
dirty memory always stays at around 16MiB.
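
To compare the runs, the COPY and SYNC wall-clock times from the output above
can be added up per mode. This is only a small sketch: to_seconds and
total_seconds are hypothetical helper names and are not taken from the
attached script.

```shell
# Convert a bash `time` value such as "0m31.953s" into plain seconds.
to_seconds() {
    LC_ALL=C awk -v t="$1" 'BEGIN {
        split(t, parts, "m")       # parts[1] = minutes, parts[2] = "31.953s"
        sub(/s$/, "", parts[2])    # drop the trailing "s"
        printf "%.3f\n", parts[1] * 60 + parts[2]
    }'
}

# Sum the COPY and SYNC wall-clock times of one run.
total_seconds() {
    LC_ALL=C awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.3f\n", a + b }'
}

total_seconds 0m3.629s  0m31.953s   # vm.dirty_write_behind = 0
total_seconds 0m32.738s 0m0.427s    # vm.dirty_write_behind = 1
total_seconds 0m32.168s 0m0.421s    # vm.dirty_write_behind = 2
```

This prints 35.582, 33.165 and 32.589 seconds respectively.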
On 20/09/2019 10.35, Konstantin Khlebnikov wrote:
> Traditional writeback tries to accumulate as much dirty data as possible.
> This is a worthwhile strategy for extremely short-lived files and for
> batching writes to save battery power. But for workloads where disk latency
> is important, this policy generates periodic disk load spikes, which
> increase latency for concurrent operations.
>
> Also, dirty pages in the file cache cannot be reclaimed and reused
> immediately. As a result, massive I/O such as file copying affects memory
> allocation latency.
>
> The present writeback engine allows tuning only the dirty data size or the
> expiration time. Such tuning cannot eliminate spikes; it just makes them
> lower and more frequent. The other option is switching to sync mode, which
> flushes written data right after each write; obviously this has a
> significant performance impact. Such tuning is system-wide and also affects
> memory-mapped and randomly written files, which flusher threads handle
> much better.
>
> This patch implements a write-behind policy which tracks sequential writes
> and starts background writeback when a file has enough dirty pages.
>
> Global switch in sysctl vm.dirty_write_behind:
> =0: disabled, default
> =1: enabled for strictly sequential writes (append, copying)
> =2: enabled for all sequential writes
>
> The only parameter is the window size: the maximum amount of dirty pages
> behind the current position and the maximum amount of pages in background
> writeback.
>
> Setup is per-disk, in the sysfs file /sys/block/$DISK/bdi/write_behind_kb.
> Default: 16MiB, '0' disables write-behind for this disk.
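
The two knobs described above could be set along these lines. This is a
sketch that only prints the commands instead of running them: the sysctl and
the sysfs file exist only on a kernel with this patch applied, print_setup
and the other helper names are hypothetical, "sda" is just an example disk
name, and the assumption is that write_behind_kb takes a value in KiB.

```shell
# Path of the per-disk window setting (present only with the patch applied).
write_behind_path() {
    printf '/sys/block/%s/bdi/write_behind_kb\n' "$1"
}

# Convert a window size in MiB to the kB value assumed for the sysfs file.
mib_to_kb() {
    echo $(( $1 * 1024 ))
}

# Print (do not run) the commands enabling mode 1 with the given window.
#   $1 = disk name, $2 = window size in MiB
print_setup() {
    echo "sysctl -w vm.dirty_write_behind=1"
    echo "echo $(mib_to_kb "$2") > $(write_behind_path "$1")"
}

print_setup sda 16
```

For the default 16MiB window this prints a value of 16384 for the sysfs file.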
>
> When the amount of unwritten pages exceeds the window size, write-behind
> starts background writeback for max(excess, max_sectors_kb) and then waits
> for the same amount of background writeback initiated previously.
>
> |<-wait-this->|           |<-send-this->|<---pending-write-behind--->|
> |<--async-write-behind--->|<--------previous-data------>|<-new-data->|
>              current head-^    new head-^              file position-^
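
The size of each write-behind step, as described above, can be sketched as
plain arithmetic. This is a simplified model in kB rather than the kernel
code from the patch, and write_behind_step is a hypothetical name.

```shell
# Amount (kB) to send to background writeback for the current write.
# Prints 0 while the unwritten data still fits in the window.
#   $1 = unwritten kB, $2 = window size in kB, $3 = max_sectors_kb
write_behind_step() {
    unwritten_kb=$1 window_kb=$2 max_sectors_kb=$3
    if [ "$unwritten_kb" -le "$window_kb" ]; then
        echo 0
        return
    fi
    excess_kb=$(( unwritten_kb - window_kb ))
    # max(excess, max_sectors_kb): take the larger of the two.
    if [ "$excess_kb" -gt "$max_sectors_kb" ]; then
        echo "$excess_kb"
    else
        echo "$max_sectors_kb"
    fi
}
```

For example, with a 16384 kB window and a max_sectors_kb of 1280 (an example
value), `write_behind_step 16448 16384 1280` gives 1280, because the 64 kB
excess is smaller than one request.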
>
> Remaining tail pages are flushed when the file is closed if async
> write-behind was started, or if this is a new file and it is at least
> max_sectors_kb long.
>
> Overall behavior depending on total data size:
> < max_sectors_kb  - no writes
> > max_sectors_kb  - write new files in background after close
> > write_behind_kb - streaming write, write tail at close
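
The three size classes above can be written as a small decision function.
This is a sketch with a hypothetical name, classify_total_size. The handling
of the exact boundaries is an assumption: "at least max_sectors_kb" is
treated as >=, and "exceeds the window" as >.

```shell
# Describe the overall behavior for a newly written file of a given size.
#   $1 = total data in kB, $2 = max_sectors_kb, $3 = write_behind_kb
classify_total_size() {
    total_kb=$1 max_sectors_kb=$2 write_behind_kb=$3
    if [ "$total_kb" -gt "$write_behind_kb" ]; then
        echo "streaming write, write tail at close"
    elif [ "$total_kb" -ge "$max_sectors_kb" ]; then
        echo "write new files in background after close"
    else
        echo "no writes"
    fi
}
```

With an example max_sectors_kb of 1280 and the default 16384 kB window, a
512 kB file falls under "no writes" and a 4096 kB file is written in the
background after close.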
>
> Special cases:
>
> * files with POSIX_FADV_RANDOM, O_DIRECT, O_[D]SYNC are ignored
>
> * the writing cursor for O_APPEND is aligned to cover previous small appends
> Appends might happen via multiple files or via a new file each time.
>
> * mode vm.dirty_write_behind=1 ignores non-append writes
> This reacts only to completely sequential writes like copying files,
> writing logs with O_APPEND or rewriting files after O_TRUNC.
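
The interaction between the global mode and the ignored flags can be
summarized in a simplified model. This is not the kernel logic from the
patch: write_behind_applies is a hypothetical name, and the write pattern is
passed in as a label instead of being detected from file offsets.

```shell
# Return 0 (true) if write-behind would consider this write, 1 otherwise.
#   $1 mode:    value of vm.dirty_write_behind (0, 1 or 2)
#   $2 pattern: "append" (strictly sequential), "sequential" or "random"
#   $3 flags:   space-separated open/fadvise flags, e.g. "O_APPEND O_DSYNC"
write_behind_applies() {
    mode=$1 pattern=$2 flags=$3
    if [ "$mode" -eq 0 ]; then
        return 1                      # globally disabled
    fi
    for flag in $flags; do
        case $flag in
            POSIX_FADV_RANDOM|O_DIRECT|O_SYNC|O_DSYNC) return 1 ;;
        esac
    done
    case "$mode:$pattern" in
        1:append|2:append|2:sequential) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `write_behind_applies 1 sequential ""` fails because mode 1
ignores non-append writes, while the same call with mode 2 succeeds.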
>
> Note: the ext4 feature "auto_da_alloc" also writes out the cache when
> closing a file after truncating it to 0 and after renaming one file over
> another.
>
> Changes since v1 (2017-10-02):
> * rework window management:
> * change default window 1MiB -> 16MiB
> * change default request 256KiB -> max_sectors_kb
> * drop always-async behavior for O_NONBLOCK
> * drop handling POSIX_FADV_NOREUSE (should be in separate patch)
> * ignore writes with O_DIRECT, O_SYNC, O_DSYNC
> * align head position for O_APPEND
> * add strictly sequential mode
> * write tail pages for new files
> * make void, keep errors at mapping
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
> Link: https://lore.kernel.org/patchwork/patch/836149/ (v1)
> ---
Download attachment "test_writebehind.sh" of type "application/x-shellscript" (428 bytes)