Date:   Fri, 20 Sep 2019 10:39:33 +0300
From:   Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
To:     linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Cc:     Jens Axboe <axboe@...nel.dk>, Michal Hocko <mhocko@...e.com>,
        Dave Chinner <david@...morbit.com>,
        Mel Gorman <mgorman@...e.de>,
        Johannes Weiner <hannes@...xchg.org>,
        Tejun Heo <tj@...nel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH v2] mm: implement write-behind policy for sequential file
 writes

A script for a trivial demo is attached.

$ bash test_writebehind.sh
SIZE
3,2G	dummy
vm.dirty_write_behind = 0
COPY

real	0m3.629s
user	0m0.016s
sys	0m3.613s
Dirty:           3254552 kB
SYNC

real	0m31.953s
user	0m0.002s
sys	0m0.000s
vm.dirty_write_behind = 1
COPY

real	0m32.738s
user	0m0.008s
sys	0m4.047s
Dirty:              2900 kB
SYNC

real	0m0.427s
user	0m0.000s
sys	0m0.004s
vm.dirty_write_behind = 2
COPY

real	0m32.168s
user	0m0.000s
sys	0m4.066s
Dirty:              3088 kB
SYNC

real	0m0.421s
user	0m0.004s
sys	0m0.001s


With vm.dirty_write_behind set to 1 or 2, files are written even faster, and
during copying the amount of dirty memory always stays at around 16MiB.
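
To make the comparison concrete, the sketch below adds the COPY and SYNC
wall-clock ("real") times for each mode. The numbers are copied from the run
above; the function name total_times is only illustrative.

```shell
# Sum copy + sync wall-clock time per vm.dirty_write_behind mode.
# Input columns: mode, COPY real seconds, SYNC real seconds (from the run above).
total_times() {
    LC_ALL=C awk '{ printf "mode=%s total=%.3fs\n", $1, $2 + $3 }' <<'EOF'
0 3.629 31.953
1 32.738 0.427
2 32.168 0.421
EOF
}
total_times
```

In this single run the totals are 35.582s, 33.165s and 32.589s, so the time
until the data is on disk is slightly shorter with write-behind enabled.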


On 20/09/2019 10.35, Konstantin Khlebnikov wrote:
> Traditional writeback tries to accumulate as much dirty data as possible.
> This is a worthwhile strategy for extremely short-lived files and for
> batching writes to save battery power. But for workloads where disk latency
> is important, this policy generates periodic disk load spikes which increase
> latency for concurrent operations.
> 
> Also, dirty pages in the file cache cannot be reclaimed and reused
> immediately. This way massive I/O such as file copying affects memory
> allocation latency.
> 
> The present writeback engine allows tuning only the dirty data size or the
> expiration time. Such tuning cannot eliminate spikes; it just lowers and
> multiplies them. The other option is switching to sync mode, which flushes
> written data right after each write; obviously this has a significant
> performance impact. Such tuning is system-wide and also affects
> memory-mapped and randomly written files, which flusher threads handle much
> better.
> 
> This patch implements a write-behind policy which tracks sequential writes
> and starts background writeback when a file has enough dirty pages.
> 
> Global switch in sysctl vm.dirty_write_behind:
> =0: disabled, default
> =1: enabled for strictly sequential writes (append, copying)
> =2: enabled for all sequential writes
> 
> The only parameter is the window size: the maximum amount of dirty pages
> behind the current position and the maximum amount of pages in background
> writeback.
> 
> Setup is per-disk in sysfs in file /sys/block/$DISK/bdi/write_behind_kb.
> Default: 16MiB, '0' disables write-behind for this disk.
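
A minimal sketch of how the two knobs described above might be set. It
assumes a kernel carrying this patch (the knobs do not exist otherwise), that
write_behind_kb takes a value in KiB, and that "sda" is the disk of interest.
The helper name write_behind_setup and the APPLY variable are hypothetical.
By default it only prints the commands.

```shell
# Print (or, with APPLY=1 and root privileges, run) the commands that
# select a write-behind mode and set the per-disk window.
# Arguments: disk name, vm.dirty_write_behind mode (0, 1 or 2), window in KiB.
write_behind_setup() {
    disk=$1 mode=$2 window_kb=$3
    run() {
        if [ "${APPLY:-0}" = 1 ]; then sh -c "$1"; else printf '%s\n' "$1"; fi
    }
    run "sysctl -w vm.dirty_write_behind=$mode"
    run "echo $window_kb > /sys/block/$disk/bdi/write_behind_kb"
}
write_behind_setup sda 1 16384   # 16384 KiB = 16MiB, the stated default
```

Passing 0 as the window would disable write-behind for that disk, as stated
above.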
> 
> When the amount of unwritten pages exceeds the window size, write-behind
> starts background writeback for max(excess, max_sectors_kb) and then waits
> for the same amount of background writeback initiated previously.
> 
>   |<-wait-this->|           |<-send-this->|<---pending-write-behind--->|
>   |<--async-write-behind--->|<--------previous-data------>|<-new-data->|
>                current head-^    new head-^              file position-^
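
The sizing rule above can be sketched as plain arithmetic. This is not the
kernel code, only an illustration of max(excess, max_sectors_kb); all sizes
are in KiB, the function name is made up, and 1280 is an assumed
max_sectors_kb value.

```shell
# Size of the next background writeback in KiB, or 0 while the unwritten
# data still fits in the window.
# Arguments: unwritten KiB behind the file position, window KiB, max_sectors_kb.
write_behind_chunk_kb() {
    unwritten=$1 window=$2 max_req=$3
    if [ "$unwritten" -le "$window" ]; then
        echo 0
        return
    fi
    excess=$((unwritten - window))
    # Never send less than one full request: max(excess, max_sectors_kb).
    if [ "$excess" -gt "$max_req" ]; then echo "$excess"; else echo "$max_req"; fi
}
write_behind_chunk_kb 16448 16384 1280   # excess of 64 is rounded up to 1280
write_behind_chunk_kb 20000 16384 1280   # excess of 3616 is sent as is
```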
> 
> Remaining tail pages are flushed when the file is closed if async
> write-behind was started, or if this is a new file and it is at least
> max_sectors_kb long.
> 
> Overall behavior depending on total data size:
> < max_sectors_kb - no writes
> > max_sectors_kb - write new files in background after close
> > write_behind_kb - streaming write, write tail at close
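
The three regimes listed above, written out as a small classifier. This is
an illustration under assumptions: the function name is hypothetical, sizes
are in KiB, and a file of exactly max_sectors_kb is treated as "at least
max_sectors_kb long".

```shell
# Map the total amount of data written to a new file onto the behavior
# described above.
# Arguments: total KiB written, max_sectors_kb, write_behind_kb.
write_behind_regime() {
    total=$1 max_req=$2 window=$3
    if [ "$total" -lt "$max_req" ]; then
        echo "no writes"
    elif [ "$total" -le "$window" ]; then
        echo "background write after close"
    else
        echo "streaming write, tail at close"
    fi
}
write_behind_regime 512 1280 16384       # small file, left to regular writeback
write_behind_regime 4096 1280 16384      # fits in the window, flushed at close
write_behind_regime 3254552 1280 16384   # larger than the window, streamed
```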
> 
> Special cases:
> 
> * files with POSIX_FADV_RANDOM, O_DIRECT, O_[D]SYNC are ignored
> 
> * the writing cursor for O_APPEND is aligned to cover previous small appends
>    Appends might happen via multiple files or via a new file each time.
> 
> * mode vm.dirty_write_behind=1 ignores non-append writes
>    This reacts only to completely sequential writes like copying files,
>    writing logs with O_APPEND or rewriting files after O_TRUNC.
> 
> Note: the ext4 feature "auto_da_alloc" also writes out the cache when
> closing a file after truncating it to 0 and after renaming one file over
> another.
> 
> Changes since v1 (2017-10-02):
> * rework window management:
> * change default window 1MiB -> 16MiB
> * change default request 256KiB -> max_sectors_kb
> * drop always-async behavior for O_NONBLOCK
> * drop handling POSIX_FADV_NOREUSE (should be in separate patch)
> * ignore writes with O_DIRECT, O_SYNC, O_DSYNC
> * align head position for O_APPEND
> * add strictly sequential mode
> * write tail pages for new files
> * make void, keep errors at mapping
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
> Link: https://lore.kernel.org/patchwork/patch/836149/ (v1)
> ---

Download attachment "test_writebehind.sh" of type "application/x-shellscript" (428 bytes)
