Message-ID: <dcb23e5d-81b9-9a6c-b7ac-bbad2ef77fd8@yandex-team.ru>
Date: Mon, 2 Oct 2017 23:58:45 +0300
From: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: linux-fsdevel <linux-fsdevel@...r.kernel.org>,
linux-mm <linux-mm@...ck.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Jens Axboe <axboe@...nel.dk>, Michal Hocko <mhocko@...e.com>,
Mel Gorman <mgorman@...e.de>,
Johannes Weiner <hannes@...xchg.org>,
Tejun Heo <tj@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH RFC] mm: implement write-behind policy for sequential file
writes

On 02.10.2017 22:54, Linus Torvalds wrote:
> On Mon, Oct 2, 2017 at 2:54 AM, Konstantin Khlebnikov
> <khlebnikov@...dex-team.ru> wrote:
>>
>> This patch implements a write-behind policy which tracks sequential writes
>> and starts background writeback once there are enough dirty pages in a row.
>
> This looks lovely to me.
>
> I do wonder if you also looked at finishing the background
> write-behind at close() time, because it strikes me that once you
> start doing that async writeout, it would probably be good to make
> sure you try to do the whole file.
Smaller files and tails are a lesser problem, and forcing writeback for them
might add more overhead due to small requests or overly random I/O.
Also, an open+append+close pattern could generate too much I/O.
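
To illustrate the policy itself (this is not the patch code): the kernel
side does roughly what an application could do by hand with
sync_file_range(2). A minimal userspace sketch, assuming window-aligned
appends and a made-up 256 KB window:

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

#define WINDOW (256 * 1024)	/* made-up write-behind window */

/* Append len bytes at *pos; once a full window of sequential data is
 * dirty, start asynchronous writeback for that window.  Assumes the
 * writes stay window-aligned; illustration only, not the patch code. */
static void write_append(int fd, const char *buf, size_t len, off_t *pos)
{
	ssize_t n = pwrite(fd, buf, len, *pos);

	if (n <= 0)
		return;
	*pos += n;
	if (*pos % WINDOW == 0)
		sync_file_range(fd, *pos - WINDOW, WINDOW,
				SYNC_FILE_RANGE_WRITE);
}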
>
> I'm thinking of filesystems that do delayed allocation etc - I'd
> expect that you'd want the whole file to get allocated on disk
> together, rather than have the "first 256kB aligned chunks" allocated
> thanks to write-behind, and then the final part allocated much later
> (after other files may have triggered their own write-behind). Think
> loads like copying lots of pictures around, for example.
As far as I know, ext4 preallocates space beyond the end of file for write
patterns like append + fsync, so the allocated extents should be bigger
than 256 KB. I haven't looked into this yet.
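
For comparison, userspace can ask for that explicitly; a rough sketch
with a hypothetical helper (ext4's own heuristics should make this
unnecessary for the append case):

#define _GNU_SOURCE
#include <fcntl.h>

/* Reserve blocks past the current end of file without changing i_size,
 * so that later appends land in one extent.  Shown only to illustrate
 * the mechanism being discussed. */
int preallocate_tail(int fd, off_t eof, off_t bytes)
{
	return fallocate(fd, FALLOC_FL_KEEP_SIZE, eof, bytes);
}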
>
> I don't have any particularly strong feelings about this, but I do
> suspect that once you have started that IO, you do want to finish it
> all up as the file write is done. No?
I'm aiming at continuous file operations like downloading a huge file
or writing a verbose log. The original motivation came from low-latency
server workloads that suffer from parallel bulk operations generating
tons of dirty pages. For general-purpose use the thresholds should
probably be increased significantly to cover only really bulky patterns.
>
> It would also be really nice to see some numbers. Perhaps a comparison
> of "vmstat 1" or similar when writing a big file to some slow medium
> like a USB stick (which is something we've done very very badly at,
> and this should help smooth out)?
I'll try to find some real cases with numbers.
For now I see that a massive write + fdatasync (dd conv=fdatasync, fio)
always finishes earlier because writeback now starts earlier too.
Without fdatasync it's obviously slower.
Copying to a USB stick + umount should show the same result, plus cp
can be interrupted at any point without contaminating the cache with
dirty pages.
Kernel compilation takes almost the same time because most files are
smaller than 256 KB.
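
For reference, the kind of run I mean (hypothetical path and sizes):

  dd if=/dev/zero of=/mnt/usb/test bs=1M count=1024 conv=fdatasync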
>
> Linus
>