Date:   Sun, 29 Apr 2018 15:02:48 -0500
From:   Vijay Chidambaram <vvijay03@...il.com>
To:     "Theodore Y. Ts'o" <tytso@....edu>
Cc:     Ext4 <linux-ext4@...r.kernel.org>,
        Rohan Kadekodi <kadekodirohan@...il.com>,
        aasheesh kolli <aasheesh.kolli@...il.com>
Subject: Re: Append and fsync performance in ext4 DAX

On Sat, Apr 28, 2018 at 8:20 PM, Theodore Y. Ts'o <tytso@....edu> wrote:
> On Sat, Apr 28, 2018 at 11:24:32AM -0500, Vijay Chidambaram wrote:
>>
>> While we expect workload 1 to take more time than workload 2 since it
>> is extending the file, 10x higher time seems suspicious. If we remove
>> the fsync in workload 1, the running time drops to 3s. If we remove
>> the fsync in workload 2, the running time is around the same (1.5s).
>
> Can you mount the file system, run workload #N, and then once it's
> done, capture the output of /proc/fs/jbd2/<dev>-8/info, which should
> look like this:
>
> % cat /proc/fs/jbd2/dm-1-8/info
> 498438 transactions (498366 requested), each up to 65536 blocks
> average:
>   0ms waiting for transaction
>   0ms request delay
>   470ms running transaction
>   0ms transaction was being locked
>   0ms flushing data (in ordered mode)
>   0ms logging transaction
>   2522us average transaction commit time
>   161 handles per transaction
>   14 blocks per transaction
>   15 logged blocks per transaction
>
> It would be interesting to see this for workload #1 and workload #2.
>
> I will note that if you were using fdatasync(2) instead of fsync(2)
> for workload #2, there wouldn't be any journal transactions needed by
> the overwrites, and the speedup would be quite expected.
>
> It might be that in the overwrite case, especially if you are using
> 128-byte inodes such that the mtime timestamp has only one-second
> granularity, there simply isn't a need to do many journal
> transactions.

Thanks Ted! It was indeed the journal transactions that were making
the difference.
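
For context, the two microbenchmarks boil down to roughly the
following (a minimal sketch: the mount point, block size, and
iteration count are placeholders rather than our exact harness, and
error handling is omitted):

/* Sketch of the two workloads; paths and sizes are illustrative only. */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define BLK   4096
#define ITERS 100000

int main(void)
{
        char buf[BLK];
        memset(buf, 'a', sizeof(buf));

        /* Workload 1: append-extend the file, fsync after every write. */
        int fd = open("/mnt/dax/append", O_CREAT | O_TRUNC | O_WRONLY, 0644);
        for (int i = 0; i < ITERS; i++) {
                write(fd, buf, BLK);  /* extends i_size and allocates a block */
                fsync(fd);            /* has to commit that metadata to the journal */
        }
        close(fd);

        /* Workload 2: overwrite already-written blocks, fsync after every
         * write.  Assumes the file was fully written once beforehand. */
        fd = open("/mnt/dax/overwrite", O_WRONLY);
        for (int i = 0; i < ITERS; i++) {
                pwrite(fd, buf, BLK, (off_t)(i % 1024) * BLK);  /* no allocation */
                fsync(fd);
        }
        close(fd);
        return 0;
}

The only intended difference between the two loops is the block
allocation and file extension, which is what drags the journal into
the append case.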

We found that the append case writes 6 journal blocks as part of
every fsync, while the overwrite case writes none.
Another factor was that the append workload's journal blocks are
written with temporal writes (written to the CPU cache and then
flushed), while the overwrite workload writes only data, and does so
non-temporally. Temporal writes are slower than non-temporal writes,
so this also contributed to the difference in performance.
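
On the fdatasync(2) point: switching the overwrite loop to something
like the sketch below should avoid the per-fsync journal commit, since
fdatasync(2) flushes the data (and any metadata needed to read it
back) but is not required to flush timestamp-only inode updates.
Again, the path and sizes are placeholders:

/* Overwrite loop with fdatasync(2) instead of fsync(2); sketch only. */
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define BLK   4096
#define ITERS 100000

int main(void)
{
        char buf[BLK];
        memset(buf, 'b', sizeof(buf));

        /* Assumes the file was fully written once beforehand. */
        int fd = open("/mnt/dax/overwrite", O_WRONLY);
        if (fd < 0)
                return 1;
        for (int i = 0; i < ITERS; i++) {
                pwrite(fd, buf, BLK, (off_t)(i % 1024) * BLK);
                fdatasync(fd);  /* no journal commit needed if only timestamps changed */
        }
        close(fd);
        return 0;
}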
