Message-ID: <87h9y2t3qt.fsf@openvz.org>
Date: Fri, 14 Nov 2014 14:34:34 +0300
From: Dmitry Monakhov <dmonakhov@...nvz.org>
To: Theodore Ts'o <tytso@....edu>
Cc: Ext4 Developers List <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH,RFC] ext4: add lazytime mount option
Theodore Ts'o <tytso@....edu> writes:
> On Wed, Nov 12, 2014 at 04:47:42PM +0300, Dmitry Monakhov wrote:
>> Also, synchronous mtime updates are a great pain for the AIO submitter,
>> because AIO submission may be blocked for seconds (up to 5 seconds in my case)
>> if the inode is part of the currently committing transaction; see do_get_write_access().
>
> 5 seconds?!? So you're seeing cases where the jbd2 layer is taking
> that long to close a commit? It might be worth looking at that so we
> can understand why that is happening, and to see if there's anything
> we might do to improve things on that front. Even if we can get rid
> of most of the mtime updates, there will be other cases where a commit
> that takes a long time to complete will cause all sorts of other very
> nasty latencies on the entire system.
Our chunk server workload is quite generic:
submit_task: submits AIO-DIO requests to multiple chunk files from
several threads; this task should not block for long.
sync_task: performs fsync/fdatasync on demand for modified chunk files before
we can ACK the write operation to the user; this task may block.
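A minimal C sketch of the split between the two roles, assuming plain buffered pwrite() in place of io_submit() with O_DIRECT, and reusing the "fdatasync after every 32nd write" policy of the fio simulation load as the sync trigger. The names struct chunk_file, chunk_write(), chunk_sync_if_due() and SYNC_EVERY are hypothetical, not taken from any real chunk server:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sync interval, mirroring "fdatasync each 32nd write" in the fio job. */
#define SYNC_EVERY 32u

/* Hypothetical per-chunk bookkeeping for this sketch. */
struct chunk_file {
    int fd;
    unsigned writes_since_sync; /* writes not yet made durable */
    unsigned syncs_done;        /* number of fdatasync() calls issued */
};

/* submit side: issue the write and only record that the file is dirty.
 * Simplification: synchronous pwrite() instead of AIO with O_DIRECT. */
static int chunk_write(struct chunk_file *c, const void *buf, size_t len, off_t off)
{
    ssize_t n = pwrite(c->fd, buf, len, off);
    if (n < 0)
        return -errno;
    if ((size_t)n != len)
        return -EIO;
    c->writes_since_sync++;
    return 0;
}

/* sync side: allowed to block; makes data durable before writes are ACKed.
 * Returns 1 if it synced, 0 if no sync was due, negative errno on failure. */
static int chunk_sync_if_due(struct chunk_file *c)
{
    if (c->writes_since_sync < SYNC_EVERY)
        return 0;
    if (fdatasync(c->fd) < 0)
        return -errno;
    c->writes_since_sync = 0;
    c->syncs_done++;
    return 1;
}
```

The intent of the split is that only the sync side should ever wait on the journal; the problem described in this thread is that the submit side can stall too when the inode belongs to the committing transaction.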
Here is a simulation of the chunk server load:
# TEST_CASE assumes that the target fs is mounted at /mnt
# Performs random AIO-DIO writes (bs=64k) to preallocated files (size=128M) from 32 threads
# and calls fdatasync after every 32nd write operation
$ fio ./aio-dio.fio
# Measure AIO-DIO write submission latency
$ dd if=/dev/zero of=/mnt/f bs=1M count=1
$ ioping -A -C -D -WWW /mnt/f
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=1 time=410 us
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=2 time=430 us
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=3 time=370 us
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=4 time=400 us
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=5 time=1.9 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=6 time=4.2 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=7 time=3.8 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=8 time=3.7 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=9 time=4.1 s
4.0 KiB from /mnt/f (ext4 /dev/mapper/vzvg-scratch_dev): request=10 time=1.9 s
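To put the ioping numbers in perspective, a small C helper that summarizes the ten samples shown, with values transcribed from the output and converted to microseconds. The function names are illustrative only:

```c
#include <stddef.h>

/* Submission latencies from the ioping run, in microseconds. */
static const double ioping_us[] = {
    410, 430, 370, 400,                       /* requests 1-4  */
    1.9e6, 4.2e6, 3.8e6, 3.7e6, 4.1e6, 1.9e6  /* requests 5-10 */
};
#define N_SAMPLES (sizeof ioping_us / sizeof ioping_us[0])

/* Largest value in v[0..n-1]; n must be > 0. */
static double max_latency(const double *v, size_t n)
{
    double m = v[0];
    for (size_t i = 1; i < n; i++)
        if (v[i] > m)
            m = v[i];
    return m;
}

/* Number of samples strictly greater than limit. */
static size_t count_above(const double *v, size_t n, double limit)
{
    size_t c = 0;
    for (size_t i = 0; i < n; i++)
        if (v[i] > limit)
            c++;
    return c;
}

/* Arithmetic mean of v[0..n-1]; n must be > 0. */
static double mean(const double *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / n;
}
```

For these samples the first four requests average 402.5 us, six of the ten exceed one second, and the worst (4.2 s) is roughly 10,000 times that baseline.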
>
>> Yeah, we also have a ticket for that :)
>> https://jira.sw.ru/browse/PSBM-20411
>
> Is this supposed to be a URL to publically visible web page?
>
> Host jira.sw.ru not found: 3(NXDOMAIN)
Oh, unfortunately this host is not visible from outside.
>
>> > + if (flags & S_VERSION)
>> > + inode_inc_iversion(inode);
> ....
>> Since we want to update all in-memory data, we also have to explicitly update inode->i_version,
>> which was previously updated implicitly here:
>> mark_inode_dirty_sync()
>> ->__mark_inode_dirty
>> ->ext4_dirty_inode
>> ->ext4_mark_inode_dirty
>> ->ext4_mark_iloc_dirty
>> ->inode_inc_iversion(inode);
>
> It's not necessary to add another call to inode_inc_iversion() since
> we already incremented the i_version if S_VERSION is set, and
> S_VERSION gets set when it's necessary to handle incrementing
> i_version.
>
> The inode_inc_iversion() in ext4_mark_iloc_dirty() is probably not
> necessary, since we already should be incrementing i_version whenever
> ctime and mtime get updated. The inode_inc_iversion() there is more
> of a "belt and suspenders" safety thing, on the theory that the extra
> bump in i_version won't hurt anything.
>
> Cheers,
>
> - Ted
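A user-space model of the two i_version paths discussed above, assuming the lazytime update_time() bumps i_version only when S_VERSION is set and does not dirty the inode, while the mark_inode_dirty_sync() chain ending in ext4_mark_iloc_dirty() bumps it once more. The model_* names are hypothetical and the S_* flag values are copied for illustration; this is not kernel code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits as used by the kernel's update_time() callers (copied for this model). */
#define S_ATIME   1
#define S_MTIME   2
#define S_CTIME   4
#define S_VERSION 8

struct model_inode {
    uint64_t i_version; /* change counter; consumers only check that it changed */
    bool dirty;         /* would the on-disk inode be journaled now? */
};

static void model_inc_iversion(struct model_inode *inode)
{
    inode->i_version++;
}

/* Stands in for mark_inode_dirty_sync() -> ... -> ext4_mark_iloc_dirty(),
 * which bumps i_version again as a "belt and suspenders" measure. */
static void model_mark_dirty(struct model_inode *inode)
{
    model_inc_iversion(inode);
    inode->dirty = true;
}

/* Timestamp update: bump i_version only if S_VERSION is set; with
 * lazytime the inode is left clean in this model. */
static void model_update_time(struct model_inode *inode, int flags, bool lazytime)
{
    if (flags & S_VERSION)
        model_inc_iversion(inode);
    if (!lazytime)
        model_mark_dirty(inode);
}
```

With lazytime and S_VERSION set, i_version still advances by one even though the inode is not dirtied; on the non-lazy path it advances by two, which is harmless because consumers only test whether i_version changed.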