linux-ext4 - Re: [PATCH 2/2] improve ext3 fsync batching

Message-ID: <48AAA7F7.5090501@redhat.com>
Date:	Tue, 19 Aug 2008 07:01:11 -0400
From:	Ric Wheeler <rwheeler@...hat.com>
To:	Andreas Dilger <adilger@....com>
CC:	Andrew Morton <akpm@...ux-foundation.org>,
	Josef Bacik <jbacik@...hat.com>, linux-kernel@...r.kernel.org,
	tglx@...utronix.de, linux-fsdevel@...r.kernel.org,
	chris.mason@...cle.com, linux-ext4@...r.kernel.org
Subject: Re: [PATCH 2/2] improve ext3 fsync batching

Andreas Dilger wrote:
> On Aug 18, 2008  21:31 -0700, Andrew Morton wrote:
>   
>> On Wed, 6 Aug 2008 15:15:36 -0400 Josef Bacik <jbacik@...hat.com> wrote:
>>     
>>> Using the following fs_mark command to test the speeds
>>>
>>> ./fs_mark -d /mnt/ext3-test -s 4096 -n 2000 -D 64 -t 2
>>>
>>> I got the following results (with write caching turned off)
>>>
>>> type	threads		with patch	without patch
>>> sata	2		26.4		27.8
>>> sata	4		44.6		44.4
>>> sata	8		70.4		72.8
>>> sata	16		75.2		89.6
>>> sata	32		92.7		96.0
>>> ram	1		2399.1		2398.8
>>> ram	2		257.3		3603.0
>>> ram	4		395.6		4827.9
>>> ram	8		659.0		4721.1
>>> ram	16		1326.4		4373.3
>>> ram	32		1964.2		3816.3
>>>
>>> I used a ramdisk to emulate a "fast" disk since I don't happen to have a
>>> clariion sitting around.  I didn't test single thread in the sata case as it
>>> should be relatively the same between the two.  Thanks,
>>>       
>> This is all a bit mysterious.  That delay doesn't have much at all to
>> do with commit times.  The code is looping around giving other
>> userspace processes an opportunity to get scheduled and to run an fsync
>> and to join the current transaction rather than having to start a new
>> one.
>>
>> (that code was quite effective when I first added it, but in more
>> recent testing, which was some time ago, it doesn't appear to provide
>> any improvement.  This needs to be understood)
>>     
>
> I don't think it is mysterious at all.  With a HZ=100 system 1 jiffy
> is 10ms, which was comparable to the seek time of a disk, so sleeping
> for 1 jiffy to avoid doing 2 transactions was a win.  With a flash
> device (simulated by RAM here) seek time is 1ms so waiting 10ms
> isn't going to be useful if there are only 2 threads and both have
> already joined the transaction.
>   

The code was originally tuned to SATA and ATA disk response times, which
are closer to 12-15 ms. Sleeping for 10 ms (HZ=100 kernel) or 4 ms (HZ=250)
did not overly penalize the low thread count case and worked well for
higher thread counts (ext3 special-cases the single-threaded writer, so
no sleep happens there).

This is still a really, really good thing to do, but we need to sleep 
less when the device characteristics are radically different. For 
example, a fibre channel attached disk array drops that 12-15 ms down to 
1.5 ms (not to mention RAM disks!).
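
To make the mismatch concrete, here is a minimal sketch (not kernel code)
that compares the one-jiffy sleep at HZ=100 and HZ=250 with the response
times mentioned above. The function and dictionary names are illustrative
assumptions; the latency figures are the rough numbers quoted in this
message.

```python
def jiffy_ms(hz):
    """Length of one jiffy (timer tick) in milliseconds for a given HZ."""
    return 1000.0 / hz


# Rough device response times from this message, in milliseconds.
# The keys are hypothetical labels, not real device identifiers.
DEVICE_LATENCY_MS = {
    "sata_low": 12.0,
    "sata_high": 15.0,
    "fc_array": 1.5,
}


def sleep_to_latency_ratio(hz, latency_ms):
    """How many device response times fit into a one-jiffy sleep."""
    return jiffy_ms(hz) / latency_ms


if __name__ == "__main__":
    for hz in (100, 250):
        for name, latency in DEVICE_LATENCY_MS.items():
            ratio = sleep_to_latency_ratio(hz, latency)
            print(f"HZ={hz:<4} {name:<10} sleep/latency = {ratio:.2f}")
```

A ratio below 1 means the sleep is shorter than one device response, which
is the regime the batching was tuned for; a ratio well above 1 means the
sleep itself dominates.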
>   
>> Also, I'd expect that the average commit time is much longer than one
>> jiffy on most disks, and perhaps even on fast disks and maybe even on
>> ramdisk.  So perhaps what's happened here is that you've increased the
>> sleep period and more tasks are joining particular transactions.
>>
>> Or you've shortened the sleep time (which wasn't really doing anything
>> useful) and this causes tasks to spend less time asleep.
>>     
>
> I think both are true.  By making the sleep time dynamic it removes
> the "useless" sleep time, but can also increase the sleep time if
> there are many threads and the commit cost is better amortized over
> more operations.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
>   

It would be great to be able to use this batching technique for faster 
devices, but we currently sleep 3-4 times longer waiting to batch for an 
array than it takes to complete the transaction.
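
One way to picture a device-aware wait, as a rough sketch only: keep a
running average of how long commits take, and let an fsync caller wait for
other threads only until the transaction is about as old as that average.
This is not the patch under discussion; the names and the 1/4 smoothing
weight are assumptions chosen for illustration.

```python
def update_average(avg_ms, sample_ms, weight=0.25):
    """Fold a new commit-time sample into a running average.

    `weight` is a hypothetical smoothing factor; None means no history yet.
    """
    if avg_ms is None:
        return sample_ms
    return avg_ms + weight * (sample_ms - avg_ms)


def batch_wait_ms(avg_commit_ms, txn_age_ms):
    """Time left to wait for other threads to join the transaction.

    Wait only while the transaction is younger than the average commit
    time, so a fast device gets a short wait and a slow disk a longer one.
    """
    return max(0.0, avg_commit_ms - txn_age_ms)


if __name__ == "__main__":
    # Fast array: commits take about 1.5 ms, transaction is 0.5 ms old.
    print(batch_wait_ms(1.5, 0.5))
    # Slow disk: commits take about 12 ms, transaction is 0.5 ms old.
    print(batch_wait_ms(12.0, 0.5))
```

With a fixed one-jiffy sleep both cases would wait the same 4 or 10 ms;
here the wait follows what the device has actually been delivering.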

Thanks!

Ric

