Message-ID: <48AB144F.1070606@redhat.com>
Date: Tue, 19 Aug 2008 14:43:27 -0400
From: Ric Wheeler <rwheeler@...hat.com>
To: Andrew Morton <akpm@...ux-foundation.org>
CC: rwheeler@...hat.com, Andreas Dilger <adilger@....com>,
Josef Bacik <jbacik@...hat.com>, linux-kernel@...r.kernel.org,
tglx@...utronix.de, linux-fsdevel@...r.kernel.org,
chris.mason@...cle.com, linux-ext4@...r.kernel.org
Subject: Re: [PATCH 2/2] improve ext3 fsync batching
Andrew Morton wrote:
> On Tue, 19 Aug 2008 07:01:11 -0400 Ric Wheeler <rwheeler@...hat.com> wrote:
>
>
>> It would be great to be able to use this batching technique for faster
>> devices, but we currently sleep 3-4 times longer waiting to batch for an
>> array than it takes to complete the transaction.
>>
>
> Obviously, tuning that delay down to the minimum necessary is a good
> thing. But doing it based on commit-time seems indirect at best. What
> happens on a slower disk when commit times are in the tens of
> milliseconds? When someone runs a concurrent `dd if=/dev/zero of=foo'
> when commit times go up to seconds?
>
> Perhaps a better scheme would be to tune it based on how many other
> processes are joining that transaction. If it's "zero" then decrease
> the timeout. But one would need to work out how to increase it, which
> perhaps could be done by detecting the case where process A runs an
> fsync when a commit is currently in progress, and that commit was
> caused by process B's fsync.
>
> But before doing all that I would recommend/ask that the following be
> investigated:
>
> - How effective is the present code?
>
> - What happens when it is simply removed?
>
> - Add instrumentation (a counter and a printk) to work out how
> many other tasks are joining this task's transaction.
>
> - If the answer is "zero" or "small", work out why.
>
> - See if we can increase its effectiveness.
>
> Because it could be that the code broke. There might be issues with
> higher-level locks which are preventing the batching. For example, if
> all the files which the test app is syncing are in the same directory,
> perhaps all the tasks are piling up on that directory's i_mutex?
>

One other way to think about this is as a fairly normal queuing problem:

    (1) arrival rate is the rate at which we see new tasks coming
        into the code

    (2) service time is basically the time spent committing the
        transaction to storage

and we have the assumption that some number of tasks can join a
transaction more or less for "free."
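
Treating it that way suggests a simple decision rule: sleep to batch
only when the expected wait for the next arrival is shorter than the
service time. A minimal sketch of that rule (purely illustrative; the
names avg_commit_ns/avg_arrival_ns are made up here, and this is not
the actual patch):

    /*
     * Hypothetical helper: how long to sleep before committing,
     * or 0 to commit immediately.
     *
     * avg_commit_ns  - measured service time (time to commit one
     *                  transaction to storage)
     * avg_arrival_ns - measured mean gap between fsync arrivals
     */
    static unsigned long batch_sleep_ns(unsigned long avg_commit_ns,
                                        unsigned long avg_arrival_ns)
    {
            /*
             * If the next fsync should arrive before a commit could
             * finish anyway, waiting for it is nearly free: it joins
             * this transaction instead of paying for its own commit.
             */
            if (avg_arrival_ns < avg_commit_ns)
                    return avg_arrival_ns;

            /* Sleeping would only add latency; commit right away. */
            return 0;
    }

On a fast array (sub-millisecond service time) that rule naturally
shrinks the wait, and on a slow disk it naturally allows a longer one,
which is the behavior we want.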

What the existing code assumes is that all devices have an equal
service time. That worked well as long as we only looked at devices
with roughly equal latencies (10-20ms), or ran the kernel at a higher
HZ (at 1000HZ the minimum one-jiffie sleep is ten times shorter than
at 100HZ, so you don't see the problem as much).
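
For reference, the existing batching in jbd's journal_stop() looks
roughly like this (a simplified paraphrase from memory, not a
verbatim copy):

    /*
     * Keep sleeping as long as new handles keep joining the
     * transaction.  The granularity of the sleep is one jiffie,
     * i.e. 10ms at 100HZ, no matter how fast the device is.
     */
    if (handle->h_sync) {
            do {
                    old_handle_count = transaction->t_handle_count;
                    schedule_timeout_uninterruptible(1);
            } while (old_handle_count != transaction->t_handle_count);
    }

The hardwired "1" is where both problems live: it encodes the same
minimum wait for every device, and its real-world length depends
entirely on HZ.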

The two key issues that Josef's code tries to address are exactly
those: the assumption that all devices have a similar service time,
and the tie between how long we wait and HZ. It would seem to be
generically useful to be able to sleep for less than 1 jiffie, not
just for file systems, but maybe also in some other contexts?
ric