linux-ext4 - Re: ext4 performance regression 2.6.27-stable versus 2.6.32 and later

Message-ID: <4C5733BD.3040801@uni-konstanz.de>
Date:	Mon, 02 Aug 2010 23:08:13 +0200
From:	Kay Diederichs <kay.diederichs@...-konstanz.de>
To:	Eric Sandeen <sandeen@...hat.com>
CC:	Dave Chinner <david@...morbit.com>,
	linux <linux-kernel@...r.kernel.org>,
	Ext4 Developers List <linux-ext4@...r.kernel.org>,
	Karsten Schaefer <karsten.schaefer@...-konstanz.de>
Subject: Re: ext4 performance regression 2.6.27-stable versus 2.6.32 and later

On 02.08.2010 18:12, Eric Sandeen wrote:
> On 08/02/2010 09:52 AM, Kay Diederichs wrote:
>> Dave,
>>
>> as you suggested, we reverted "ext4: Avoid group preallocation for
>> closed files" and this indeed fixes a big part of the problem: after
>> booting the NFS server we get
>>
>> NFS-Server: turn5 2.6.32.16p i686
>> NFS-Client: turn10 2.6.18-194.8.1.el5 x86_64
>>
>> exported directory on the nfs-server:
>> /dev/md5 /mnt/md5 ext4
>> rw,seclabel,noatime,barrier=1,stripe=512,data=writeback 0 0
>>
>>   48 seconds for preparations
>>   28 seconds to rsync 100 frames with 597M from nfs directory
>>   57 seconds to rsync 100 frames with 595M to nfs directory
>>   70 seconds to untar 24353 kernel files with 323M to nfs directory
>>   57 seconds to rsync 24353 kernel files with 323M from nfs directory
>> 133 seconds to run xds_par in nfs directory
>> 425 seconds to run the script
>
> Interesting, I had found this commit to be a problem for small files
> which are constantly created & deleted; the commit had the effect of
> packing the newly created files in the first free space that could be
> found, rather than walking down the disk leaving potentially fragmented
> freespace behind (see seekwatcher graph attached).  Reverting the patch
> sped things up for this test, but left the filesystem freespace in bad
> shape.
>
> But you seem to see one of the largest effects in here:
>
> 261 seconds to rsync 100 frames with 595M to nfs directory
> vs
>   57 seconds to rsync 100 frames with 595M to nfs directory
>
> with the patch reverted making things go faster.  So you are doing 100
> 6MB writes to the server, correct?

correct.
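For scale, the two timings quoted above for the 100-frame rsync to the NFS directory can be turned into throughput figures. This is a back-of-the-envelope sketch: it treats the "595M" printed by the benchmark script as 595 MB (an assumption), and the rates are end-to-end, including NFS and rsync overhead, not raw disk throughput.

```python
# Back-of-the-envelope throughput for "rsync 100 frames with 595M to nfs directory".
# Assumption: "595M" as printed by the benchmark script means 595 MB.

def throughput(size_mb: float, seconds: float) -> float:
    """Average end-to-end throughput in MB/s (includes NFS and rsync overhead)."""
    return size_mb / seconds


FRAMES = 100
SIZE_MB = 595

with_patch = throughput(SIZE_MB, 261)  # commit "Avoid group preallocation for closed files" applied
reverted = throughput(SIZE_MB, 57)     # same kernel series with that commit reverted

print(f"per-frame size: {SIZE_MB / FRAMES:.2f} MB")
print(f"with patch:     {with_patch:.2f} MB/s")
print(f"patch reverted: {reverted:.2f} MB/s")
print(f"speedup:        {261 / 57:.2f}x")
```

With the commit reverted, the same workload of roughly 6 MB per frame runs about 4.6 times faster.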

>
> Is the filesystem mkfs'd fresh
> before each test, or is it aged?

it is too big to "just create it freshly". It was actually created a
week ago and filled by a single ~10-hour rsync job run on the server,
so that the filesystem would be filled as linearly as possible. Since
then, the benchmarking has created and deleted lots of files.

> If not mkfs'd, is it at least
> completely empty prior to the test, or does data remain on it?  I'm just

it's not empty: df -h reports
Filesystem            Size  Used Avail Use% Mounted on
/dev/md5              3.7T  2.8T  712G  80% /mnt/md5

e2freefrag-1.41.12 reports:
Device: /dev/md5
Blocksize: 4096 bytes
Total blocks: 976761344
Free blocks: 235345984 (24.1%)

Min. free extent: 4 KB
Max. free extent: 99348 KB
Avg. free extent: 1628 KB

HISTOGRAM OF FREE EXTENT SIZES:
Extent Size Range :  Free extents   Free Blocks  Percent
     4K...    8K-  :          1858          1858    0.00%
     8K...   16K-  :          3415          8534    0.00%
    16K...   32K-  :          9952         54324    0.02%
    32K...   64K-  :         23884        288848    0.12%
    64K...  128K-  :         27901        658130    0.28%
   128K...  256K-  :         25761       1211519    0.51%
   256K...  512K-  :         35863       3376274    1.43%
   512K... 1024K-  :         48643       9416851    4.00%
     1M...    2M-  :        150311      60704033   25.79%
     2M...    4M-  :        244895     148283666   63.01%
     4M...    8M-  :          3970       5508499    2.34%
     8M...   16M-  :           187        551835    0.23%
    16M...   32M-  :           302       1765912    0.75%
    32M...   64M-  :           282       2727162    1.16%
    64M...  128M-  :            42        788539    0.34%
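The histogram can be summarized to address the question of fragmented free space. The sketch below copies the rows exactly as e2freefrag printed them, checks that the Free Blocks column adds up to the reported total, and computes how much free space sits in extents large enough to hold one ~6 MB frame contiguously. Reading the size ranges as 1024-based units (KiB/MiB) is an assumption. Since each bucket spans a factor of two, only extents of at least 8 MiB are guaranteed to fit a whole frame.

```python
# Free-extent histogram printed by e2freefrag for /dev/md5 (copied from above).
# Each row: (lower bound of the size range in KiB, free extents, free blocks).
# Assumption: e2freefrag's K/M units are 1024-based.
BLOCK_SIZE_KIB = 4
TOTAL_FREE_BLOCKS = 235345984

HISTOGRAM = [
    (4, 1858, 1858),
    (8, 3415, 8534),
    (16, 9952, 54324),
    (32, 23884, 288848),
    (64, 27901, 658130),
    (128, 25761, 1211519),
    (256, 35863, 3376274),
    (512, 48643, 9416851),
    (1024, 150311, 60704033),
    (2048, 244895, 148283666),
    (4096, 3970, 5508499),
    (8192, 187, 551835),
    (16384, 302, 1765912),
    (32768, 282, 2727162),
    (65536, 42, 788539),
]


def share_of_free_blocks(lo_kib: int, hi_kib: float = float("inf")) -> float:
    """Fraction of all free blocks in buckets whose lower bound is in [lo_kib, hi_kib)."""
    blocks = sum(b for lo, _, b in HISTOGRAM if lo_kib <= lo < hi_kib)
    return blocks / TOTAL_FREE_BLOCKS


# Sanity check: the "Free Blocks" column should add up to the reported total.
assert sum(b for _, _, b in HISTOGRAM) == TOTAL_FREE_BLOCKS

extents = sum(n for _, n, _ in HISTOGRAM)
print(f"free extents:            {extents}")
print(f"mean free extent:        {TOTAL_FREE_BLOCKS / extents * BLOCK_SIZE_KIB:.0f} KiB")
print(f"in 1 MiB..4 MiB buckets: {share_of_free_blocks(1024, 4096):.2%}")
print(f"in extents >= 4 MiB:     {share_of_free_blocks(4096):.2%}")
print(f"in extents >= 8 MiB:     {share_of_free_blocks(8192):.2%}")
```

On these numbers, about 89% of the free blocks are in the 1-4 MiB buckets, under 5% are in extents of 4 MiB or more, and only about 2.5% are in extents certain to hold a whole frame. The computed mean extent size lands close to, but not exactly on, the 1628 KB average that e2freefrag reports.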


> wondering if fragmented freespace is contributing to this behavior as
> well.  If there is fragmented freespace, then with the patch I think the
> allocator is more likely to hunt around for small discontiguous chunks
> of free space, rather than going further out in the disk looking for a
> large area to allocate from.

the last step of the benchmark, "xds_par", reads 600MB and writes 50MB.
It runs 16 threads, which might put some additional pressure on the
freespace hunting. That step is also fast in 2.6.27.48 but slow in 2.6.32+.
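The allocator hypothesis quoted above can be illustrated with a deliberately simplified toy model. The hypothesis is that the patched allocator is more willing to stitch a file together from small nearby holes than to go further out for one large free region. The two functions (alloc_pack_low and alloc_contiguous_first) and the free map below are hypothetical. They do not model ext4's actual multiblock allocator (mballoc), its goal blocks, group preallocation, or stripe alignment.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (start block, length in blocks)


def alloc_pack_low(free: List[Extent], want: int) -> List[Extent]:
    """Toy policy: take free chunks in address order until `want` blocks are
    covered, accepting as many discontiguous pieces as that takes."""
    pieces = []
    for start, length in sorted(free):
        if want == 0:
            break
        take = min(length, want)
        pieces.append((start, take))
        want -= take
    if want:
        raise ValueError("not enough free space")
    return pieces


def alloc_contiguous_first(free: List[Extent], want: int) -> List[Extent]:
    """Toy policy: use the lowest-addressed extent that holds the whole
    request, even if it lies far out; otherwise fall back to packing low."""
    for start, length in sorted(free):
        if length >= want:
            return [(start, want)]
    return alloc_pack_low(free, want)


# Hypothetical free map: 100 small 3-block holes near the start of the disk,
# plus one large free region further out.
free_map = [(i * 10, 3) for i in range(100)] + [(100_000, 4096)]
request = 1536  # 6 MiB expressed in 4 KiB blocks

packed = alloc_pack_low(free_map, request)
contig = alloc_contiguous_first(free_map, request)
print(f"pack-low:         {len(packed)} pieces, first at block {packed[0][0]}")
print(f"contiguous-first: {len(contig)} pieces, first at block {contig[0][0]}")
```

In this toy, packing low places the 6 MiB request in 101 discontiguous pieces starting at the front of the disk. The contiguous-first policy places it in a single piece further out. This is the trade-off between compact free space and file fragmentation discussed above.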

>
> It might be interesting to use seekwatcher on the server to visualize
> the allocation/IO patterns for the test running just this far?
>
> -Eric

will try to install seekwatcher.

thanks,

Kay