Message-Id: <1397647830-24444-1-git-send-email-wenqing.lz@taobao.com>
Date: Wed, 16 Apr 2014 19:30:26 +0800
From: Zheng Liu <gnehzuil.liu@...il.com>
To: linux-ext4@...r.kernel.org
Cc: Zheng Liu <wenqing.lz@...bao.com>, "Theodore Ts'o" <tytso@....edu>,
Andreas Dilger <adilger.kernel@...ger.ca>,
Jan Kara <jack@...e.cz>
Subject: [RFC PATCH v2 0/4] ext4: extents status tree shrinker improvement
Hi all,
Here is the second version of the patch set that improves the extent
status tree shrinker. In this version I do some cleanups, add some
statistics, and implement the two approaches that we discussed at Napa
to improve the shrinker.
One is to improve the current lru algorithm by adding a new list that
tracks all reclaimable objects, so that the shrinker does not burn cpu
time scanning delayed extents. It also makes the lru algorithm more
efficient when applications open a huge number of files. The other
approach is inspired by Jan Kara. It drops the lru algorithm and uses a
round-robin algorithm to shrink all reclaimable extent caches. Every
time the shrinker runs, it scans the list and tries to shrink objects
starting from the position where it stopped last time. Please see the
commit logs of the patches for more details.
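
For illustration only, the round-robin idea described above can be
modelled as follows. This is a toy Python sketch, not the kernel code
from patch 4: the class name, the method name, and the per-inode
"reclaimable count" are hypothetical, and locking is ignored entirely.

```python
class RoundRobinShrinker:
    """Toy model: each entry is [name, reclaimable_count] (hypothetical)."""

    def __init__(self, inodes):
        self.inodes = [list(pair) for pair in inodes]
        self.cursor = 0  # index where the previous scan stopped

    def shrink(self, nr_to_scan):
        """Reclaim up to nr_to_scan objects, resuming at the saved cursor."""
        reclaimed = 0
        visited = 0
        n = len(self.inodes)
        while reclaimed < nr_to_scan and visited < n:
            entry = self.inodes[self.cursor]
            take = min(entry[1], nr_to_scan - reclaimed)
            entry[1] -= take
            reclaimed += take
            if entry[1] == 0:
                # This inode is drained; move on to the next one.
                self.cursor = (self.cursor + 1) % n
                visited += 1
            # Otherwise the budget is used up and the cursor stays here,
            # so the next call resumes on the same inode.
        return reclaimed


shrinker = RoundRobinShrinker([("a", 3), ("b", 0), ("c", 5)])
first = shrinker.shrink(4)    # drains "a", skips "b", takes one from "c"
second = shrinker.shrink(10)  # resumes at "c" instead of rescanning "a"
```

The point of the sketch is that the second call does not start from the
head of the list again, so no inode is scanned repeatedly while others
are never reached.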
From the results, the conclusion is that the round-robin algorithm wins,
especially when the applications open a large number of files.
In this patch set, patch 1 is pretty stable and can be queued in this
cycle. Patch 2 adds some statistics so that we can collect more details
about the status of the shrinker, but I am not sure whether we should
enable it by default. Maybe we need a switch to turn it on/off
dynamically. Patch 3 and patch 4 improve the shrinker as described
above.
There is also room for further improvement of these approaches, such as
using rcu when the shrinker traverses the list, because the shrinker no
longer needs to change the list during this process. Another
improvement is to make the shrinker numa-aware. But before that I
believe this patch set should be reviewed as soon as possible. The key
question now is to decide which approach should be applied.
I use two test cases to compare these improvements. Test case A
simulates applications that generate a very fragmented extents status
tree, and test case B simulates applications that open a large number
of files with a few extent caches. Every test case is run 3 times.
To get a fragmented extents status tree, I hack the code and let
ext4_es_can_be_merged() always return 0 so that extents in the status
tree are never merged. Meanwhile, to increase the memory pressure,
vm.dirty_background_ratio is set to 60 and vm.dirty_ratio is set to 80
in order to keep as many dirty pages in memory as possible.
Environment
===========
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 2
Core(s) per socket: 4
CPU socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 44
Stepping: 2
CPU MHz: 2400.000
BogoMIPS: 4799.89
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 12288K
NUMA node0 CPU(s): 0-3,8-11
NUMA node1 CPU(s): 4-7,12-15
$ cat /proc/meminfo
MemTotal: 24677988 kB
$ df -ah
/dev/sdb1 183G 15G 159G 9% /mnt/sdb1 (HDD)
The Test Case A
===============
Script
------
[global]
ioengine=psync
bs=4k
directory=/mnt/sdb1
group_reporting
fallocate=0
direct=0
filesize=100000g
size=600000g
runtime=300
create_on_open=1
create_serialize=0
create_fsync=0
norandommap
[io]
rw=write
numjobs=100
nrfiles=5
Max Scan Time
-------------
x vanilla
+ lru
* rr
    N           Min           Max        Median           Avg        Stddev
x   3         22230         24607         23532     23456.333     1190.3051
+   3           203           364           301     289.33333      81.13158
Difference at 95.0% confidence
        -23167 +/- 1912.16
        -98.7665% +/- 8.15199%
        (Student's t, pooled s = 843.626)
*   3           165           248           172           195     46.032597
Difference at 95.0% confidence
        -23261.3 +/- 1909.16
        -99.1687% +/- 8.1392%
        (Student's t, pooled s = 842.302)
Avg. Scan Time
--------------
x vanilla
+ lru
* rr
    N           Min           Max        Median           Avg        Stddev
x 220           204         15997          3976     5268.6773     4121.2038
+ 220           105           169           126        132.65     14.904881
Difference at 95.0% confidence
        -5136.03 +/- 544.593
        -97.4823% +/- 10.3364%
        (Student's t, pooled s = 2914.15)
* 224            55           144            82     97.834821     27.811093
Difference at 95.0% confidence
        -5170.84 +/- 539.706
        -98.1431% +/- 10.2437%
        (Student's t, pooled s = 2900.98)
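
For readers who want to check the "Difference at 95.0% confidence"
lines, they can be reproduced from the N, Avg and Stddev columns. The
sketch below is a minimal Python illustration that assumes the usual
pooled-variance Student's t interval. The function name is
hypothetical, and the critical value 2.776 (two-sided 95%, 4 degrees of
freedom) is hard-coded as an assumption instead of being looked up. It
uses the vanilla and lru rows of the Max Scan Time table for test
case A.

```python
import math


def pooled_difference(n1, avg1, sd1, n2, avg2, sd2, t_crit):
    """Return (difference, half-width, percent difference, pooled s)."""
    df = n1 + n2 - 2
    # Pooled standard deviation of the two samples.
    pooled = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    diff = avg2 - avg1
    # Half-width of the confidence interval for the difference of means.
    half = t_crit * pooled * math.sqrt(1.0 / n1 + 1.0 / n2)
    # Percent difference relative to the baseline (first) sample.
    return diff, half, 100.0 * diff / avg1, pooled


# vanilla (x) versus lru (+), Max Scan Time, test case A
diff, half, pct, pooled = pooled_difference(
    3, 23456.333, 1190.3051,
    3, 289.33333, 81.13158,
    t_crit=2.776,
)
print("%.6g +/- %.6g" % (diff, half))
print("%.6g%% (pooled s = %.6g)" % (pct, pooled))
```

The same function applies to the other tables, but the critical value
has to match their degrees of freedom (N1 + N2 - 2).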
The Test Case B
===============
Script
------
[global]
ioengine=psync
bs=4k
directory=/mnt/sdb1
group_reporting
fallocate=0
direct=0
runtime=300
create_on_open=1
create_serialize=0
create_fsync=0
norandommap
[io]
rw=randwrite
numjobs=25
nrfiles=40000
[streamer]
rw=write
numjobs=1
filesize=1000g
size=1000g
nrfiles=1
Max Scan Time
-------------
x vanilla
+ lru
* rr
    N           Min           Max        Median           Avg        Stddev
x   3        390531        481463        393469        421821     51672.373
+   3        106433        170801        130652        135962     32510.874
Difference at 95.0% confidence
        -285859 +/- 97844.9
        -67.7678% +/- 23.1958%
        (Student's t, pooled s = 43168.2)
*   3         72569        156338        113704     114203.67     41886.735
Difference at 95.0% confidence
        -307617 +/- 106609
        -72.926% +/- 25.2734%
        (Student's t, pooled s = 47034.7)
Avg. Scan Time
--------------
x vanilla
+ lru
* rr
    N           Min           Max        Median           Avg        Stddev
x 221           164        155601         19553     24630.968     22736.242
+ 207            44         49210         13633     16167.768     15087.729
Difference at 95.0% confidence
        -8463.2 +/- 3681.22
        -34.36% +/- 14.9455%
        (Student's t, pooled s = 19417.6)
*  78            41         18043           166     808.85897     2605.2387
Difference at 95.0% confidence
        -23822.1 +/- 5062.86
        -96.7161% +/- 20.5548%
        (Student's t, pooled s = 19613.2)
As always, feedback, comments and ideas are welcome.
Regards,
- Zheng
Zheng Liu (4):
ext4: improve extents status tree trace point
ext4: track extent status tree shrinker delay statictics
ext4: improve extents status tree shrinker lru algorithm
ext4: use a round-robin algorithm to shrink extent cache
fs/ext4/ext4.h | 11 +-
fs/ext4/extents.c | 4 +-
fs/ext4/extents_status.c | 310 +++++++++++++++++++++++++++++--------------
fs/ext4/extents_status.h | 16 ++-
fs/ext4/inode.c | 4 +-
fs/ext4/ioctl.c | 4 +-
fs/ext4/super.c | 22 ++-
include/trace/events/ext4.h | 59 ++++++--
8 files changed, 296 insertions(+), 134 deletions(-)
--
1.7.9.7