Message-ID: <20130504013643.GC19978@dastard>
Date: Sat, 4 May 2013 11:36:43 +1000
From: Dave Chinner <david@...morbit.com>
To: linux-ext4@...r.kernel.org
Subject: [3.9] Parallel unlinks serialise completely
Hi folks,
Just an FYI. I was running a few fsmark workloads to compare
xfs/btrfs/ext4 performance (as I do every so often), and found that
ext4 is serialising unlinks on the orphan list mutex completely. The
script I've been running:
$ cat fsmark-50-test-ext4.sh
#!/bin/bash
sudo umount /mnt/scratch > /dev/null 2>&1
sudo mkfs.ext4 /dev/vdc
sudo mount /dev/vdc /mnt/scratch
sudo chmod 777 /mnt/scratch
cd /home/dave/src/fs_mark-3.3/
time ./fs_mark -D 10000 -S0 -n 100000 -s 0 -L 63 \
        -d /mnt/scratch/0 -d /mnt/scratch/1 \
        -d /mnt/scratch/2 -d /mnt/scratch/3 \
        -d /mnt/scratch/4 -d /mnt/scratch/5 \
        -d /mnt/scratch/6 -d /mnt/scratch/7 \
        | tee >(stats --trim-outliers | tail -1 1>&2)
sync
sleep 30
sync
echo walking files
sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
time (
        for d in /mnt/scratch/[0-9]* ; do
                for i in $d/*; do
                        (
                                echo $i
                                find $i -ctime 1 > /dev/null
                        ) > /dev/null 2>&1
                done &
        done
        wait
)
echo removing files
for f in /mnt/scratch/* ; do time rm -rf $f & done
wait
$
This is on a 100TB sparse VM image on a RAID0 of 4x SSDs, but that's
pretty much irrelevant to the problem being seen. That is, I'm seeing
just a little over 1 CPU being expended during the unlink phase, and
only one of the 8 rm processes running at any given time.
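If you want to hammer on just this part without fs_mark, something
like the untested sketch below should show the same behaviour - each
thread creates a pile of zero length files in its own directory and
then times the unlinks. Thread and file counts and the mount point
are placeholders, so scale to taste.

/*
 * Untested sketch: N threads, each creating and then unlinking a
 * pile of zero length files in its own directory.  Counts and the
 * mount point are placeholders.  Error handling omitted for brevity.
 */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

#define NDIRS		8
#define FILES_PER_DIR	100000
#define BASE		"/mnt/scratch"

static double now(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec + ts.tv_nsec / 1e9;
}

static void *worker(void *arg)
{
	long id = (long)arg;
	char dir[64], path[96];
	double start;
	int i;

	snprintf(dir, sizeof(dir), BASE "/%ld", id);
	mkdir(dir, 0777);

	/* create phase: zero length files, same as fs_mark -s 0 */
	for (i = 0; i < FILES_PER_DIR; i++) {
		snprintf(path, sizeof(path), "%s/f%d", dir, i);
		close(open(path, O_CREAT | O_WRONLY, 0644));
	}

	/* unlink phase: this is where the serialisation shows up */
	start = now();
	for (i = 0; i < FILES_PER_DIR; i++) {
		snprintf(path, sizeof(path), "%s/f%d", dir, i);
		unlink(path);
	}
	printf("dir %ld: unlinks took %.2fs\n", id, now() - start);
	return NULL;
}

int main(void)
{
	pthread_t tid[NDIRS];
	long i;

	for (i = 0; i < NDIRS; i++)
		pthread_create(&tid[i], NULL, worker, (void *)i);
	for (i = 0; i < NDIRS; i++)
		pthread_join(tid[i], NULL);
	return 0;
}

Build with -lpthread and point it at a freshly made ext4 filesystem.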
`perf top -U -G` shows these as the two leading CPU consumers:
-  11.99%  [kernel]  [k] __mutex_unlock_slowpath
   - __mutex_unlock_slowpath
      - 99.79% mutex_unlock
         + 51.06% ext4_orphan_add
         + 46.86% ext4_orphan_del
           1.04% do_unlinkat
              sys_unlinkat
              system_call_fastpath
              unlinkat
           0.95% vfs_unlink
              do_unlinkat
              sys_unlinkat
              system_call_fastpath
              unlinkat
-   7.14%  [kernel]  [k] __mutex_lock_slowpath
   - __mutex_lock_slowpath
      - 99.83% mutex_lock
         + 81.84% ext4_orphan_add
           11.21% ext4_orphan_del
              ext4_evict_inode
              evict
              iput
              do_unlinkat
              sys_unlinkat
              system_call_fastpath
              unlinkat
         + 3.47% vfs_unlink
         + 3.24% do_unlinkat
and the workload is running at roughly 40,000 context switches/s and
roughly 7,000 IOPS.
Which looks rather like all the unlinks are serialising on the orphan
list mutex.
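To be explicit about what "serialising" means here: if every unlink
has to take the same per-filesystem mutex twice - once to put the
inode on the orphan list in ext4_orphan_add(), and once to take it
off again in ext4_orphan_del() at evict time, which is what the
profile implies - then the aggregate unlink rate is bounded by that
one lock no matter how many rm processes are running. A toy userspace
model of the pattern (nothing here is the actual ext4 code, the list
and lock are just stand-ins):

/*
 * Toy model of the serialisation: every "unlink" takes one global
 * mutex twice, to add an entry to and then remove it from an
 * orphan-style list.  Not the ext4 code, names are stand-ins.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

struct orphan {
	struct orphan *next;
};

static pthread_mutex_t orphan_lock = PTHREAD_MUTEX_INITIALIZER;
static struct orphan *orphan_list;

#define OPS_PER_THREAD	200000

static void *unlink_worker(void *arg)
{
	struct orphan self, **pp;
	int i;

	(void)arg;
	for (i = 0; i < OPS_PER_THREAD; i++) {
		/* "ext4_orphan_add": put this inode on the global list */
		pthread_mutex_lock(&orphan_lock);
		self.next = orphan_list;
		orphan_list = &self;
		pthread_mutex_unlock(&orphan_lock);

		/* "ext4_orphan_del": find it and take it off again */
		pthread_mutex_lock(&orphan_lock);
		for (pp = &orphan_list; *pp != &self; pp = &(*pp)->next)
			;
		*pp = self.next;
		pthread_mutex_unlock(&orphan_lock);
	}
	return NULL;
}

int main(int argc, char **argv)
{
	int nthreads = argc > 1 ? atoi(argv[1]) : 8;
	pthread_t *tid = calloc(nthreads, sizeof(*tid));
	struct timespec t0, t1;
	double secs;
	int i;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < nthreads; i++)
		pthread_create(&tid[i], NULL, unlink_worker, NULL);
	for (i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	printf("%d threads: %.0f ops/sec\n", nthreads,
	       (double)nthreads * OPS_PER_THREAD / secs);
	return 0;
}

Run it with 1 and then 8 as the argument; I'd expect the ops/sec
number to stay flat or go backwards as threads are added, which is
the shape of the rm numbers below.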
The overall results of the test are roughly:
             create      find    unlink
ext4         24m21s     8m17s    37m51s
xfs           9m52s     6m53s    13m59s
The other notable thing about the unlink completion is this:
             first rm   last rm
ext4           30m26s    37m51s
xfs            13m52s    13m59s
There is significant unfairness in the behaviour of the parallel
unlinks. The first 3 processes completed by 30m39s, but the last 5
processes all completed between 37m40s and 37m51s, 7 minutes later...
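That kind of spread is what you'd expect from a heavily contended
sleeping mutex - there's no fairness guarantee, so whichever waiters
keep winning the lock finish early and the rest languish. Assuming
the analogy to the kernel mutex holds, the same effect is easy to see
with a plain pthread mutex:

/*
 * Lock fairness demo, not ext4: 8 threads each make the same number
 * of trips through one contended pthread mutex and report when they
 * finish.  Neither pthread mutexes nor kernel mutexes guarantee
 * fairness, so the finish times can spread out a long way.
 */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS	8
#define ITERS		500000

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static volatile unsigned long shared;
static struct timespec start;

static double elapsed(void)
{
	struct timespec now;

	clock_gettime(CLOCK_MONOTONIC, &now);
	return (now.tv_sec - start.tv_sec) +
	       (now.tv_nsec - start.tv_nsec) / 1e9;
}

static void *worker(void *arg)
{
	long id = (long)arg;
	int i, j;

	for (i = 0; i < ITERS; i++) {
		pthread_mutex_lock(&lock);
		for (j = 0; j < 50; j++)	/* short critical section */
			shared++;
		pthread_mutex_unlock(&lock);
	}
	printf("thread %ld finished at %.2fs\n", id, elapsed());
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	long i;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, worker, (void *)i);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);
	return 0;
}

How large the spread is will depend on the machine and the glibc and
kernel versions, but the finish times are rarely anything like
uniform.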
FWIW, there is also significant serialisation of the create
workload, but I didn't look at that at all.
Cheers,
Dave.
--
Dave Chinner
david@...morbit.com