Message-ID: <Pine.LNX.4.64.0709291306320.4569@p34.internal.lan>
Date: Sat, 29 Sep 2007 13:08:45 -0400 (EDT)
From: Justin Piszcz <jpiszcz@...idpixels.com>
To: linux-kernel@...r.kernel.org
cc: linux-raid@...r.kernel.org, xfs@....sgi.com
Subject: Bonnie++ with 1024k stripe SW/RAID5 causes processes to go into D-state
Kernel: 2.6.23-rc8 (older kernels do this as well)
When running the following command:
/usr/bin/time /usr/sbin/bonnie++ -d /x/test -s 16384 -m p34 -n 16:100000:16:64
It hangs unless I increase various md/RAID parameters, such as
stripe_cache_size.
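For reference, the stripe cache can be grown at runtime; a minimal sketch
(16384 is the value my script below uses; the memory math assumes 4 KiB pages):
# each stripe_cache_size entry pins one page per member disk, so on this
# 10-disk array 16384 entries x 4 KiB x 10 is roughly 640 MiB of kernel memory
echo 16384 > /sys/block/md3/md/stripe_cache_size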
# ps auxww | grep D
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 276 0.0 0.0 0 0 ? D 12:14 0:00 [pdflush]
root 277 0.0 0.0 0 0 ? D 12:14 0:00 [pdflush]
root 1639 0.0 0.0 0 0 ? D< 12:14 0:00 [xfsbufd]
root 1767 0.0 0.0 8100 420 ? Ds 12:14 0:00
root 2895 0.0 0.0 5916 632 ? Ds 12:15 0:00 /sbin/syslogd -r
See the bottom for more details.
Is this normal? Does md only work untuned up to a certain stripe size? I use a
RAID 5 with a 1024k stripe that works fine once the optimizations below are
applied, but if I just boot the system and run bonnie++ without them, it hangs
in D-state. As soon as I run the optimizations, it comes back out of D-state,
which seems pretty weird. (Again: without the script below, bonnie++ hangs in
D-state until it is run.)
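A quick way to watch the hang develop (just a convenience one-liner, nothing
md-specific):
# poll every 5 seconds for tasks stuck in uninterruptible sleep (D state)
while sleep 5; do date; ps -eo pid,stat,comm | awk '$2 ~ /^D/'; done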
Optimization script:
#!/bin/bash
# source profile
. /etc/profile
# Tell user what's going on.
echo "Optimizing RAID Arrays..."
# Define DISKS.
cd /sys/block
DISKS=$(/bin/ls -1d sd[a-z])
# This step must come first.
# See: http://www.3ware.com/KB/article.aspx?id=11050
echo "Setting max_sectors_kb to 128 KiB"
for i in $DISKS
do
echo "Setting /dev/$i to 128 KiB..."
echo 128 > /sys/block/"$i"/queue/max_sectors_kb
done
# This step comes next.
echo "Setting nr_requests to 512 KiB"
for i in $DISKS
do
echo "Setting /dev/$i to 512K KiB"
echo 512 > /sys/block/"$i"/queue/nr_requests
done
# Set read-ahead (blockdev --setra takes 512-byte sectors).
echo "Setting read-ahead to 32 MiB (65536 sectors) for /dev/md3"
blockdev --setra 65536 /dev/md3
# Set stripe_cache_size for RAID5.
echo "Setting stripe_cache_size to 16 MiB for /dev/md3"
echo 16384 > /sys/block/md3/md/stripe_cache_size
# Set minimum and maximum raid resync speed to 30 MB/s (values are in KB/s).
echo "Setting minimum and maximum resync speed to 30 MB/s..."
for md in md0 md1 md2 md3
do
echo 30000 > /sys/block/$md/md/sync_speed_min
echo 30000 > /sys/block/$md/md/sync_speed_max
done
# Disable NCQ on all disks.
echo "Disabling NCQ on all disks..."
for i in $DISKS
do
echo "Disabling NCQ on $i"
echo 1 > /sys/block/"$i"/device/queue_depth
done
--
Once this runs, everything works fine again.
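To double-check that the values actually took effect, something like this
works (purely illustrative read-backs of the same sysfs knobs):
grep . /sys/block/sd[a-z]/queue/max_sectors_kb /sys/block/sd[a-z]/queue/nr_requests
cat /sys/block/md3/md/stripe_cache_size
blockdev --getra /dev/md3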
--
# mdadm -D /dev/md3
/dev/md3:
Version : 00.90.03
Creation Time : Wed Aug 22 10:38:53 2007
Raid Level : raid5
Array Size : 1318680576 (1257.59 GiB 1350.33 GB)
Used Dev Size : 146520064 (139.73 GiB 150.04 GB)
Raid Devices : 10
Total Devices : 10
Preferred Minor : 3
Persistence : Superblock is persistent
Update Time : Sat Sep 29 13:05:15 2007
State : active, resyncing
Active Devices : 10
Working Devices : 10
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 1024K
Rebuild Status : 8% complete
UUID : e37a12d1:1b0b989a:083fb634:68e9eb49 (local to host p34.internal.lan)
Events : 0.4211
Number Major Minor RaidDevice State
0 8 33 0 active sync /dev/sdc1
1 8 49 1 active sync /dev/sdd1
2 8 65 2 active sync /dev/sde1
3 8 81 3 active sync /dev/sdf1
4 8 97 4 active sync /dev/sdg1
5 8 113 5 active sync /dev/sdh1
6 8 129 6 active sync /dev/sdi1
7 8 145 7 active sync /dev/sdj1
8 8 161 8 active sync /dev/sdk1
9 8 177 9 active sync /dev/sdl1
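Note the array was still resyncing when the above was captured; the resync
progress is also visible the usual way:
cat /proc/mdstat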
--
NOTE: This bug is reproducible every time:
Example:
$ /usr/bin/time /usr/sbin/bonnie++ -d /x/test -s 16384 -m p34 -n 16:100000:16:64
Writing with putc()...
It writes for 4-5 minutes and then... SILENCE + D-STATE. I was too late this
time :(
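When it is wedged like this, the blocked-task backtraces can still be grabbed
for the list (standard magic sysrq; assumes CONFIG_MAGIC_SYSRQ and an already
open root shell):
# dump stack traces of all uninterruptible (D-state) tasks to the kernel log
echo w > /proc/sysrq-trigger
dmesg | tail -50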
$ ps auxww | grep D
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 276 1.2 0.0 0 0 ? D 12:50 0:03 [pdflush]
root 2901 0.0 0.0 5916 632 ? Ds 12:50 0:00 /sbin/syslogd -r
user 4571 48.0 0.0 11644 1084 pts/1 D+ 12:51 1:55 /usr/sbin/bonnie++ -d /x/test -s 16384 -m p34 -n 16:100000:16:64
root 4612 1.0 0.0 0 0 ? D 12:52 0:01 [pdflush]
root 4624 5.0 0.0 40964 7436 ? D 12:55 0:00 /usr/bin/perl -w /app/rrd-cputemp/bin/rrd_cputemp.pl
root 4684 0.0 0.0 31968 1416 ? D 12:55 0:00 /usr/bin/rateup /var/www/monitor/mrtg/ eth0 1191084902 -Z u 265975 843609 125000000 c #00cc00 #0000ff #006600 #ff00ff k 1000 i /var/www/monitor/mrtg/eth0-day.png -125000000 -125000000 400 100 1 1 1 300 0 4 1 %Y-%m-%d %H:%M 0 i /var/www/monitor/mrtg/eth0-week.png -125000000 -125000000 400 100 1 1 1 1800 0 4 1 %Y-%m-%d %H:%M 0 i /var/www/monitor/mrtg/eth0-month.png -125000000 -125000000 400 100 1 1 1 7200 0 4 1 %Y-%m-%d %H:%M 0
root 4686 0.0 0.0 4420 932 ? D 12:55 0:00 /usr/sbin/hddtemp -n /dev/sdf
user 4688 0.0 0.0 4232 800 pts/5 S+ 12:55 0:00 grep --color D
$
If you are not already logged in as root, it is sometimes too late to su to
root and run the optimizations:
$ su -
Password:
<hang forever>
Justin.