[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140305141343.GA26225@xanadu.blop.info>
Date: Wed, 5 Mar 2014 15:13:43 +0100
From: Lucas Nussbaum <lucas.nussbaum@...ia.fr>
To: linux-ext4@...r.kernel.org
Cc: Emmanuel Jeanvoine <emmanuel.jeanvoine@...ia.fr>
Subject: Extremely slow remounts with concurrent I/O
TL;DR: we experience long temporary hangs when doing multiple mount -o
remount at the same time as other I/O on an ext4 filesystem.
Hi,
When starting hundreds of LXC containers simultaneously on a system, the
boot of some containers was hanging. We tracked this down to an
initscript's use of mount -o remount, which was hanging in D state.
We reproduced the problem outside of LXC, with the script available at
[0]. That script initiates 1000 mount -o remount, and performs some
writes using a big cp to the same filesystem during the remounts.
All tests are performed on a Debian jessie system, with Linux 3.12.
Note that when running the script several times in a row, the same data is
copied over and over again, probably resulting in almost no I/O thanks to
caches. That's on purpose (see below).
When things work fine, the script above takes about 100s on our test
system (with quite a lot of variation). When things go wrong, it takes up
to 600s.
When things go wrong, we see the initial I/O caused by cp, which quickly
finishes.
After that, many (often close to 1000) mount processes are hung in D state
for a *long* time. During that time, we see still some I/O on the block
device (using dstat). At [1], you will find a btrace dump of what it
happening.
At some point, some mount processes start finishing, and then slowly all
mount process terminate.
Just to give an idea of timings:
from t0 to t0+25s: cp running
from t0+25s to t0+267s: all mount processes are hung
from t0+269s to t0+408s: mount processes terminate
After doing:
echo 100 > /proc/sys/kernel/hung_task_warnings
echo 30 > /proc/sys/kernel/hung_task_timeout_secs
We get:
[ 5879.392070] INFO: task mount:8953 blocked for more than 30 seconds.
[ 5879.429616] Tainted: G I 3.12-1-amd64 #1
[ 5879.461434] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5879.508354] mount D ffff88007f104380 0 8953 7954 0x00000000
[ 5879.550760] ffff88007f104040 0000000000000086 00000000000142c0 ffff880409281fd8
[ 5879.595415] 00000000000142c0 ffff880409281fd8 ffff880409281df8 ffff880409281d90
[ 5879.640054] ffff880409281df0 ffff88007f104040 0000000000000000 0000000000000020
[ 5879.684652] Call Trace:
[ 5879.699328] [<ffffffff8148d969>] ? schedule_timeout+0x1f9/0x2c0
[ 5879.735318] [<ffffffff81087a55>] ? check_preempt_curr+0x75/0x90
[ 5879.771305] [<ffffffff81087a7f>] ? ttwu_do_wakeup+0xf/0xd0
[ 5879.804683] [<ffffffff810898ba>] ? try_to_wake_up+0xca/0x260
[ 5879.839112] [<ffffffff8148fc50>] ? wait_for_completion+0xa0/0x110
[ 5879.876140] [<ffffffff81089a90>] ? wake_up_state+0x10/0x10
[ 5879.909517] [<ffffffff81198ebc>] ? sync_inodes_sb+0x9c/0x1a0
[ 5879.943942] [<ffffffff8119ee33>] ? sync_filesystem+0x53/0xa0
[ 5879.978385] [<ffffffff81176bba>] ? do_remount_sb+0x4a/0x1a0
[ 5880.012319] [<ffffffff81191d98>] ? do_mount+0x688/0xa70
[ 5880.044158] [<ffffffff8112c268>] ? memdup_user+0x38/0x70
[ 5880.076499] [<ffffffff811921fc>] ? SyS_mount+0x7c/0xc0
[ 5880.107791] [<ffffffff81498fb9>] ? system_call_fastpath+0x16/0x1b
[ 5880.697315] INFO: task mount:8957 blocked for more than 30 seconds.
[ 5880.734859] Tainted: G I 3.12-1-amd64 #1
[ 5880.766688] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5880.813608] mount D ffff8804062783c0 0 8957 7956 0x00000000
[ 5880.856012] ffff880406278080 0000000000000086 00000000000142c0 ffff88040263dfd8
[ 5880.900676] 00000000000142c0 ffff88040263dfd8 ffff880406278080 ffff88040263de68
[ 5880.945281] ffff88042d30f068 ffff88042d30f070 ffffffff00000000 ffff88042d30f078
[ 5880.989897] Call Trace:
[ 5881.004577] [<ffffffff81490cb5>] ? rwsem_down_write_failed+0x105/0x1d0
[ 5881.044214] [<ffffffff81271a73>] ? call_rwsem_down_write_failed+0x13/0x20
[ 5881.085403] [<ffffffff8148ed44>] ? down_write+0x24/0x26
[ 5881.117215] [<ffffffff811919ed>] ? do_mount+0x2dd/0xa70
[ 5881.149042] [<ffffffff8112c268>] ? memdup_user+0x38/0x70
[ 5881.181387] [<ffffffff811921fc>] ? SyS_mount+0x7c/0xc0
[ 5881.212684] [<ffffffff81498fb9>] ? system_call_fastpath+0x16/0x1b
Some other things we tried:
1) we tried to 'sync' after removing the files, and dropping the caches
(as shown in the commented lines in [0]). That makes the problem disappear
(or at least makes it less frequent). The overall script execution is
actually faster with the post-rm sync and dropping caches than without
them!
2) We tried switching to the noop scheduler (instead of cfq). The problem
could still be reproduced. A btrace dump with noop is available at [2].
3) We tried with ext3 instead of ext4. The problem could never be
reproduced.
4) We tried on different machines, and we could reproduce the problem.
However, on a machine with SSD drives, we were not able to reproduce the
problem.
Any ideas?
[0] http://blop.info/p/20140305-ext4-mount-remount.rb
[1] http://blop.info/p/20140305-ext4-mount-btrace.log
[2] http://blop.info/p/20140305-ext4-mount-btrace-noop.log.gz
--
Lucas Nussbaum and Emmanuel Jeanvoine
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists