Message-ID: <20090402140051.GA3030@Krystal>
Date: Thu, 2 Apr 2009 10:00:51 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To: Jesper Krogh <jesper@...gh.cc>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Theodore Tso <tytso@....edu>, Ingo Molnar <mingo@...e.hu>,
David Rees <drees76@...il.com>,
Alan Cox <alan@...rguk.ukuu.org.uk>
Subject: Re: Linux 2.6.29
>
> Linus Torvalds wrote:
> > This obviously starts the merge window for 2.6.30, although as usual, I'll
> > probably wait a day or two before I start actively merging. I do that in
> > order to hopefully result in people testing the final plain 2.6.29 a bit
> > more before all the crazy changes start up again.
>
> I know this has been discussed before:
>
> [129401.996244] INFO: task updatedb.mlocat:31092 blocked for more than
> 480 seconds.
> [129402.084667] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [129402.179331] updatedb.mloc D 0000000000000000 0 31092 31091
> [129402.179335] ffff8805ffa1d900 0000000000000082 ffff8803ff5688a8
> 0000000000001000
> [129402.179338] ffffffff806cc000 ffffffff806cc000 ffffffff806d3e80
> ffffffff806d3e80
> [129402.179341] ffffffff806cfe40 ffffffff806d3e80 ffff8801fb9f87e0
> 000000000000ffff
> [129402.179343] Call Trace:
> [129402.179353] [<ffffffff802d3ff0>] sync_buffer+0x0/0x50
> [129402.179358] [<ffffffff80493a50>] io_schedule+0x20/0x30
> [129402.179360] [<ffffffff802d402b>] sync_buffer+0x3b/0x50
> [129402.179362] [<ffffffff80493d2f>] __wait_on_bit+0x4f/0x80
> [129402.179364] [<ffffffff802d3ff0>] sync_buffer+0x0/0x50
> [129402.179366] [<ffffffff80493dda>] out_of_line_wait_on_bit+0x7a/0xa0
> [129402.179369] [<ffffffff80252730>] wake_bit_function+0x0/0x30
> [129402.179396] [<ffffffffa0264346>] ext3_find_entry+0xf6/0x610 [ext3]
> [129402.179399] [<ffffffff802d3453>] __find_get_block+0x83/0x170
> [129402.179403] [<ffffffff802c4a90>] ifind_fast+0x50/0xa0
> [129402.179405] [<ffffffff802c5874>] iget_locked+0x44/0x180
> [129402.179412] [<ffffffffa0266435>] ext3_lookup+0x55/0x100 [ext3]
> [129402.179415] [<ffffffff802c32a7>] d_alloc+0x127/0x1c0
> [129402.179417] [<ffffffff802ba2a7>] do_lookup+0x1b7/0x250
> [129402.179419] [<ffffffff802bc51d>] __link_path_walk+0x76d/0xd60
> [129402.179421] [<ffffffff802ba17f>] do_lookup+0x8f/0x250
> [129402.179424] [<ffffffff802c8b37>] mntput_no_expire+0x27/0x150
> [129402.179426] [<ffffffff802bcb64>] path_walk+0x54/0xb0
> [129402.179428] [<ffffffff802bfd10>] filldir+0x0/0xf0
> [129402.179430] [<ffffffff802bcc8a>] do_path_lookup+0x7a/0x150
> [129402.179432] [<ffffffff802bbb55>] getname+0xe5/0x1f0
> [129402.179434] [<ffffffff802bd8d4>] user_path_at+0x44/0x80
> [129402.179437] [<ffffffff802b53b5>] cp_new_stat+0xe5/0x100
> [129402.179440] [<ffffffff802b56d0>] vfs_lstat_fd+0x20/0x60
> [129402.179442] [<ffffffff802b5737>] sys_newlstat+0x27/0x50
> [129402.179445] [<ffffffff8020c35b>] system_call_fastpath+0x16/0x1b
> Consensus seems to be something with large memory machines, lots of
> dirty pages and a long writeout time due to ext3.
>
> At the moment this the largest "usabillity" issue in the serversetup I'm
> working with. Can there be done something to "autotune" it .. or perhaps
> even fix it? .. or is it just to shift to xfs or wait for ext4?
>
Hi Jesper,
What you are seeing looks awfully like the bug I have spent some time
trying to figure out in this bugzilla thread:
[Bug 12309] Large I/O operations result in slow performance and high
iowait times
http://bugzilla.kernel.org/show_bug.cgi?id=12309
I created a fio test case out of an LTTng trace to reproduce the problem
and wrote a patch that tries to account for the pages used by the I/O
elevator in the vm page count used to calculate memory pressure.
Basically, the behavior I was seeing is a constant increase in memory
usage when doing a dd-like write to disk until the memory fills up,
which is indeed wrong. The patch I posted in that thread seems to cause
other problems though, so we should probably teach kjournald to do
better.
Here is the patch attempt :
http://bugzilla.kernel.org/attachment.cgi?id=20172
Here is the fio test case :
http://bugzilla.kernel.org/attachment.cgi?id=19894
My findings were as follows (I hope other people with deeper knowledge
of block layer/vm interaction can correct me):
- Upon heavy and long disk writes, the pages used to back the buffers
continuously increase as if there was no memory pressure at all.
Therefore, I suspect they are held in a nowhere land that's unaccounted
for at the vm layer (not part of memory pressure). That would seem to
be the I/O elevator.
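One way to watch this while a long write is running is to sample the
counters the vm does account for. The following is only a minimal
sketch, not part of the bugzilla test case: the helper names
(parse_meminfo, sample_writeback, watch) are made up for illustration,
and it assumes a Linux /proc/meminfo exposing the MemFree, Dirty and
Writeback fields. It does not prove or disprove the elevator
hypothesis; it only shows how the accounted numbers evolve.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are
    plain counts. The unit suffix is ignored here.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def sample_writeback(path="/proc/meminfo"):
    """Return the current MemFree, Dirty and Writeback values."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    # .get() yields None if a field is missing on this kernel.
    return {key: info.get(key) for key in ("MemFree", "Dirty", "Writeback")}


def watch(interval=1.0, count=10):
    """Print `count` samples, one every `interval` seconds."""
    for _ in range(count):
        print(sample_writeback())
        time.sleep(interval)
```

Calling watch() in one terminal while the dd-like write runs in another
gives a rough picture of whether free memory keeps shrinking while the
dirty and writeback counters stay comparatively small.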
Can you try the dd and fio test cases pointed out in the bugzilla
entry? You may also want to see if my patch helps to partially solve
your problem. Another hint is to use cgroups to restrict your heavy
I/O processes to a limited amount of memory; although it does not solve
the core of the problem, it made it disappear for me. And of course
getting an LTTng trace can be a very effective way to get your head
around the problem. It's available as a git tree over 2.6.29, and
includes VFS, block I/O layer and vm instrumentation, which helps in
looking at their interaction. All information is at
http://www.lttng.org.
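For the cgroup hint above, here is a rough sketch of what I mean, under
assumptions: it presumes the memory controller (the v1 interface, with
its memory.limit_in_bytes and tasks files) is already mounted somewhere,
and the mount point, group name and limit_memory helper are
hypothetical. It needs root on a real cgroup mount.

```python
import os


def limit_memory(cgroup_dir, limit_bytes, pid):
    """Create a memory cgroup, cap its memory, and move `pid` into it.

    Assumes `cgroup_dir` lives under a mounted cgroup v1 memory
    controller; the path is supplied by the caller, not discovered.
    """
    # On a real cgroup mount, mkdir creates the control files.
    os.makedirs(cgroup_dir, exist_ok=True)

    # Cap the memory the group may use.
    with open(os.path.join(cgroup_dir, "memory.limit_in_bytes"), "w") as f:
        f.write(str(limit_bytes))

    # Move the heavy I/O process into the group.
    with open(os.path.join(cgroup_dir, "tasks"), "w") as f:
        f.write(str(pid))
```

For example, limit_memory("/cgroup/heavyio", 256 * 1024 * 1024, pid)
would cap the process at 256 MB, where /cgroup is only a placeholder
for wherever the controller is mounted on your machine. As said, this
hides the problem rather than fixing it.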
Hoping this helps,
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68