[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <4970F2B6.1060508@cosmosbay.com>
Date: Fri, 16 Jan 2009 21:48:54 +0100
From: Eric Dumazet <dada1@...mosbay.com>
To: Vegard Nossum <vegard.nossum@...il.com>
CC: lkml <linux-kernel@...r.kernel.org>,
Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: 2.6.27.9: splice_to_pipe() hung (blocked for more than 120 seconds)
CCed to netdev
Vegard Nossum a écrit :
> Hi,
>
> Seeing some recent splice() discussions, I decided to explore this
> system call. I have written a program which might look, well, not very
> useful, but the fact is that it creates an unkillable zombie process.
> Another funny side effect is that system load continually rises, even
> though the system seems to stay fully interactive and functional.
>
> After a while, I also get some messages like this:
>
> Jan 15 20:11:37 localhost kernel: INFO: task a.out:7149 blocked for
> more than 120 seconds.
> Jan 15 20:11:37 localhost kernel: "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jan 15 20:11:37 localhost kernel: a.out D ec6e2610 0 7149 1
> Jan 15 20:11:37 localhost kernel: ec5aad44 00000082 c042451f
> ec6e2610 00989680 c07da67c c07ddb80 c07ddb80
> Jan 15 20:11:37 localhost kernel: c07ddb80 ec6e4c20 ec6e4e7c
> c201db80 00000001 c201db80 470fed45 0000036b
> Jan 15 20:11:37 localhost kernel: ec5aad38 c0421027 ec6e263c
> ec6e4e7c ec6e3fa8 85c129f4 ec6e4c20 ec6e4c20
> Jan 15 20:11:37 localhost kernel: Call Trace:
> Jan 15 20:11:37 localhost kernel: [<c064420f>] __mutex_lock_common+0x8a/0xd9
> Jan 15 20:11:37 localhost kernel: [<c0644302>] __mutex_lock_slowpath+0x12/0x15
> Jan 15 20:11:37 localhost kernel: [<c0644181>] mutex_lock+0x29/0x2d
> Jan 15 20:11:37 localhost kernel: [<c04aa8f1>] splice_to_pipe+0x23/0x1f5
> Jan 15 20:11:37 localhost kernel: [<c04ab290>]
> __generic_file_splice_read+0x3ff/0x413
> Jan 15 20:11:37 localhost kernel: [<c04ab324>]
> generic_file_splice_read+0x80/0x9a
> Jan 15 20:11:37 localhost kernel: [<c04a9e95>] do_splice_to+0x4e/0x5f
> Jan 15 20:11:37 localhost kernel: [<c04aa010>] sys_splice+0x16a/0x1c8
> Jan 15 20:11:37 localhost kernel: [<c0403cca>] syscall_call+0x7/0xb
> Jan 15 20:11:37 localhost kernel: =======================
>
> (but this was from such a system with 6 zombies and ~80 load. See
> attachments for SysRq report with processes in blocked state, it has
> similar info but for just one zombie.)
>
> This happens with 2.6.27.9-73.fc9.i686 kernel. Maybe it was fixed
> recently? (In any case, I don't think it is a regression.)
>
> It seems to be not 100% reproducible. Sometimes it works, sometimes
> not. Start the program, then after a while hit Ctrl-C. If it doesn't
> exit, zombie count will rise and system state will be as described.
> Compile with -lpthread.
>
I tried your program on latest git tree and could not reproduce any problem.
(changed to 9 threads since I have 8 cpus)
Problem might be that your threads all fight on the same pipe, with
a mutex protecting its inode.
So mutex_lock() could possibly starve for more than 120 second ?
Maybe you can reproduce the problem using standard read()/write() syscalls...
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists