Message-ID: <4970F2B6.1060508@cosmosbay.com>
Date:	Fri, 16 Jan 2009 21:48:54 +0100
From:	Eric Dumazet <dada1@...mosbay.com>
To:	Vegard Nossum <vegard.nossum@...il.com>
CC:	lkml <linux-kernel@...r.kernel.org>,
	Linux Netdev List <netdev@...r.kernel.org>
Subject: Re: 2.6.27.9: splice_to_pipe() hung (blocked for more than 120 seconds)

CCed to netdev

Vegard Nossum wrote:
> Hi,
> 
> Seeing some recent splice() discussions, I decided to explore this
> system call. I have written a program which might look, well, not very
> useful, but the fact is that it creates an unkillable zombie process.
> Another funny side effect is that system load continually rises, even
> though the system seems to stay fully interactive and functional.
> 
> After a while, I also get some messages like this:
> 
> Jan 15 20:11:37 localhost kernel: INFO: task a.out:7149 blocked for more than 120 seconds.
> Jan 15 20:11:37 localhost kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Jan 15 20:11:37 localhost kernel: a.out         D ec6e2610     0  7149      1
> Jan 15 20:11:37 localhost kernel:       ec5aad44 00000082 c042451f ec6e2610 00989680 c07da67c c07ddb80 c07ddb80
> Jan 15 20:11:37 localhost kernel:       c07ddb80 ec6e4c20 ec6e4e7c c201db80 00000001 c201db80 470fed45 0000036b
> Jan 15 20:11:37 localhost kernel:       ec5aad38 c0421027 ec6e263c ec6e4e7c ec6e3fa8 85c129f4 ec6e4c20 ec6e4c20
> Jan 15 20:11:37 localhost kernel: Call Trace:
> Jan 15 20:11:37 localhost kernel: [<c064420f>] __mutex_lock_common+0x8a/0xd9
> Jan 15 20:11:37 localhost kernel: [<c0644302>] __mutex_lock_slowpath+0x12/0x15
> Jan 15 20:11:37 localhost kernel: [<c0644181>] mutex_lock+0x29/0x2d
> Jan 15 20:11:37 localhost kernel: [<c04aa8f1>] splice_to_pipe+0x23/0x1f5
> Jan 15 20:11:37 localhost kernel: [<c04ab290>] __generic_file_splice_read+0x3ff/0x413
> Jan 15 20:11:37 localhost kernel: [<c04ab324>] generic_file_splice_read+0x80/0x9a
> Jan 15 20:11:37 localhost kernel: [<c04a9e95>] do_splice_to+0x4e/0x5f
> Jan 15 20:11:37 localhost kernel: [<c04aa010>] sys_splice+0x16a/0x1c8
> Jan 15 20:11:37 localhost kernel: [<c0403cca>] syscall_call+0x7/0xb
> Jan 15 20:11:37 localhost kernel: =======================
> 
> (This was from such a system, with 6 zombies and a load of ~80. See the
> attachments for a SysRq report of the processes in the blocked state; it
> has similar info, but for just one zombie.)
> 
> This happens with the 2.6.27.9-73.fc9.i686 kernel. Maybe it was fixed
> recently? (In any case, I don't think it is a regression.)
> 
> It does not seem to be 100% reproducible; sometimes it works, sometimes
> not. Start the program, then after a while hit Ctrl-C. If it doesn't
> exit, the zombie count will rise and the system state will be as
> described. Compile with -lpthread.
> 
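The reproducer itself was only attached, not quoted. The following is a
hypothetical sketch of the kind of program described above: several threads
all splice() from one file into a single shared pipe and drain it again.
The thread count, the /etc/passwd source file, and all names are
illustrative guesses, not the original code.

/*
 * Hypothetical reproducer sketch: many threads contending on one pipe.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS 100                /* enough threads to create contention */

static int pipefd[2];               /* one pipe shared by every thread */
static int srcfd;                   /* file we splice from */

static void *worker(void *arg)
{
        char buf[4096];

        (void)arg;
        for (;;) {
                /* Move data from the file into the shared pipe... */
                ssize_t n = splice(srcfd, NULL, pipefd[1], NULL,
                                   sizeof(buf), 0);
                if (n < 0) {
                        perror("splice");
                        break;
                }
                if (n == 0) {       /* EOF: rewind and keep hammering */
                        lseek(srcfd, 0, SEEK_SET);
                        continue;
                }
                /* ...and drain the pipe so writers don't block forever. */
                if (read(pipefd[0], buf, n) < 0) {
                        perror("read");
                        break;
                }
        }
        return NULL;
}

int main(void)
{
        pthread_t tid[NTHREADS];
        int i;

        srcfd = open("/etc/passwd", O_RDONLY); /* any readable file */
        if (srcfd < 0 || pipe(pipefd) < 0) {
                perror("setup");
                return 1;
        }
        for (i = 0; i < NTHREADS; i++)
                pthread_create(&tid[i], NULL, worker, NULL);
        pause();                               /* run until Ctrl-C */
        return 0;
}

Build with: gcc -O2 repro.c -o a.out -lpthread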

I tried your program on the latest git tree and could not reproduce any problem.

(I changed it to 9 threads since I have 8 CPUs.)

The problem might be that your threads all fight over the same pipe, whose
inode is protected by a mutex.


So mutex_lock() could possibly starve for more than 120 seconds?

Maybe you can reproduce the problem using standard read()/write() syscalls...
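For what it's worth, a hypothetical sketch of that read()/write() variant,
written as a drop-in replacement for worker() in the reproducer sketch
above (it reuses the srcfd and pipefd globals from there): if this version
also piles up on the pipe, the contention is generic pipe locking rather
than anything splice-specific.

/*
 * Hypothetical read()/write() variant: copy through a userspace buffer
 * instead of splicing, same shared pipe and thread structure.
 */
static void *worker_rw(void *arg)
{
        char buf[4096];

        (void)arg;
        for (;;) {
                ssize_t n = read(srcfd, buf, sizeof(buf));
                if (n < 0) {
                        perror("read file");
                        break;
                }
                if (n == 0) {       /* EOF: rewind and keep going */
                        lseek(srcfd, 0, SEEK_SET);
                        continue;
                }
                if (write(pipefd[1], buf, n) < 0) { /* into the shared pipe */
                        perror("write pipe");
                        break;
                }
                if (read(pipefd[0], buf, n) < 0) {  /* drain it again */
                        perror("read pipe");
                        break;
                }
        }
        return NULL;
}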

