lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <19f34abd0901180412w39d70ccqd0c10698bc70e6e9@mail.gmail.com>
Date:	Sun, 18 Jan 2009 13:12:36 +0100
From:	"Vegard Nossum" <vegard.nossum@...il.com>
To:	"Eric Dumazet" <dada1@...mosbay.com>
Cc:	lkml <linux-kernel@...r.kernel.org>,
	"Linux Netdev List" <netdev@...r.kernel.org>
Subject: Re: 2.6.27.9: splice_to_pipe() hung (blocked for more than 120 seconds)

On Fri, Jan 16, 2009 at 9:48 PM, Eric Dumazet <dada1@...mosbay.com> wrote:
>
> I tried your program on latest git tree and could not reproduce any problem.
>
> (changed to 9 threads since I have 8 cpus)

Hm. My machine has 2 CPUs. I just reproduced it on a more recent
kernel, this time from:

commit a6525042bfdfcab128bd91fad264de10fd24a55e
Date:   Tue Jan 13 14:53:16 2009 -0800

with lockdep enabled, and no bad messages. So it seems that it is not
a deadlock at least...

> Problem might be that your threads all fight on the same pipe, with
> a mutex protecting its inode.
>
>
> So mutex_lock() could possibly starve for more than 120 second ?

Much longer. Last time this happened, the zombies stayed for many
hours (until I rebooted the machine).

> Maybe you can reproduce the problem using standard read()/write() syscalls...

I can try...

Tasks:   7 total,   0 running,   6 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.4%us,  6.0%sy,  0.0%ni, 92.6%id,  1.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1016180k total,    54596k used,   961584k free,     4084k buffers
Swap:   104380k total,        0k used,   104380k free,    20412k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3808 500       20   0     0    0    0 Z    0  0.0   0:00.00 a.out <defunct>
 3809 500       20   0     0    0    0 D    0  0.0   0:00.00 a.out
 3810 500       20   0     0    0    0 D    0  0.0   0:00.00 a.out
 3813 500       20   0     0    0    0 D    0  0.0   0:00.00 a.out
 3815 500       20   0     0    0    0 D    0  0.0   0:00.00 a.out
 3817 500       20   0     0    0    0 D    0  0.0   0:00.00 a.out
 3821 500       20   0     0    0    0 D    0  0.0   0:00.00 a.out

root@...ntu:/home/debian# strace -p 3808
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted

root@...ntu:/home/debian# strace -p 3809
Process 3809 attached - interrupt to quit
<nothing>
^C^C^C^C^C^C^C^C
<doesn't quit>

root@...ntu:/home/debian# cat /proc/3808/syscall
0 0xbfa1f5b4 0xbfa1f5b4 0xc8bff4 0xbfa1f5b4 0x0 0xbfa1f5c8 0xbfa1f3f8 0xb8020424
root@...ntu:/home/debian# cat /proc/3809/syscall
313 0x9 0x0 0x7 0x0 0x3 0x0 0xb80012cc 0xb8020424
root@...ntu:/home/debian# cat /proc/3810/syscall
313 0x6 0x0 0x5 0x0 0x17f 0x0 0xb780037c 0xb8020424
root@...ntu:/home/debian# cat /proc/3813/syscall
313 0xa 0x0 0x7 0x0 0x3 0x0 0xb67fe2cc 0xb8020424
root@...ntu:/home/debian# cat /proc/3815/syscall
313 0x6 0x0 0x5 0x0 0x17f 0x0 0xb5ffd37c 0xb8020424
root@...ntu:/home/debian# cat /proc/3817/syscall
313 0x8 0x0 0x7 0x0 0x3 0x0 0xb4ffb2cc 0xb8020424
root@...ntu:/home/debian# cat /proc/3821/syscall
313 0x6 0x0 0x5 0x0 0x17f 0x0 0xb47fa37c 0xb8020424

Also managed to grab this this time:

 SysRq : Show Locks Held

 Showing all locks held in the system:
 1 lock held by getty/2130:
  #0:  (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
 1 lock held by getty/2131:
  #0:  (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
 1 lock held by getty/2134:
  #0:  (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
 1 lock held by getty/2138:
  #0:  (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
 1 lock held by getty/2142:
  #0:  (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
 1 lock held by getty/2143:
  #0:  (&tty->atomic_read_lock){--..}, at: [<c1218973>] n_tty_read+0x533/0x780
 1 lock held by a.out/3809:
  #0:  (&sb->s_type->i_mutex_key#4){--..}, at: [<c10d5515>]
splice_to_pipe+0x25/0x260
 2 locks held by a.out/3810:
  #0:  (&sb->s_type->i_mutex_key#11/2){--..}, at: [<c10d548c>]
splice_from_pipe+0x5c/0x90
  #1:  (&sb->s_type->i_mutex_key#4){--..}, at: [<c10be5ac>] pipe_wait+0x6c/0x80
 1 lock held by a.out/3813:
  #0:  (&sb->s_type->i_mutex_key#4){--..}, at: [<c10d5515>]
splice_to_pipe+0x25/0x260
 2 locks held by a.out/3815:
  #0:  (&sb->s_type->i_mutex_key#4/1){--..}, at: [<c10c97a2>]
inode_double_lock+0x32/0xb0
  #1:  (&sb->s_type->i_mutex_key#11/2){--..}, at: [<c10d548c>]
splice_from_pipe+0x5c/0x90
 1 lock held by a.out/3817:
  #0:  (&sb->s_type->i_mutex_key#4){--..}, at: [<c10d5515>]
splice_to_pipe+0x25/0x260
 1 lock held by a.out/3821:
  #0:  (&sb->s_type->i_mutex_key#4/1){--..}, at: [<c10c97a2>]
inode_double_lock+0x32/0xb0
 2 locks held by bash/3916:
  #0:  (sysrq_key_table_lock){....}, at: [<c12297f7>] __handle_sysrq+0x17/0x140
  #1:  (tasklist_lock){..--}, at: [<c1059354>] debug_show_all_locks+0x34/0x180

 =============================================

Anything else that can help debug this? Machine is still running, so...

Thanks :-)


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ