Message-ID: <19f34abd0901180610k430a3e4bpe18af036357ca642@mail.gmail.com>
Date: Sun, 18 Jan 2009 15:10:01 +0100
From: "Vegard Nossum" <vegard.nossum@...il.com>
To: "Eric Dumazet" <dada1@...mosbay.com>
Cc: "Ingo Molnar" <mingo@...e.hu>, lkml <linux-kernel@...r.kernel.org>,
"Linux Netdev List" <netdev@...r.kernel.org>
Subject: Re: 2.6.27.9: splice_to_pipe() hung (blocked for more than 120 seconds)
On Sun, Jan 18, 2009 at 2:44 PM, Vegard Nossum <vegard.nossum@...il.com> wrote:
> So in short: Is it possible that inode_double_lock() in
> splice_from_pipe() first locks the pipe mutex, THEN locks the
> file/socket mutex? In that case, there should be a lock imbalance,
> because pipe_wait() would unlock the pipe while the file/socket mutex
> is held.
>
> That would possibly explain the sporadicity of the lockup; it depends
> on the actual order of the double lock.
>
> Why doesn't lockdep report that? Hm. I guess it is because these are
> both inode mutexes and lockdep can't detect a locking imbalance within
> the same lock class?
>
> Anyway, that's just a theory. :-) Will try to confirm by simplifying
> the test-case.
Hm, I do believe this _is_ evidence in favour of the theory:
top - 09:03:57 up 2:16, 2 users, load average: 129.27, 49.28, 21.57
Tasks: 161 total, 1 running, 95 sleeping, 1 stopped, 64 zombie
:-)
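Sketched as an interleaving (a hypothetical reconstruction of the theory quoted above; the lock-acquisition details are my reading of the 2.6.27-era fs/splice.c and fs/pipe.c, not verified against the source):

```
splicer A                               splicer B

splice_from_pipe()
  inode_double_lock()
    mutex_lock(pipe inode i_mutex)
    mutex_lock(socket inode i_mutex)
  pipe empty -> pipe_wait()
    drops the pipe i_mutex...
    ...but keeps the socket i_mutex     splice_from_pipe()
    sleeps until data arrives             inode_double_lock()
                                            mutex_lock(pipe i_mutex)    -> ok
                                            mutex_lock(socket i_mutex)  -> blocks on A

writer: write(pipe_fd[1], ...)
  needs the pipe i_mutex, held by B -> blocks, so A is never woken.  Deadlock.
```

Whether B takes the pipe or the socket mutex first depends on inode_double_lock()'s ordering, which would explain why the hang is sporadic.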
#define _GNU_SOURCE
#include <sys/socket.h>
#include <sys/types.h>
#include <fcntl.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static int sock_fd[2];
static int pipe_fd[2];

#define N 16384

/* Feed single bytes into the pipe. */
static void *do_write(void *unused)
{
	unsigned int i;

	for (i = 0; i < N; ++i)
		write(pipe_fd[1], "x", 1);

	return NULL;
}

/* Drain the socket so the splicers can make progress. */
static void *do_read(void *unused)
{
	unsigned int i;
	char c;

	for (i = 0; i < N; ++i)
		read(sock_fd[0], &c, 1);

	return NULL;
}

/* Two of these race on the pipe/socket double lock. */
static void *do_splice(void *unused)
{
	unsigned int i;

	for (i = 0; i < N; ++i)
		splice(pipe_fd[0], NULL, sock_fd[1], NULL, 1, 0);

	return NULL;
}

int main(int argc, char *argv[])
{
	pthread_t writer;
	pthread_t reader;
	pthread_t splicer[2];

	while (1) {
		if (socketpair(AF_UNIX, SOCK_STREAM, 0, sock_fd) == -1)
			exit(EXIT_FAILURE);
		if (pipe(pipe_fd) == -1)
			exit(EXIT_FAILURE);

		pthread_create(&writer, NULL, &do_write, NULL);
		pthread_create(&reader, NULL, &do_read, NULL);
		pthread_create(&splicer[0], NULL, &do_splice, NULL);
		pthread_create(&splicer[1], NULL, &do_splice, NULL);

		pthread_join(writer, NULL);
		pthread_join(reader, NULL);
		pthread_join(splicer[0], NULL);
		pthread_join(splicer[1], NULL);

		/* Don't leak descriptors across iterations. */
		close(sock_fd[0]);
		close(sock_fd[1]);
		close(pipe_fd[0]);
		close(pipe_fd[1]);

		printf("failed to deadlock, retrying...\n");
	}

	return EXIT_SUCCESS;
}
$ gcc splice.c -lpthread
$ ./a.out &
$ ./a.out &
$ ./a.out &
(as many as you want; then wait for a bit -- ten seconds works for me)
$ killall -9 a.out
(not all will die -- those are now zombies)
Vegard
--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036