netdev - Re: 2.6.27.9: splice_to_pipe() hung (blocked for more than 120 seconds)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <19f34abd0901180610k430a3e4bpe18af036357ca642@mail.gmail.com>
Date:	Sun, 18 Jan 2009 15:10:01 +0100
From:	"Vegard Nossum" <vegard.nossum@...il.com>
To:	"Eric Dumazet" <dada1@...mosbay.com>
Cc:	"Ingo Molnar" <mingo@...e.hu>, lkml <linux-kernel@...r.kernel.org>,
	"Linux Netdev List" <netdev@...r.kernel.org>
Subject: Re: 2.6.27.9: splice_to_pipe() hung (blocked for more than 120 seconds)

On Sun, Jan 18, 2009 at 2:44 PM, Vegard Nossum <vegard.nossum@...il.com> wrote:
> So in short: Is it possible that inode_double_lock() in
> splice_from_pipe() first locks the pipe mutex, THEN locks the
> file/socket mutex? In that case, there should be a lock imbalance,
> because pipe_wait() would unlock the pipe while the file/socket mutex
> is held.
>
> That would possibly explain the sporadicity of the lockup; it depends
> on the actual order of the double lock.
>
> Why doesn't lockdep report that? Hm. I guess it is because these are
> both inode mutexes and lockdep can't detect a locking imbalance within
> the same lock class?
>
> Anyway, that's just a theory. :-) Will try to confirm by simplifying
> the test-case.

Hm, I do believe this _is_ evidence in favour of the theory:

top - 09:03:57 up  2:16,  2 users,  load average: 129.27, 49.28, 21.57
Tasks: 161 total,   1 running,  95 sleeping,   1 stopped,  64 zombie

:-)

#define _GNU_SOURCE

#include <sys/socket.h>
#include <sys/types.h>

#include <fcntl.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static int sock_fd[2];
static int pipe_fd[2];

#define N 16384

static void *do_write(void *unused)
{
	unsigned int i;

	for (i = 0; i < N; ++i)
		write(pipe_fd[1], "x", 1);

	return NULL;
}

static void *do_read(void *unused)
{
	unsigned int i;
	char c;

	for (i = 0; i < N; ++i)
		read(sock_fd[0], &c, 1);

	return NULL;
}

static void *do_splice(void *unused)
{
	unsigned int i;

	for (i = 0; i < N; ++i)
		splice(pipe_fd[0], NULL, sock_fd[1], NULL, 1, 0);

	return NULL;
}

int main(int argc, char *argv[])
{
	pthread_t writer;
	pthread_t reader;
	pthread_t splicer[2];

	while (1) {
		if (socketpair(AF_UNIX, SOCK_STREAM, 0, sock_fd) == -1)
			exit(EXIT_FAILURE);

		if (pipe(pipe_fd) == -1)
			exit(EXIT_FAILURE);

		pthread_create(&writer, NULL, &do_write, NULL);
		pthread_create(&reader, NULL, &do_read, NULL);
		pthread_create(&splicer[0], NULL, &do_splice, NULL);
		pthread_create(&splicer[1], NULL, &do_splice, NULL);

		pthread_join(writer, NULL);
		pthread_join(reader, NULL);
		pthread_join(splicer[0], NULL);
		pthread_join(splicer[1], NULL);

		printf("failed to deadlock, retrying...\n");
	}

	return EXIT_SUCCESS;
}

$ gcc splice.c -lpthread
$ ./a.out &
$ ./a.out &
$ ./a.out &
(as many as you want; then wait for a bit -- ten seconds works for me)
$ killall -9 a.out
(not all will die -- those are now zombies)


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html