lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <70E19A0A-2728-4ADA-B984-A36182C1F575@nocrew.org>
Date:	Sun, 12 Aug 2007 15:03:25 +0200
From:	Fredrik Noring <noring@...rew.org>
To:	linux-kernel@...r.kernel.org
Subject: Improving read/write/close system call reliability when used with pthreads

Hello!

The attached patch attempts to improve read/write/close system call  
reliability when used with pthreads and pipes. At least two issues  
are handled:

1. Read/write system calls hanging indefinitely when the fd is closed  
by another thread.

2. Read/write fd (reuse) alias problems in combination with - 
ERESTARTSYS.

For (1), a simple scenario has two threads: a reader and a watchdog.  
The reader thread is stuck in a blocking read of an unreliable pipe.  
The watchdog thread is designed to cancel the blocking read using  
close on the reader's fd.

Unfortunately, with the current kernel, the reader thread does not  
wake up by the close call issued by the watchdog. Instead, the reader  
thread remains stuck unless some unrelated signal happens to arrive,  
in which case it wakes up and terminates with -EBADF. This is also  
the current workaround: have the process signal itself after the  
close call in the watchdog thread.

For (2), the scenario can be as (1) above, adding an open system call  
between the close and signal events. Since the read fd has been  
closed it's now reused by open for a completely unrelated file, and  
when read finally wakes up by the signal it starts reading it.

In both of these cases, I believe it would be desirable to have read  
terminate immediately with -EBADF. Similar scenarios can be made for  
the write system call (and perhaps others as well).

The attached sample programs (watchdog.c and aliasing.c) demonstrate  
the effects.

Please regard the attached patch as proof-of-concept to trying to  
solve these problems. The FIXME certainly needs more thought, as do  
locking schemes, performance impact, related fd issues, and the  
approach taken in general. I'd be delighted if it would be possible  
to sort out. :-)

Many thanks,
Fredrik



Notes regarding the patch:

The patch adds a required_fds list to task_struct. System calls such  
as read and write register their fd respectively in this list, as the  
fd:s are required for successful completion of the calls. When close  
is called, it scans the required_fds lists and marks closed fd:s with  
-EBADF, as well as notifies the file with the new file->f_op- 
 >closing_fd file operation. This allows files to wake up waiting  
read/write calls. When read/write wakes up, it checks its  
required_fds list and cancels if its fd has been marked -EBADF.

fs/open.c                 |   63 +++++++++++++++++++++++++++++++++++++ 
+++++++++
fs/pipe.c                 |   28 ++++++++++++++++++++
fs/read_write.c           |    6 ++++
include/linux/fs.h        |    5 +++
include/linux/init_task.h |    1
include/linux/sched.h     |   13 ++++++++-
kernel/fork.c             |    1
7 files changed, 115 insertions(+), 2 deletions(-)



Notes regarding POSIX:

Whether or not close may cancel I/O operations appears to be  
implementation-defined if I interpret the following section  
correctly, unless anyone knows more specific details?

The Open Group Base Specifications Issue 6
IEEE Std 1003.1, 2004 Edition
Copyright 2001-2004 The IEEE and The Open Group, All Rights reserved.

When there is an outstanding cancelable asynchronous I/O operation  
against fildes when close() is called, that I/O operation may be  
canceled. An I/O operation that is not canceled completes as if the  
close() operation had not yet occurred. All operations that are not  
canceled shall complete as if the close() blocked until the  
operations completed. The close() operation itself need not block  
awaiting such I/O completion. Whether any I/O operation is canceled,  
and which I/O operation may be canceled upon close(), is  
implementation-defined.



watchdog.c:

#include <sys/types.h>
#include <sys/uio.h>
#include <pthread.h>
#include <unistd.h>
#include <signal.h>
#include <stdio.h>
#include <errno.h>

int inout[2] = { -1, -1 };

void *watchdog(void *arg)
{
	fprintf(stderr, "Sleeping...\n");
	sleep(1);
	fprintf(stderr, "Closing...\n");
	close(inout[0]);
	fprintf(stderr, "Closed.\n");
	return NULL;
}

void wakeup(int sig)
{
	fprintf(stderr, "Alarmed.\n");
}

int main(int argc, char *argv[])
{
	pthread_t th;
	char buf[1];
	ssize_t r;
	
	pipe(inout);
	pthread_create(&th, NULL, watchdog, NULL);
	signal(SIGALRM, wakeup);
	alarm(5);
	
	fprintf(stderr, "Reading...\n");
	r = read(inout[0], buf, sizeof(buf));
	if (r == -1)
		perror("read");
	fprintf(stderr, "Read done.\n");
	
	pthread_join(th, NULL);
	fprintf(stderr, "Exit.\n");
	
	return 0;
}



aliasing.c:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <pthread.h>
#include <unistd.h>
#include <signal.h>
#include <stdio.h>
#include <errno.h>

int inout[2] = { -1, -1 };

void *aliasing(void *arg)
{
	fprintf(stderr, "Sleeping...\n");
	sleep(1);
	fprintf(stderr, "Closing...\n");
	close(inout[0]);
	fprintf(stderr, "Closed.\n");
	fprintf(stderr, "Opening...\n");
	open(__FILE__, O_RDONLY);
	fprintf(stderr, "Opened.\n");
	return NULL;
}

void wakeup(int sig)
{
	fprintf(stderr, "Alarmed.\n");
}

int main(int argc, char *argv[])
{
	pthread_t th;
	char buf[1];
	ssize_t r;
	
	pipe(inout);
	pthread_create(&th, NULL, aliasing, NULL);
	signal(SIGALRM, wakeup);
	alarm(5);
	
	fprintf(stderr, "Reading...\n");
	r = read(inout[0], buf, 1);
	if (r == -1)
		perror("read");
	else
		fprintf(stderr, "Alias read!\n");
	fprintf(stderr, "Read done.\n");
	
	pthread_join(th, NULL);
	fprintf(stderr, "Exit.\n");
	
	return 0;
}



required-fds.patch (not sure Apple Mail can handle this properly  
though...):

--- linux-2.6.19-gentoo-r5/include/linux/init_task.h	2007-07-12  
22:03:14.000000000 +0200
+++ linux-2.6.19-required-fds/include/linux/init_task.h	2007-08-12  
03:51:34.000000000 +0200
@@ -126,6 +126,7 @@
	.thread		= INIT_THREAD,					\
	.fs		= &init_fs,					\
	.files		= &init_files,					\
+	.required_fds	= LIST_HEAD_INIT(tsk.required_fds),		\
	.signal		= &init_signals,				\
	.sighand	= &init_sighand,				\
	.nsproxy	= &init_nsproxy,				\
--- linux-2.6.19-gentoo-r5/include/linux/sched.h	2007-07-12  
22:03:14.000000000 +0200
+++ linux-2.6.19-required-fds/include/linux/sched.h	2007-08-12  
13:47:44.000000000 +0200
@@ -907,6 +907,8 @@
	struct fs_struct *fs;
/* open file information */
	struct files_struct *files;
+/* file descriptors required to complete current I/O operation  
successfully */
+	struct list_head required_fds;
/* namespaces */
	struct nsproxy *nsproxy;
/* signal handlers */
@@ -930,7 +932,7 @@
/* Thread group tracking */
     	u32 parent_exec_id;
     	u32 self_exec_id;
-/* Protection of (de-)allocation: mm, files, fs, tty, keyrings */
+/* Protection of (de-)allocation: mm, files, required_fds, fs, tty,  
keyrings */
	spinlock_t alloc_lock;
	/* Protection of the PI data structures: */
@@ -1025,6 +1027,13 @@
#endif
};
+#define REQUIRED_FD_INIT(fd) { .fd = fd }
+
+struct required_fd {
+	struct list_head list;
+	int fd;
+};
+
static inline pid_t process_group(struct task_struct *tsk)
{
	return tsk->signal->pgrp;
@@ -1429,7 +1438,7 @@
		(thread_group_leader(p) && !thread_group_empty(p))
/*
- * Protects ->fs, ->files, ->mm, ->group_info, ->comm, keyring
+ * Protects ->fs, ->files, ->mm, ->group_info, ->comm, - 
 >required_fds, keyring
   * subscriptions and synchronises with wait4().  Also used in  
procfs.  Also
   * pins the final release of task.io_context.  Also protects ->cpuset.
   *
--- linux-2.6.19-gentoo-r5/include/linux/fs.h	2007-07-12  
22:03:14.000000000 +0200
+++ linux-2.6.19-required-fds/include/linux/fs.h	2007-08-12  
13:27:02.000000000 +0200
@@ -1120,6 +1120,7 @@
	int (*mmap) (struct file *, struct vm_area_struct *);
	int (*open) (struct inode *, struct file *);
	int (*flush) (struct file *, fl_owner_t id);
+	void (*closing_fd) (struct inode *, struct file *, struct  
files_struct *files, unsigned int fd);
	int (*release) (struct inode *, struct file *);
	int (*fsync) (struct file *, struct dentry *, int datasync);
	int (*aio_fsync) (struct kiocb *, int datasync);
@@ -1497,6 +1498,10 @@
			int mode);
extern struct file *filp_open(const char *, int, int);
extern struct file * dentry_open(struct dentry *, struct vfsmount *,  
int);
+extern int required_fds_are_bad(struct task_struct *task);
+extern void add_required_fd(struct task_struct *task,
+			    struct required_fd *req_fd);
+extern void del_required_fds(struct task_struct *task);
extern int filp_close(struct file *, fl_owner_t id);
extern char * getname(const char __user *);
--- linux-2.6.19-gentoo-r5/kernel/fork.c	2007-07-12  
22:03:14.000000000 +0200
+++ linux-2.6.19-required-fds/kernel/fork.c	2007-08-12  
13:12:59.000000000 +0200
@@ -1156,6 +1156,7 @@
	retval = copy_thread(0, clone_flags, stack_start, stack_size, p, regs);
	if (retval)
		goto bad_fork_cleanup_namespaces;
+	INIT_LIST_HEAD(&p->required_fds);
	p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ?  
child_tidptr : NULL;
	/*
--- linux-2.6.19-gentoo-r5/fs/read_write.c	2007-07-12  
22:03:14.000000000 +0200
+++ linux-2.6.19-required-fds/fs/read_write.c	2007-08-12  
13:19:59.000000000 +0200
@@ -355,10 +355,12 @@
asmlinkage ssize_t sys_read(unsigned int fd, char __user * buf,  
size_t count)
{
+	struct required_fd req_fd = REQUIRED_FD_INIT(fd);
	struct file *file;
	ssize_t ret = -EBADF;
	int fput_needed;
+	add_required_fd(current, &req_fd);
	file = fget_light(fd, &fput_needed);
	if (file) {
		loff_t pos = file_pos_read(file);
@@ -366,6 +368,7 @@
		file_pos_write(file, pos);
		fput_light(file, fput_needed);
	}
+	del_required_fds(current);
	return ret;
}
@@ -373,10 +376,12 @@
asmlinkage ssize_t sys_write(unsigned int fd, const char __user *  
buf, size_t count)
{
+	struct required_fd req_fd = REQUIRED_FD_INIT(fd);
	struct file *file;
	ssize_t ret = -EBADF;
	int fput_needed;
+	add_required_fd(current, &req_fd);
	file = fget_light(fd, &fput_needed);
	if (file) {
		loff_t pos = file_pos_read(file);
@@ -384,6 +389,7 @@
		file_pos_write(file, pos);
		fput_light(file, fput_needed);
	}
+	del_required_fds(current);
	return ret;
}
--- linux-2.6.19-gentoo-r5/fs/open.c	2007-07-12 22:03:14.000000000 +0200
+++ linux-2.6.19-required-fds/fs/open.c	2007-08-12 16:30:03.000000000  
+0200
@@ -2,6 +2,9 @@
   *  linux/fs/open.c
   *
   *  Copyright (C) 1991, 1992  Linus Torvalds
+ *
+ *  2007-08-12 Implemented required_fds functionality to close files
+ *             reliably with pthreads, by Fredrik Noring  
<noring@...rew.org>.
   */
#include <linux/string.h>
@@ -1015,6 +1018,53 @@
#endif
+void mark_required_fd_bad(struct task_struct *task, unsigned int fd)
+{
+	struct required_fd *req_fd;
+
+	task_lock(task);
+	list_for_each_entry(req_fd, &task->required_fds, list)
+		if (req_fd->fd == fd)
+			req_fd->fd = -EBADF;
+	task_unlock(task);
+}
+
+int required_fds_are_bad(struct task_struct *task)
+{
+	struct required_fd *req_fd;
+	int ret = 0;
+
+	task_lock(task);
+	list_for_each_entry(req_fd, &task->required_fds, list)
+		if (req_fd->fd == -EBADF) {
+			ret = 1;
+			break;
+		}
+	task_unlock(task);
+	
+	return ret;
+}
+
+EXPORT_SYMBOL(required_fds_are_bad);
+
+void add_required_fd(struct task_struct *task, struct required_fd  
*req_fd)
+{
+	task_lock(task);
+	list_add(&req_fd->list, &task->required_fds);
+	task_unlock(task);
+}
+
+EXPORT_SYMBOL(add_required_fd);
+
+void del_required_fds(struct task_struct *task)
+{
+	task_lock(task);
+	INIT_LIST_HEAD(&task->required_fds);
+	task_unlock(task);
+}
+
+EXPORT_SYMBOL(del_required_fds);
+
/*
   * "id" is the POSIX thread ID. We use the
   * files pointer for this..
@@ -1048,9 +1098,17 @@
{
	struct file * filp;
	struct files_struct *files = current->files;
+	struct task_struct *task;
	struct fdtable *fdt;
	int retval;
+	/* FIXME: Proof of concept but obviously inefficient. Mark
+	 * within f_op->closing_fd instead, where we know the tasks
+	 * that are really interested in learning this? */
+	for_each_process(task)
+		if(task->files == files)
+			mark_required_fd_bad(task, fd);
+	
	spin_lock(&files->file_lock);
	fdt = files_fdtable(files);
	if (fd >= fdt->max_fds)
@@ -1062,6 +1120,10 @@
	FD_CLR(fd, fdt->close_on_exec);
	__put_unused_fd(files, fd);
	spin_unlock(&files->file_lock);
+	if (filp->f_op && filp->f_op->closing_fd) {
+		struct inode *inode = filp->f_dentry->d_inode;
+		filp->f_op->closing_fd(inode, filp, files, fd);
+	}
	retval = filp_close(filp, files);
	/* can't restart close syscall because file table entry was cleared */
--- linux-2.6.19-gentoo-r5/fs/pipe.c	2007-07-12 22:03:14.000000000 +0200
+++ linux-2.6.19-required-fds/fs/pipe.c	2007-08-12 13:43:13.000000000  
+0200
@@ -316,6 +316,11 @@
			wake_up_interruptible_sync(&pipe->wait);
   			kill_fasync(&pipe->fasync_writers, SIGIO, POLL_OUT);
		}
+		if (required_fds_are_bad(current)) {
+			if (!ret)
+				ret = -EBADF;
+			break;
+		}
		pipe_wait(pipe);
	}
	mutex_unlock(&inode->i_mutex);
@@ -488,6 +493,11 @@
			kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
			do_wakeup = 0;
		}
+		if (required_fds_are_bad(current)) {
+			if (!ret)
+				ret = -EBADF;
+			break;
+		}
		pipe->waiting_writers++;
		pipe_wait(pipe);
		pipe->waiting_writers--;
@@ -576,6 +586,18 @@
	return mask;
}
+static void
+pipe_closing_fd(struct inode *inode, struct file *file,
+		struct files_struct *files, unsigned int fd)
+{
+	struct pipe_inode_info *pipe;
+
+	mutex_lock(&inode->i_mutex);
+	pipe = inode->i_pipe;
+	wake_up_interruptible(&pipe->wait);
+	mutex_unlock(&inode->i_mutex);
+}
+
static int
pipe_release(struct inode *inode, int decr, int decw)
{
@@ -727,6 +749,7 @@
	.poll		= pipe_poll,
	.ioctl		= pipe_ioctl,
	.open		= pipe_read_open,
+	.closing_fd	= pipe_closing_fd,
	.release	= pipe_read_release,
	.fasync		= pipe_read_fasync,
};
@@ -739,6 +762,7 @@
	.poll		= pipe_poll,
	.ioctl		= pipe_ioctl,
	.open		= pipe_write_open,
+	.closing_fd	= pipe_closing_fd,
	.release	= pipe_write_release,
	.fasync		= pipe_write_fasync,
};
@@ -752,6 +776,7 @@
	.poll		= pipe_poll,
	.ioctl		= pipe_ioctl,
	.open		= pipe_rdwr_open,
+	.closing_fd	= pipe_closing_fd,
	.release	= pipe_rdwr_release,
	.fasync		= pipe_rdwr_fasync,
};
@@ -764,6 +789,7 @@
	.poll		= pipe_poll,
	.ioctl		= pipe_ioctl,
	.open		= pipe_read_open,
+	.closing_fd	= pipe_closing_fd,
	.release	= pipe_read_release,
	.fasync		= pipe_read_fasync,
};
@@ -776,6 +802,7 @@
	.poll		= pipe_poll,
	.ioctl		= pipe_ioctl,
	.open		= pipe_write_open,
+	.closing_fd	= pipe_closing_fd,
	.release	= pipe_write_release,
	.fasync		= pipe_write_fasync,
};
@@ -789,6 +816,7 @@
	.poll		= pipe_poll,
	.ioctl		= pipe_ioctl,
	.open		= pipe_rdwr_open,
+	.closing_fd	= pipe_closing_fd,
	.release	= pipe_rdwr_release,
	.fasync		= pipe_rdwr_fasync,
};

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ