lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20120401125741.GA7484@p183.telecom.by>
Date:	Sun, 1 Apr 2012 15:57:42 +0300
From:	Alexey Dobriyan <adobriyan@...il.com>
To:	akpm@...ux-foundation.org, viro@...iv.linux.org.uk
Cc:	torvalds@...ux-foundation.org, drepper@...il.com,
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: [PATCH] nextfd(2)

Currently there is no reliable way to close all opened file descriptors
(which daemons need and like to do):

* dumb close(fd) loop is slow, upper bound is unknown and
  can be arbitrary large,

* /proc/self/fd is unreliable:
  proc may be unconfigured or not mounted at expected place.
  Looking at /proc/self/fd requires opening directory
  which may not be available due to malicious rlimit drop or ENOMEM situations.
  Not opening directory is equivalent to dumb close(2) loop except slower.

BSD added closefrom(fd) which is OK for this exact purpose but suboptimal
on the bigger scale. closefrom(2) does only close(2) (obviously :-)
closefrom(2) siletly ignores errors from close(2) which in theory is not OK
for userspace.

So, don't add closefrom(2), add nextfd(2).

	int nextfd(int fd)

returns next opened file descriptor which is >= than fd or -1/ESRCH
if there aren't any descriptors >= than fd.

Thus closefrom(3) can be rewritten through it in userspace:

	void closefrom(int fd)
	{
		while (1) {
			fd = nextfd(fd);
			if (fd == -1 && errno == ESRCH)
				break;
			(void)close(fd);
			fd++;
		}
	}

Maybe it will grow other smart uses.

nextfd(2) doesn't change kernel state and thus can't fail
which is why it should go in. Other means may fail or
may not be available or require linear time with only guessed
upper boundaries (1024, getrlimit(RLIM_NOFILE), sysconf(_SC_OPEN_MAX).

Signed-off-by: Alexey Dobriyan <adobriyan@...il.com>
---

 arch/x86/syscalls/syscall_32.tbl |    1 +
 arch/x86/syscalls/syscall_64.tbl |    1 +
 fs/Makefile                      |    1 +
 fs/nextfd.c                      |   27 +++++++++++++++++++++++++++
 include/linux/syscalls.h         |    1 +
 5 files changed, 31 insertions(+)

--- a/arch/x86/syscalls/syscall_32.tbl
+++ b/arch/x86/syscalls/syscall_32.tbl
@@ -355,3 +355,4 @@
 346	i386	setns			sys_setns
 347	i386	process_vm_readv	sys_process_vm_readv		compat_sys_process_vm_readv
 348	i386	process_vm_writev	sys_process_vm_writev		compat_sys_process_vm_writev
+349	i386	nextfd			sys_nextfd
--- a/arch/x86/syscalls/syscall_64.tbl
+++ b/arch/x86/syscalls/syscall_64.tbl
@@ -318,6 +318,7 @@
 309	common	getcpu			sys_getcpu
 310	64	process_vm_readv	sys_process_vm_readv
 311	64	process_vm_writev	sys_process_vm_writev
+312	64	nextfd			sys_nextfd
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
 # for native 64-bit operation.
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -12,6 +12,7 @@ obj-y :=	open.o read_write.o file_table.o super.o \
 		seq_file.o xattr.o libfs.o fs-writeback.o \
 		pnode.o drop_caches.o splice.o sync.o utimes.o \
 		stack.o fs_struct.o statfs.o
+obj-y += nextfd.o
 
 ifeq ($(CONFIG_BLOCK),y)
 obj-y +=	buffer.o bio.o block_dev.o direct-io.o mpage.o ioprio.o
--- /dev/null
+++ b/fs/nextfd.c
@@ -0,0 +1,27 @@
+#include <linux/errno.h>
+#include <linux/fdtable.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+#include <linux/syscalls.h>
+
+/* Return first opened file descriptor which is >= than the argument. */
+SYSCALL_DEFINE1(nextfd, unsigned int, fd)
+{
+	struct files_struct *files = current->files;
+	struct fdtable *fdt;
+
+	rcu_read_lock();
+	fdt = files_fdtable(files);
+	while (fd < fdt->max_fds) {
+		struct file *file;
+
+		file = rcu_dereference_check_fdtable(files, fdt->fd[fd]);
+		if (file) {
+			rcu_read_unlock();
+			return fd;
+		}
+		fd++;
+	}
+	rcu_read_unlock();
+	return -ESRCH;
+}
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -857,5 +857,6 @@ asmlinkage long sys_process_vm_writev(pid_t pid,
 				      const struct iovec __user *rvec,
 				      unsigned long riovcnt,
 				      unsigned long flags);
+asmlinkage long sys_nextfd(unsigned int fd);
 
 #endif
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ