lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 18 Dec 2020 17:58:15 +0100
From:   Stephen Kitt <steve@....org>
To:     linux-man@...r.kernel.org,
        Alejandro Colomar <alx.manpages@...il.com>,
        Michael Kerrisk <mtk.manpages@...il.com>
Cc:     Christian Brauner <christian.brauner@...ntu.com>,
        Giuseppe Scrivano <gscrivan@...hat.com>,
        linux-kernel@...r.kernel.org, Stephen Kitt <steve@....org>
Subject: [PATCH v3] close_range.2: new page documenting close_range(2)

This documents close_range(2) based on information in
278a5fbaed89dacd04e9d052f4594ffd0e0585de,
60997c3d45d9a67daf01c56d805ae4fec37e0bd8, and
582f1fb6b721facf04848d2ca57f34468da1813e.

Signed-off-by: Stephen Kitt <steve@....org>
---
V3: fix synopsis overflow
    copy notes from membarrier.2 re the lack of wrapper
    semantic newlines
    drop non-standard "USE CASES" section heading
    add code example

V2: unsigned int to match the kernel declarations
    groff and grammar tweaks
    CLOSE_RANGE_UNSHARE unshares *and* closes
    Explain that EMFILE and ENOMEM can occur with C_R_U
    "Conforming to" phrasing
    Detailed explanation of CLOSE_RANGE_UNSHARE
    Reading /proc isn't common

 man2/close_range.2 | 266 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 266 insertions(+)
 create mode 100644 man2/close_range.2

diff --git a/man2/close_range.2 b/man2/close_range.2
new file mode 100644
index 000000000..f8f2053ac
--- /dev/null
+++ b/man2/close_range.2
@@ -0,0 +1,266 @@
+.\" Copyright (c) 2020 Stephen Kitt <steve@....org>
+.\"
+.\" %%%LICENSE_START(VERBATIM)
+.\" Permission is granted to make and distribute verbatim copies of this
+.\" manual provided the copyright notice and this permission notice are
+.\" preserved on all copies.
+.\"
+.\" Permission is granted to copy and distribute modified versions of this
+.\" manual under the conditions for verbatim copying, provided that the
+.\" entire resulting derived work is distributed under the terms of a
+.\" permission notice identical to this one.
+.\"
+.\" Since the Linux kernel and libraries are constantly changing, this
+.\" manual page may be incorrect or out-of-date.  The author(s) assume no
+.\" responsibility for errors or omissions, or for damages resulting from
+.\" the use of the information contained herein.  The author(s) may not
+.\" have taken the same level of care in the production of this manual,
+.\" which is licensed free of charge, as they might when working
+.\" professionally.
+.\"
+.\" Formatted or processed versions of this manual, if unaccompanied by
+.\" the source, must acknowledge the copyright and authors of this work.
+.\" %%%LICENSE_END
+.\"
+.TH CLOSE_RANGE 2 2020-12-08 "Linux" "Linux Programmer's Manual"
+.SH NAME
+close_range \- close all file descriptors in a given range
+.SH SYNOPSIS
+.nf
+.B #include <linux/close_range.h>
+.PP
+.BI "int close_range(unsigned int " first ", unsigned int " last ,
+.BI "                unsigned int " flags );
+.fi
+.PP
+.IR Note :
+There is no glibc wrapper for this system call; see NOTES.
+.SH DESCRIPTION
+The
+.BR close_range ()
+system call closes all open file descriptors from
+.I first
+to
+.I last
+(included).
+.PP
+Errors closing a given file descriptor are currently ignored.
+.PP
+.I flags
+can be 0 or set to one or both of the following:
+.TP
+.B CLOSE_RANGE_UNSHARE
+unshares the range of file descriptors from any other processes,
+before closing them,
+avoiding races with other threads sharing the file descriptor table.
+.TP
+.BR CLOSE_RANGE_CLOEXEC " (since Linux 5.10)"
+sets the close-on-exec bit instead of immediately closing the file
+descriptors.
+.SH RETURN VALUE
+On success,
+.BR close_range ()
+returns 0.
+On error, \-1 is returned and
+.I errno
+is set to indicate the cause of the error.
+.SH ERRORS
+.TP
+.B EINVAL
+.I flags
+is not valid, or
+.I first
+is greater than
+.IR last .
+.PP
+The following can occur with
+.B CLOSE_RANGE_UNSHARE
+(when constructing the new descriptor table):
+.TP
+.B EMFILE
+The per-process limit on the number of open file descriptors has been reached
+(see the description of
+.B RLIMIT_NOFILE
+in
+.BR getrlimit (2)).
+.TP
+.B ENOMEM
+Insufficient kernel memory was available.
+.SH VERSIONS
+.BR close_range ()
+first appeared in Linux 5.9.
+.SH CONFORMING TO
+.BR close_range ()
+is a nonstandard function that is also present on FreeBSD.
+.SH NOTES
+Glibc does not provide a wrapper for this system call; call it using
+.BR syscall (2).
+.\" 278a5fbaed89dacd04e9d052f4594ffd0e0585de
+.SS Closing all open file descriptors
+To avoid blindly closing file descriptors in the range of possible
+file descriptors,
+this is sometimes implemented (on Linux) by listing open file
+descriptors in
+.I /proc/self/fd/
+and calling
+.BR close (2)
+on each one.
+.BR close_range ()
+can take care of this without requiring
+.I /proc
+and with a single system call,
+which provides significant performance benefits.
+.\" 60997c3d45d9a67daf01c56d805ae4fec37e0bd8
+.SS Closing file descriptors before exec
+File descriptors can be closed safely using
+.PP
+.in +4n
+.EX
+/* we don't want anything past stderr here */
+close_range(3, ~0U, CLOSE_RANGE_UNSHARE);
+execve(....);
+.EE
+.in
+.PP
+.B CLOSE_RANGE_UNSHARE
+is conceptually equivalent to
+.PP
+.in +4n
+.EX
+unshare(CLONE_FILES);
+close_range(first, last, 0);
+.EE
+.in
+.PP
+but can be more efficient:
+if the unshared range extends past the current maximum number of file
+descriptors allocated in the caller's file descriptor table
+(the common case when
+.I last
+is
+.BR ~0U ),
+the kernel will unshare a new file descriptor table for the caller up
+to
+.IR first .
+This avoids subsequent close calls entirely;
+the whole operation is complete once the table is unshared.
+.\" 582f1fb6b721facf04848d2ca57f34468da1813e
+.SS Closing files on \fBexec\fP
+This is particularly useful in cases where multiple
+.RB pre- exec
+setup steps risk conflicting with each other.
+For example, setting up a
+.BR seccomp (2)
+profile can conflict with a
+.B close_range
+call:
+if the file descriptors are closed before the seccomp profile is set
+up,
+the profile setup can't use them control their closure;
+if the file descriptors are closed afterwards,
+the seccomp profile can't block the
+.B close_range
+call or any fallbacks.
+Using
+.B CLOSE_RANGE_CLOEXEC
+avoids this:
+the descriptors can be marked before the seccomp profile is set up,
+and the profile can control access to
+.B close_range
+without affecting the calling process.
+.SH EXAMPLES
+The following program is designed to be execed by the second program
+below.
+It lists its open file descriptors:
+.PP
+.in +4n
+.EX
+/* listopen.c */
+
+#include <stdio.h>
+#include <sys/stat.h>
+
+int
+main(int argc, char *argv[])
+{
+    int i;
+    struct stat buf;
+
+    for (i = 0; i < 100; i++) {
+        if (!fstat(i, &buf))
+            printf("FD %d is open.\n", i);
+    }
+
+    exit(EXIT_SUCCESS);
+)
+.EE
+.in
+.PP
+This program executes the command given on its command-line after
+opening the files listed after the command,
+and then using
+.B close_range
+to close them:
+.PP
+.in +4n
+.EX
+/* close_range.c */
+
+#include <fcntl.h>
+#include <linux/close_range.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+int
+main(int argc, char *argv[])
+{
+    char *newargv[] = { NULL };
+    char *newenviron[] = { NULL };
+    int i;
+
+    if (argc < 3) {
+        fprintf(stderr, "Usage: %s <command-to-run> <files-to-open>\n", argv[0]);
+        exit(EXIT_FAILURE);
+    }
+
+    for (i = 2; i < argc; i++) {
+        if (open(argv[i], O_RDONLY) == -1) {
+            perror(argv[i]);
+            exit(EXIT_FAILURE);
+        }
+    }
+
+    if (syscall(__NR_close_range, 3, ~0U, CLOSE_RANGE_UNSHARE) == -1) {
+        perror("close_range");
+        exit(EXIT_FAILURE);
+    }
+
+    execve(argv[1], newargv, newenviron);
+    perror("execve");
+    exit(EXIT_FAILURE);
+}
+.EE
+.in
+.PP
+We can use the second program to exec the first as follows:
+.PP
+.in +4n
+.EX
+.RB "$" " make listopen close_range"
+.RB "$" " ./close_range ./listopen /dev/null /dev/zero"
+FD 0 is open.
+FD 1 is open.
+FD 2 is open.
+.EE
+.in
+.PP
+Removing the call to
+.B close_range
+will show different output, with the file descriptors for the named
+files still open.
+.SH SEE ALSO
+.BR close (2)

base-commit: b5dae3959625f5ff378e9edf9139057d1c06bb55
-- 
2.20.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ