lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d3816808-92ee-0615-162d-fa5e83ef4493@gmail.com>
Date:   Mon, 21 Dec 2020 09:32:48 +0100
From:   "Alejandro Colomar (man-pages)" <alx.manpages@...il.com>
To:     Stephen Kitt <steve@....org>
Cc:     linux-man@...r.kernel.org,
        Michael Kerrisk <mtk.manpages@...il.com>,
        Christian Brauner <christian.brauner@...ntu.com>,
        Giuseppe Scrivano <gscrivan@...hat.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] close_range.2: new page documenting close_range(2)



On 12/20/20 11:00 PM, Stephen Kitt wrote:
> Hi Alex,
> 
> On Sat, 19 Dec 2020 15:00:00 +0100, "Alejandro Colomar (man-pages)"
> <alx.manpages@...il.com> wrote:
>> Please see some comments below.
>> It's looking good ;)
> 
> Thanks for your review and patience!
> 
>> On 12/18/20 5:58 PM, Stephen Kitt wrote:
>>> This documents close_range(2) based on information in
>>> 278a5fbaed89dacd04e9d052f4594ffd0e0585de,
>>> 60997c3d45d9a67daf01c56d805ae4fec37e0bd8, and
>>> 582f1fb6b721facf04848d2ca57f34468da1813e.
>>>
>>> Signed-off-by: Stephen Kitt <steve@....org>
>>> ---
>>> V3: fix synopsis overflow
>>>     copy notes from membarrier.2 re the lack of wrapper
>>>     semantic newlines
>>>     drop non-standard "USE CASES" section heading
>>>     add code example
>>>
>>> V2: unsigned int to match the kernel declarations
>>>     groff and grammar tweaks
>>>     CLOSE_RANGE_UNSHARE unshares *and* closes
>>>     Explain that EMFILE and ENOMEM can occur with C_R_U
>>>     "Conforming to" phrasing
>>>     Detailed explanation of CLOSE_RANGE_UNSHARE
>>>     Reading /proc isn't common
>>>
>>>  man2/close_range.2 | 266 +++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 266 insertions(+)
>>>  create mode 100644 man2/close_range.2
>>>
>>> diff --git a/man2/close_range.2 b/man2/close_range.2
>>> new file mode 100644
>>> index 000000000..f8f2053ac
>>> --- /dev/null
>>> +++ b/man2/close_range.2
>>> @@ -0,0 +1,266 @@
>>> +.\" Copyright (c) 2020 Stephen Kitt <steve@....org>
>>> +.\"
>>> +.\" %%%LICENSE_START(VERBATIM)
>>> +.\" Permission is granted to make and distribute verbatim copies of this
>>> +.\" manual provided the copyright notice and this permission notice are
>>> +.\" preserved on all copies.
>>> +.\"
>>> +.\" Permission is granted to copy and distribute modified versions of
>>> this +.\" manual under the conditions for verbatim copying, provided that
>>> the +.\" entire resulting derived work is distributed under the terms of a
>>> +.\" permission notice identical to this one.
>>> +.\"
>>> +.\" Since the Linux kernel and libraries are constantly changing, this
>>> +.\" manual page may be incorrect or out-of-date.  The author(s) assume no
>>> +.\" responsibility for errors or omissions, or for damages resulting from
>>> +.\" the use of the information contained herein.  The author(s) may not
>>> +.\" have taken the same level of care in the production of this manual,
>>> +.\" which is licensed free of charge, as they might when working
>>> +.\" professionally.
>>> +.\"
>>> +.\" Formatted or processed versions of this manual, if unaccompanied by
>>> +.\" the source, must acknowledge the copyright and authors of this work.
>>> +.\" %%%LICENSE_END
>>> +.\"
>>> +.TH CLOSE_RANGE 2 2020-12-08 "Linux" "Linux Programmer's Manual"
>>> +.SH NAME
>>> +close_range \- close all file descriptors in a given range
>>> +.SH SYNOPSIS
>>> +.nf
>>> +.B #include <linux/close_range.h>
>>> +.PP
>>> +.BI "int close_range(unsigned int " first ", unsigned int " last ,
>>> +.BI "                unsigned int " flags );
>>> +.fi
>>> +.PP
>>> +.IR Note :
>>> +There is no glibc wrapper for this system call; see NOTES.
>>> +.SH DESCRIPTION
>>> +The
>>> +.BR close_range ()
>>> +system call closes all open file descriptors from
>>> +.I first
>>> +to
>>> +.I last
>>> +(included).
>>> +.PP
>>> +Errors closing a given file descriptor are currently ignored.
>>> +.PP
>>> +.I flags
>>> +can be 0 or set to one or both of the following:
>>> +.TP
>>> +.B CLOSE_RANGE_UNSHARE
>>> +unshares the range of file descriptors from any other processes,
>>> +before closing them,
>>> +avoiding races with other threads sharing the file descriptor table.
>>> +.TP
>>> +.BR CLOSE_RANGE_CLOEXEC " (since Linux 5.10)"  
>>
>> |sort
>>
>> I prefer alphabetic order rather than adding new items at the bottom.
>> When lists grow, it becomes difficult to find what you're looking for.
>>
>> CLOEXEC should go before UNSHARE.
> 
> That makes sense.
> 
>>> +sets the close-on-exec bit instead of immediately closing the file
>>> +descriptors.  
>>
>> [
>> sets the close-on-exec bit instead of
>> immediately closing the file descriptors.
>> ]
> 
> Is this for semantic reasons, or to balance the lines and make them easier to
> read in the roff source?

B is also true, but mostly A.

Cheers,

Alex

> 
>>> +.SH RETURN VALUE
>>> +On success,
>>> +.BR close_range ()
>>> +returns 0.
>>> +On error, \-1 is returned and
>>> +.I errno
>>> +is set to indicate the cause of the error.
>>> +.SH ERRORS
>>> +.TP
>>> +.B EINVAL
>>> +.I flags
>>> +is not valid, or
>>> +.I first
>>> +is greater than
>>> +.IR last .
>>> +.PP
>>> +The following can occur with
>>> +.B CLOSE_RANGE_UNSHARE
>>> +(when constructing the new descriptor table):
>>> +.TP
>>> +.B EMFILE
>>> +The per-process limit on the number of open file descriptors has been
>>> reached +(see the description of
>>> +.B RLIMIT_NOFILE
>>> +in
>>> +.BR getrlimit (2)).
>>> +.TP
>>> +.B ENOMEM
>>> +Insufficient kernel memory was available.
>>> +.SH VERSIONS
>>> +.BR close_range ()
>>> +first appeared in Linux 5.9.
>>> +.SH CONFORMING TO
>>> +.BR close_range ()
>>> +is a nonstandard function that is also present on FreeBSD.
>>> +.SH NOTES
>>> +Glibc does not provide a wrapper for this system call; call it using
>>> +.BR syscall (2).
>>> +.\" 278a5fbaed89dacd04e9d052f4594ffd0e0585de
>>> +.SS Closing all open file descriptors  
>>
>> The comment with the commit would be better inside the section it refers
>> to, so:
>>
>> [
>> .SS Closing all open file descriptors
>> .\" 278a5fbaed89dacd04e9d052f4594ffd0e0585de
>> ]
> 
> Indeed!
> 
>>> +To avoid blindly closing file descriptors in the range of possible
>>> +file descriptors,  
>>
>> [
>> To avoid blindly closing file descriptors
>> in the range of possible file descriptors,
>> ]
>>
>>> +this is sometimes implemented (on Linux) by listing open file
>>> +descriptors in  
>>
>> [
>> this is sometimes implemented (on Linux)
>> by listing open file descriptors in
>> ]
>>
>>> +.I /proc/self/fd/
>>> +and calling
>>> +.BR close (2)
>>> +on each one.
>>> +.BR close_range ()
>>> +can take care of this without requiring
>>> +.I /proc
>>> +and with a single system call,  
>>
>> s/with/within/
>>
>>> +which provides significant performance benefits.
>>> +.\" 60997c3d45d9a67daf01c56d805ae4fec37e0bd8
>>> +.SS Closing file descriptors before exec  
>>
>> [
>> .SS Closing file descriptors before exec
>> .\" 60997c3d45d9a67daf01c56d805ae4fec37e0bd8
>> ]
>>
>>> +File descriptors can be closed safely using
>>> +.PP
>>> +.in +4n
>>> +.EX
>>> +/* we don't want anything past stderr here */
>>> +close_range(3, ~0U, CLOSE_RANGE_UNSHARE);  
>>> +execve(....);> +.EE  
>>> +.in
>>> +.PP
>>> +.B CLOSE_RANGE_UNSHARE
>>> +is conceptually equivalent to
>>> +.PP
>>> +.in +4n
>>> +.EX
>>> +unshare(CLONE_FILES);
>>> +close_range(first, last, 0);
>>> +.EE
>>> +.in
>>> +.PP
>>> +but can be more efficient:
>>> +if the unshared range extends past the current maximum number of file
>>> +descriptors allocated in the caller's file descriptor table  
>>
>> [
>> if the unshared range extends past
>> the current maximum number of file descriptors allocated
>> in the caller's file descriptor table
>> ]
>>
>>> +(the common case when
>>> +.I last
>>> +is
>>> +.BR ~0U ),  
>>
>> Literal values are not (usually) formatted.
>>
>> [
>> .I last
>> is ~0U),
>> ]
>>
>>> +the kernel will unshare a new file descriptor table for the caller up
>>> +to  
>>
>> [
>> the kernel will unshare a new file descriptor table for the caller up to
>> ]
>>
>>> +.IR first .
>>> +This avoids subsequent close calls entirely;
>>> +the whole operation is complete once the table is unshared.
>>> +.\" 582f1fb6b721facf04848d2ca57f34468da1813e
>>> +.SS Closing files on \fBexec\fP  
>>
>> [
>> .SS Closing files on \fBexec\fP
>> .\" 582f1fb6b721facf04848d2ca57f34468da1813e
>> ]
>>
>>> +This is particularly useful in cases where multiple
>>> +.RB pre- exec
>>> +setup steps risk conflicting with each other.
>>> +For example, setting up a
>>> +.BR seccomp (2)
>>> +profile can conflict with a
>>> +.B close_range  
>>
>> .BR close_range ()
>>
>>> +call:
>>> +if the file descriptors are closed before the seccomp profile is set  
>>
>> .BR seccomp (2)
>>
>>> +up,  
>>
>> Please, split at a different point.
>>
>>> +the profile setup can't use them control their closure;  
>>
>> I don't understand what you wanted to say.  them?
> 
> Oops, I meant "the profile setup can't use them itself, or control their
> closure".
> 
>>
>>> +if the file descriptors are closed afterwards,
>>> +the seccomp profile can't block the
>>> +.B close_range  
>>
>> .BR close_range ()
>>
>>> +call or any fallbacks.
>>> +Using
>>> +.B CLOSE_RANGE_CLOEXEC
>>> +avoids this:
>>> +the descriptors can be marked before the seccomp profile is set up,  
>>
>> .BR seccomp (2)
>>
>>> +and the profile can control access to
>>> +.B close_range  
>>
>> .BR close_range ()
>>
>>> +without affecting the calling process.
>>> +.SH EXAMPLES
>>> +The following program is designed to be execed by the second program
>>> +below.
>>> +It lists its open file descriptors:
>>> +.PP
>>> +.in +4n
>>> +.EX
>>> +/* listopen.c */
>>> +
>>> +#include <stdio.h>
>>> +#include <sys/stat.h>
>>> +
>>> +int
>>> +main(int argc, char *argv[])
>>> +{
>>> +    int i;  
>>
>> We use C99 declarations for loop indices.
>>
>>> +    struct stat buf;
>>> +
>>> +    for (i = 0; i < 100; i++) {  
>>
>>     for (int i = 0; i < 100; i++) {
>>
>>> +        if (!fstat(i, &buf))
>>> +            printf("FD %d is open.\n", i);  
>>
>> s/\\/\\e/
>>
>> see: d1a719857b7eb68f5e5c1c965089038dee683240
>>
>> I sometimes forget to fix those after copying the program to the page.
>> My solution is to copy the rendered text from the man page to a file
>> and then compile, and those errors become obvious ;)
> 
> Ah yes, good catch. I was looking into automating checks for the source code
> included in man pages throughout the project, but that throws a spanner in
> the works!
> 
>>
>>> +    }
>>> +
>>> +    exit(EXIT_SUCCESS);
>>> +)
>>> +.EE
>>> +.in
>>> +.PP
>>> +This program executes the command given on its command-line after
>>> +opening the files listed after the command,
>>> +and then using  
>>
>> s/using/uses/
>>
>>> +.B close_range  
>>
>> .BR close_range ()
>>
>>> +to close them:
>>> +.PP
>>> +.in +4n
>>> +.EX
>>> +/* close_range.c */
>>> +
>>> +#include <fcntl.h>
>>> +#include <linux/close_range.h>
>>> +#include <stdio.h>
>>> +#include <stdlib.h>
>>> +#include <sys/stat.h>
>>> +#include <sys/syscall.h>
>>> +#include <sys/types.h>
>>> +#include <unistd.h>
>>> +
>>> +int
>>> +main(int argc, char *argv[])
>>> +{
>>> +    char *newargv[] = { NULL };
>>> +    char *newenviron[] = { NULL };
>>> +    int i;  
>>
>> dd
>>
>>> +
>>> +    if (argc < 3) {
>>> +        fprintf(stderr, "Usage: %s <command-to-run> <files-to-open>\n",
>>> argv[0]);  
>>
>> s/\\/\\e/
>>
>>> +        exit(EXIT_FAILURE);
>>> +    }
>>> +
>>> +    for (i = 2; i < argc; i++) {  
>>
>>     for (int i = 2; i < argc; i++) {
>>
>>> +        if (open(argv[i], O_RDONLY) == -1) {
>>> +            perror(argv[i]);
>>> +            exit(EXIT_FAILURE);
>>> +        }
>>> +    }
>>> +
>>> +    if (syscall(__NR_close_range, 3, ~0U, CLOSE_RANGE_UNSHARE) == -1) {
>>> +        perror("close_range");
>>> +        exit(EXIT_FAILURE);
>>> +    }
>>> +
>>> +    execve(argv[1], newargv, newenviron);
>>> +    perror("execve");
>>> +    exit(EXIT_FAILURE);
>>> +}
>>> +.EE
>>> +.in
>>> +.PP
>>> +We can use the second program to exec the first as follows:
>>> +.PP
>>> +.in +4n
>>> +.EX
>>> +.RB "$" " make listopen close_range"
>>> +.RB "$" " ./close_range ./listopen /dev/null /dev/zero"
>>> +FD 0 is open.
>>> +FD 1 is open.
>>> +FD 2 is open.
>>> +.EE
>>> +.in
>>> +.PP
>>> +Removing the call to
>>> +.B close_range  
>>
>> .BR close_range ()
>>
>>> +will show different output, with the file descriptors for the named
>>> +files still open.  
>>
>> [
>> will show different output,
>> with the file descriptors for the named files still open.
>> ]
> 
> Thanks, I'll send a v4 with all the fixes above.
> 
> Regards,
> 
> Stephen
> 

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ