Message-ID: <6c32a7c3-4bed-8d5e-134f-47a4bd49dc78@gmail.com>
Date: Fri, 18 Dec 2020 11:12:10 +0100
From: "Alejandro Colomar (man-pages)" <alx.manpages@...il.com>
To: Stephen Kitt <steve@....org>
Cc: linux-man@...r.kernel.org,
Michael Kerrisk <mtk.manpages@...il.com>,
linux-kernel@...r.kernel.org,
Christian Brauner <christian.brauner@...ntu.com>
Subject: Ping: [patch] close_range.2: new page documenting close_range(2)
Hi Stephen,
Linux 5.10 was released recently.
Do you have any updates for this patch?
Thanks,
Alex
On 12/12/20 6:58 PM, Alejandro Colomar (man-pages) wrote:
> Hi Christian,
>
> Makes sense to me.
>
> Thanks,
>
> Alex
>
> On 12/12/20 1:14 PM, Christian Brauner wrote:
>> On Thu, Dec 10, 2020 at 03:36:42PM +0100, Alejandro Colomar (man-pages) wrote:
>>> Hi Christian,
>>
>> Hi Alex,
>>
>>>
>>> Thanks for confirming that behavior. Seems reasonable.
>>>
>>> I was wondering...
>>> If this call is equivalent to unshare(2)+{close(2) in a loop},
>>> shouldn't it fail for the same reasons those syscalls can fail?
>>>
>>> What about the following errors?:
>>>
>>> From unshare(2):
>>>
>>> EPERM The calling process did not have the required
>>> privileges for this operation.
>>
>> unshare(CLONE_FILES) doesn't require any privileges. Only flags relevant
>> to kernel/nsproxy.c:unshare_nsproxy_namespaces() require privileges,
>> i.e.
>> CLONE_NEWNS
>> CLONE_NEWUTS
>> CLONE_NEWIPC
>> CLONE_NEWNET
>> CLONE_NEWPID
>> CLONE_NEWCGROUP
>> CLONE_NEWTIME
>> so the permissions are the same.
>>
>>>
>>> From close(2):
>>> EBADF fd isn't a valid open file descriptor.
>>>
>>> OK, this one can't happen with the current code.
>>> Let's say there are fds 1 to 10, and you call 'close_range(20,30,0)'.
>>> It's a no-op (although it will still unshare if the flag is set).
>>> But shouldn't it fail with EBADF?
>>
>> CLOSE_RANGE_UNSHARE should always give you a private file descriptor
>> table independent of whether or not any file descriptors need to be
>> closed. That's also how we documented the flag:
>>
>> /* Unshare the file descriptor table before closing file descriptors. */
>> #define CLOSE_RANGE_UNSHARE (1U << 1)
>>
>> A caller calling unshare(CLONE_FILES) and then an emulated close_range()
>> or the proper close_range() syscall wants to make sure that all unwanted
>> file descriptors are closed (if any) and that no new file descriptors
>> can be injected afterwards. If you skip the unshare(CLONE_FILES) because
>> there are no fds to be closed you open up a race window. It would also
>> be annoying for userspace if they _may_ have received a private file
>> descriptor table but only if any fds needed to be closed.
>>
>> If people really were extremely keen about skipping the unshare when no
>> fd needs to be closed then this could become a new flag. But I really
>> don't think that's necessary and also doesn't make a lot of sense, imho.
>>
>>>
>>> EINTR The close() call was interrupted by a signal; see
>>> signal(7).
>>>
>>> EIO An I/O error occurred.
>>>
>>> ENOSPC, EDQUOT
>>> On NFS, these errors are not normally reported against
>>> the first write which exceeds the available storage
>>> space, but instead against a subsequent write(2),
>>> fsync(2), or close().
>>
>> None of these will be seen by userspace because close_range() currently
>> ignores all errors after it has begun closing files.
>>
>> Christian
>>
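For reference, the userspace emulation discussed above (unshare(2)
followed by close(2) in a loop) might look roughly like the sketch
below. It is an illustration only, not the kernel implementation:
emulated_close_range() and EMU_CLOSE_RANGE_UNSHARE are hypothetical
names, and capping the range at sysconf(_SC_OPEN_MAX) (with a fallback
of 1024) is an assumption made to keep the loop bounded. As described
above, the unshare happens even when nothing needs closing, and errors
from close() are ignored.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <sched.h>   /* unshare(), CLONE_FILES */
#include <unistd.h>  /* close(), sysconf() */

/* Hypothetical name; the value mirrors the CLOSE_RANGE_UNSHARE
 * definition quoted in the thread. */
#define EMU_CLOSE_RANGE_UNSHARE (1U << 1)

/* Sketch of a userspace emulation: optional unshare(CLONE_FILES),
 * then close() in a loop. Not the kernel's implementation. */
int emulated_close_range(unsigned int first, unsigned int last,
                         unsigned int flags)
{
    /* Reject unknown flags and inverted ranges. */
    if ((flags & ~EMU_CLOSE_RANGE_UNSHARE) != 0 || first > last) {
        errno = EINVAL;
        return -1;
    }

    /* Always unshare when asked, even if no fd ends up being closed,
     * so the caller reliably gets a private fd table. */
    if ((flags & EMU_CLOSE_RANGE_UNSHARE) != 0 &&
        unshare(CLONE_FILES) == -1)
        return -1;

    /* Assumption of this sketch: bound the loop by the soft limit on
     * open files so that last == ~0U does not iterate 2^32 times. */
    long limit = sysconf(_SC_OPEN_MAX);
    unsigned int max_fd = (limit > 0 && limit <= INT_MAX)
                              ? (unsigned int)limit - 1
                              : 1023;
    if (last > max_fd)
        last = max_fd;

    /* Errors from close() (EBADF, EINTR, EIO, ...) are deliberately
     * ignored once closing has begun. */
    for (unsigned int fd = first; fd <= last; fd++)
        (void)close((int)fd);

    return 0;
}
```

With only fds 1 to 10 open, emulated_close_range(20, 30,
EMU_CLOSE_RANGE_UNSHARE) in this sketch closes nothing and returns 0,
but the file descriptor table is still unshared.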
--
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/