Message-ID: <48e5937b-80f5-c48b-1c67-e8c9db263ca5@gmail.com>
Date: Thu, 29 Oct 2020 21:37:21 +0100
From: "Michael Kerrisk (man-pages)" <mtk.manpages@...il.com>
To: Sargun Dhillon <sargun@...gun.me>
Cc: mtk.manpages@...il.com, Tycho Andersen <tycho@...ho.pizza>,
Christian Brauner <christian@...uner.io>,
Kees Cook <keescook@...omium.org>,
Daniel Borkmann <daniel@...earbox.net>,
Giuseppe Scrivano <gscrivan@...hat.com>,
Song Liu <songliubraving@...com>,
Robert Sesek <rsesek@...gle.com>,
Containers <containers@...ts.linux-foundation.org>,
linux-man <linux-man@...r.kernel.org>,
lkml <linux-kernel@...r.kernel.org>,
Aleksa Sarai <cyphar@...har.com>, Jann Horn <jannh@...gle.com>,
Alexei Starovoitov <ast@...nel.org>,
Will Drewry <wad@...omium.org>, bpf <bpf@...r.kernel.org>,
Andy Lutomirski <luto@...capital.net>
Subject: Re: For review: seccomp_user_notif(2) manual page [v2]
Hello Sargun,
On 10/29/20 9:53 AM, Sargun Dhillon wrote:
> On Mon, Oct 26, 2020 at 10:55:04AM +0100, Michael Kerrisk (man-pages) wrote:
[...]
>> ioctl(2) operations
>> The following ioctl(2) operations are provided to support seccomp
>> user-space notification. For each of these operations, the first
>> (file descriptor) argument of ioctl(2) is the listening file
>> descriptor returned by a call to seccomp(2) with the
>> SECCOMP_FILTER_FLAG_NEW_LISTENER flag.
>>
>> SECCOMP_IOCTL_NOTIF_RECV
>> This operation is used to obtain a user-space notification
>> event. If no such event is currently pending, the
>> operation blocks until an event occurs. The third
>> ioctl(2) argument is a pointer to a structure of the
>> following form which contains information about the event.
>> This structure must be zeroed out before the call.
>>
>> struct seccomp_notif {
>> __u64 id; /* Cookie */
>> __u32 pid; /* TID of target thread */
>> __u32 flags; /* Currently unused (0) */
>> struct seccomp_data data; /* See seccomp(2) */
>> };
>>
>> The fields in this structure are as follows:
>>
>> id This is a cookie for the notification. Each such
>> cookie is guaranteed to be unique for the
>> corresponding seccomp filter.
>>
>> · It can be used with the
>> SECCOMP_IOCTL_NOTIF_ID_VALID ioctl(2) operation
>> to verify that the target is still alive.
>>
>> · When returning a notification response to the
>> kernel, the supervisor must include the cookie
>> value in the seccomp_notif_resp structure that is
>> specified as the argument of the
>> SECCOMP_IOCTL_NOTIF_SEND operation.
>>
>> pid This is the thread ID of the target thread that
>> triggered the notification event.
>>
>> flags This is a bit mask of flags providing further
>> information on the event. In the current
>> implementation, this field is always zero.
>>
>> data This is a seccomp_data structure containing
>> information about the system call that triggered
>> the notification. This is the same structure that
>> is passed to the seccomp filter. See seccomp(2)
>> for details of this structure.
>>
>> On success, this operation returns 0; on failure, -1 is
>> returned, and errno is set to indicate the cause of the
>> error. This operation can fail with the following errors:
>>
>> EINVAL (since Linux 5.5)
>> The seccomp_notif structure that was passed to the
>> call contained nonzero fields.
>>
>> ENOENT The target thread was killed by a signal as the
>> notification information was being generated, or
>> the target's (blocked) system call was interrupted
>> by a signal handler.
>>
>> ┌─────────────────────────────────────────────────────┐
>> │FIXME │
>> ├─────────────────────────────────────────────────────┤
>> │From my experiments, it appears that if a │
>> │SECCOMP_IOCTL_NOTIF_RECV is done after the target │
>> │thread terminates, then the ioctl() simply blocks │
>> │(rather than returning an error to indicate that the │
>> │target no longer exists). │
>> │ │
>> │I found that surprising, and it required some │
>> │contortions in the example program. It was not │
>> │possible to code my SIGCHLD handler (which reaps the │
>> │zombie when the worker/target terminates) to simply │
>> │set a flag checked in the main handleNotifications() │
>> │loop, since this created an unavoidable race where │
>> │the child might terminate just after I had checked │
>> │the flag, but before I blocked (forever!) in the │
>> │SECCOMP_IOCTL_NOTIF_RECV operation. Instead, I had │
>> │to code the signal handler to simply call _exit(2) │
>> │in order to terminate the parent process (the │
>> │supervisor). │
>> │ │
>> │Is this expected behavior? It seems to me rather │
>> │desirable that SECCOMP_IOCTL_NOTIF_RECV should give │
>> │an error if the target has terminated. │
>> │ │
>> │Jann posted a patch to rectify this, but there was │
>> │no response (Lore link: https://bit.ly/3jvUBxk) to │
>> │his question about fixing this issue. (I've tried │
>> │building with the patch, but encountered an issue │
>> │with the target process entering D state after a │
>> │signal.) │
>> │ │
>> │For now, this behavior is documented in BUGS. │
>> │ │
>> │Kees Cook commented: Let's change [this] ASAP! │
>> └─────────────────────────────────────────────────────┘
>>
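For anyone following along, the calling convention described in the quoted text (zero the structure, then block in ioctl(2)) can be sketched roughly as below. The helper name recv_notification() is hypothetical, and retrying on EINTR is a choice made for this sketch, not something the page text mandates.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/seccomp.h>

/* Hypothetical helper: fetch the next notification from 'listener'.
   Returns 0 with *req filled in, or -1 with errno set. ENOENT means
   the target went away while the notification was being generated;
   the caller may simply call this function again. */
static int recv_notification(int listener, struct seccomp_notif *req)
{
    assert(req != NULL);

    for (;;) {
        /* The structure must be zeroed before each call; since
           Linux 5.5, nonzero fields cause EINVAL */
        memset(req, 0, sizeof(*req));

        if (ioctl(listener, SECCOMP_IOCTL_NOTIF_RECV, req) == 0)
            return 0;

        /* Retry only if the supervisor's own ioctl() was interrupted
           by a signal handler (a choice made for this sketch) */
        if (errno != EINTR)
            return -1;
    }
}
```

Note that this does nothing about the blocking behavior described in the FIXME: if the target has already terminated, the call still blocks.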
>
> I think I commented in another thread somewhere that the supervisor is not
> notified if the syscall is preempted. Therefore if it is performing a
> preemptible, long-running syscall, you need to poll
> SECCOMP_IOCTL_NOTIF_ID_VALID in the background, otherwise you can
> end up in a bad situation -- like leaking resources, or holding on to
> file descriptors after the program under supervision has intended to
> release them.
It's been a long day, and I'm not sure I really understand this.
Could you outline the scenario in more detail?
> A very specific example is if you're performing an accept on behalf
> of the program generating the notification, and the program intends
> to reuse the port. You can get into all sorts of awkward situations
> there.
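If I read the above correctly, the pattern would be something like the following sketch: the supervisor waits on the socket in short intervals and re-checks the cookie with SECCOMP_IOCTL_NOTIF_ID_VALID between waits, so that it stops working for a target whose system call has been interrupted. The helper names, the use of poll(2), and the poll_ms interval are all assumptions of mine, not anything taken from an existing supervisor.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/seccomp.h>

/* Returns 1 if the notification is still live, 0 if the kernel
   reports ENOENT (target gone or system call interrupted), and
   -1 on any other error. */
static int notif_id_valid(int listener, __u64 id)
{
    if (ioctl(listener, SECCOMP_IOCTL_NOTIF_ID_VALID, &id) == 0)
        return 1;
    return (errno == ENOENT) ? 0 : -1;
}

/* Hypothetical helper: accept a connection on 'sockfd' on behalf of
   the target, waking up every 'poll_ms' milliseconds to re-check the
   cookie. Returns the connection FD, or -1 if the notification is no
   longer valid or an error occurred. */
static int accept_for_target(int listener, __u64 id, int sockfd,
                             int poll_ms)
{
    struct pollfd pfd = { .fd = sockfd, .events = POLLIN };

    for (;;) {
        if (notif_id_valid(listener, id) != 1)
            return -1;          /* Stop working for a dead request */

        int ready = poll(&pfd, 1, poll_ms);
        if (ready == -1 && errno != EINTR)
            return -1;
        if (ready <= 0)
            continue;           /* Timeout or EINTR: re-check cookie */

        int conn = accept(sockfd, NULL, NULL);
        if (conn == -1)
            return -1;

        /* The target may have gone away while we were accepting;
           if so, do not hold on to the connection */
        if (notif_id_valid(listener, id) != 1) {
            close(conn);
            return -1;
        }
        return conn;
    }
}
```

Even with the second check, there remains a window between the check and whatever the supervisor does next, so this narrows the problem rather than eliminating it.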
[...]
> SECCOMP_IOCTL_NOTIF_ADDFD (Since Linux v5.9)
> This operation is used by the supervisor to add a file
> descriptor to the process that generated the notification.
> This can be used by the supervisor to enable "emulation"
> [Probably a better word] of syscalls which return file
> descriptors, such as socket(2), or open(2).
>
> When the file descriptor is received by the process that
> is associated with the notification / cookie, it follows
> SCM_RIGHTS like semantics, and is evaluated by MAC.
I'm not sure what you mean by SCM_RIGHTS-like semantics. Do you mean,
the file descriptor refers to the same open file description
('struct file')?
"is evaluated by MAC"... Do you mean something like: the FD is
subject to LSM checks?
> In addition, if it is a socket, it inherits the cgroup
> v1 classid and netprioidx of the receiving process.
>
> The argument of this is as follows:
>
> struct seccomp_notif_addfd {
> __u64 id;
> __u32 flags;
> __u32 srcfd;
> __u32 newfd;
> __u32 newfd_flags;
> };
>
> id
> This is the cookie value that was obtained using
> SECCOMP_IOCTL_NOTIF_RECV.
>
> flags
> A bitmask that includes zero or more of the
> SECCOMP_ADDFD_FLAG_* bits set
>
> SECCOMP_ADDFD_FLAG_SETFD - Use dup2 (or dup3?)
> like semantics when copying the file
> descriptor.
>
> srcfd
> The file descriptor number to copy in the
> supervisor process.
>
> newfd
> If the SECCOMP_ADDFD_FLAG_SETFD flag is specified
> this will be the file descriptor that is used
> in the dup2 semantics. If this file descriptor
> exists in the receiving process, it is closed
> and replaced by this file descriptor in an
> atomic fashion. If the copy process fails
> due to a MAC failure, or if srcfd is invalid,
> the newfd will not be closed in the receiving
> process.
Great description!
> If SECCOMP_ADDFD_FLAG_SETFD is not set, then
> this value must be 0.
>
> newfd_flags
> The file descriptor flags to set on
> the file descriptor after it has been received
> by the process. The only flag that can currently
> be specified is O_CLOEXEC.
>
> On success, this operation returns the file descriptor
> number in the receiving process. On failure, -1 is returned.
>
> It can fail with the following error codes:
>
> EINPROGRESS
> The cookie number specified hasn't been received
> by the listener
I don't understand this. Can you say more about the scenario?
> ENOENT
> The cookie number is not valid. This can happen
> if a response has already been sent, or if the
> syscall was interrupted
>
> EBADF
> If the file descriptor specified in srcfd is
> invalid, or if the fd is out of range of the
> destination program.
The piece "or if the fd is out of range of the destination
program" is not clear to me. Can you say some more, please?
> EINVAL
> If flags or newfd_flags were unrecognized, or
> if newfd is non-zero, and SECCOMP_ADDFD_FLAG_SETFD
> has not been set.
>
> EMFILE
> Too many files are open by the destination process.
>
> [there's other error codes possible, like from the LSMs
> or if memory can't be read / written or ebusy]
>
> Does this help?
It's a good start!
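To check my understanding of the argument rules above, here is a rough sketch of how a supervisor might fill in the structure. It assumes that <linux/seccomp.h> from Linux 5.9 or later matches the layout quoted above. The helper names, and the convention that a negative newfd means "let the kernel choose the descriptor number", are inventions of this sketch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/seccomp.h>

/* Hypothetical helper: populate 'a' for notification cookie 'id'.
   If newfd >= 0, request dup2-like placement at that number;
   otherwise leave flags and newfd as 0, as the draft text requires
   when SECCOMP_ADDFD_FLAG_SETFD is not set. */
static void fill_addfd(struct seccomp_notif_addfd *a, __u64 id,
                       int srcfd, int newfd, int cloexec)
{
    assert(a != NULL && srcfd >= 0);

    memset(a, 0, sizeof(*a));
    a->id = id;                     /* Cookie from NOTIF_RECV */
    a->srcfd = (__u32) srcfd;       /* FD in the supervisor */
    a->newfd_flags = cloexec ? O_CLOEXEC : 0;

    if (newfd >= 0) {
        a->flags = SECCOMP_ADDFD_FLAG_SETFD;
        a->newfd = (__u32) newfd;
    }
}

/* Returns the FD number allocated in the target on success, or -1
   with errno set (EINPROGRESS, ENOENT, EBADF, EINVAL, EMFILE, ...). */
static int addfd_to_target(int listener,
                           struct seccomp_notif_addfd *a)
{
    return ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, a);
}
```

A supervisor emulating, say, open(2) would presumably call fill_addfd() with the cookie from the notification and then addfd_to_target(), using the returned descriptor number as the return value reported to the target.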
Thanks,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/