Message-ID: <CANaxB-xqpZcVObpGCBsBXNVN7a2CZ7=_CaPZp4mG50Bi0oVDmA@mail.gmail.com>
Date: Tue, 30 Aug 2022 18:18:18 -0700
From: Andrei Vagin <avagin@...il.com>
To: Alexey Izbyshev <izbyshev@...ras.ru>
Cc: linux-kernel@...r.kernel.org,
Dmitry Safonov <0x7f454c46@...il.com>,
Christian Brauner <brauner@...nel.org>,
Florian Weimer <fweimer@...hat.com>, linux-mm@...ck.org,
Eric Biederman <ebiederm@...ssion.com>,
Kees Cook <keescook@...omium.org>
Subject: Re: Potentially undesirable interactions between vfork() and time namespaces
On Tue, Aug 30, 2022 at 12:49 PM Alexey Izbyshev <izbyshev@...ras.ru> wrote:
>
> Hi,
>
> I've looked at Andrei's patch[1] that permitted vfork() after
> unshare(CLONE_NEWTIME) and noticed a couple of odd things that I'd like
> to point out.
>
> /*
> * If the new process will be in a different time namespace
> * do not allow it to share VM or a thread group with the forking task.
> + *
> + * On vfork, the child process enters the target time namespace only
> + * after exec.
> */
> - if (clone_flags & (CLONE_THREAD | CLONE_VM)) {
> + if ((clone_flags & (CLONE_VM | CLONE_VFORK)) == CLONE_VM) {
> if (nsp->time_ns != nsp->time_ns_for_children)
> return ERR_PTR(-EINVAL);
> }
>
> This change permits not only a normal vfork(), but also
> clone(CLONE_VM|CLONE_VFORK|CLONE_SIGHAND|CLONE_THREAD). I'm not sure
> whether it can cause real harm, but it's pretty inconsistent to forbid
> creation of normal threads after unshare(CLONE_NEWTIME), but permit such
> weird ones, so maybe the check should be strengthened.
Good catch. I was not aware that CLONE_VFORK is allowed to be used with
CLONE_THREAD. I will send a fix. Thanks.
>
> Also, if such a thread execs, no time namespace switch will happen
> because its vfork_done field will be cleared when its creator (a
> sibling thread) is killed by de_thread().
>
> + vfork = !!tsk->vfork_done;
> old_mm = current->mm;
> exec_mm_release(tsk, old_mm);
> if (old_mm)
> @@ -1030,6 +1033,10 @@ static int exec_mmap(struct mm_struct *mm)
> tsk->mm->vmacache_seqnum = 0;
> vmacache_flush(tsk);
> task_unlock(tsk);
> +
> + if (vfork)
> + timens_on_fork(tsk->nsproxy, tsk);
> +
>
> Similarly, even after a normal vfork(), time namespace switch could be
> silently skipped if the parent dies before "tsk->vfork_done" is read.
> Again, I don't know whether anybody cares, but this behavior seems
> non-obvious and probably unintended to me.
This is the more interesting case. I will try to find out how we can
handle it properly.
Thanks,
Andrei