linux-kernel - Re: [PATCH 1/2] fs/exec: allow to unshare a time namespace on vfork+exec

Message-ID: <20220615085350.theicffhehgbmfep@wittgenstein>
Date:   Wed, 15 Jun 2022 10:53:50 +0200
From:   Christian Brauner <brauner@...nel.org>
To:     Florian Weimer <fweimer@...hat.com>
Cc:     Kees Cook <keescook@...omium.org>, Andrei Vagin <avagin@...il.com>,
        linux-kernel@...r.kernel.org,
        Dmitry Safonov <0x7f454c46@...il.com>, linux-mm@...ck.org,
        Eric Biederman <ebiederm@...ssion.com>
Subject: Re: [PATCH 1/2] fs/exec: allow to unshare a time namespace on
 vfork+exec

On Wed, Jun 15, 2022 at 10:14:19AM +0200, Florian Weimer wrote:
> * Christian Brauner:
> 
> > For pid namespaces one problem would be that it could end up confusing a
> > process about its own pid. This was a more serious problem when the pid
> > cache was still active in glibc; but fwiw systemd still has a pid cache
> > afair.
> 
> Right.  glibc still has a TID cache, mainly for use with recursive
> mutexes (where we need a 32-bit thread identifier and can't perform a
> system call on every locking operation for performance reasons).
> Assuming that a non-delayed CLONE_NEWPID would also change the TID
> underneath us, we'd have subtly broken recursive mutexes.

Fwiw, you can't combine CLONE_NEWPID with CLONE_THREAD. This guarantees
that threads can send signals to each other and that all threads within
the same thread group can be reached via /proc. It'd be awkward to have
a thread whose thread-group leader lives in an ancestor pidns.
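
A minimal sketch of the restriction above, assuming Linux with glibc's
clone() wrapper. The helper name try_clone_thread_newpid() and the stack
size are hypothetical and chosen only for illustration. clone(2)
documents EINVAL for this flag combination, though a sandbox or seccomp
filter may reject the call earlier with a different error.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdlib.h>

/* Not expected to run: the kernel should reject the flag combination. */
static int noop_child(void *arg)
{
    (void)arg;
    return 0;
}

/*
 * Hypothetical helper: ask for a new thread that lives in a new pid
 * namespace. Returns 0 if clone() unexpectedly succeeded, otherwise
 * the errno value it reported.
 */
int try_clone_thread_newpid(void)
{
    const size_t stack_size = 64 * 1024;   /* arbitrary, for illustration */
    char *stack = malloc(stack_size);
    if (!stack)
        return ENOMEM;

    /* CLONE_THREAD requires CLONE_SIGHAND, which requires CLONE_VM. */
    int flags = CLONE_VM | CLONE_SIGHAND | CLONE_THREAD | CLONE_NEWPID;

    /* Pass the top of the buffer, assuming a downward-growing stack. */
    int rc = clone(noop_child, stack + stack_size, flags, NULL);
    int err = (rc == -1) ? errno : 0;

    /* Only free the stack if no thread was created that might use it. */
    if (rc == -1)
        free(stack);
    return err;
}
```

Calling try_clone_thread_newpid() and passing a nonzero result to
strerror() shows which error the running kernel or sandbox reports.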

Even if you made the whole thread group change pid namespaces
immediately, it would mean allocating a new TGID and new TIDs in the new
pid namespace, unless the existing values happened to be unallocated
there.
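
For contrast, here is a small sketch of the existing delayed behaviour,
assuming Linux with glibc. unshare(CLONE_NEWPID) only affects children
created afterwards, so the caller's own pid does not change underneath
it. The helper name pid_stable_across_unshare() is hypothetical. The
unshare() call may well fail (for example with EPERM when the caller
lacks CAP_SYS_ADMIN), and the pid is unchanged in that case too.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if the caller's pid is the same before
 * and after an attempted unshare(CLONE_NEWPID), 0 otherwise.
 *
 * Side effect: if unshare() succeeds, the next child this process
 * creates becomes pid 1 of the new pid namespace.
 */
int pid_stable_across_unshare(void)
{
    pid_t before = getpid();

    /* The result is ignored on purpose: success or failure, the
     * caller itself stays in its original pid namespace. */
    (void)unshare(CLONE_NEWPID);

    pid_t after = getpid();
    return before == after;
}
```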

> 
> vfork gets away with not updating the TID cache (which is shared with
> the parent process) because the parent process is suspended while the
> new subprocess is still running and has not execve'ed yet.
> 
> Now one could argue that calling unshare automatically means that you
> must not call any glibc functions afterwards (similar to thread-creating
> clone), or at least that you cannot call any functions which are not
> async-signal-safe, but that does not match existing application
> practice.  And I think we actually prefer that file servers call chroot

Yeah, that'd be a rather subtle and risky change for pid namespaces.