linux-kernel - Re: [PATCH 1/2] fs/exec: allow to unshare a time namespace on vfork+exec

Message-ID: <87zgiepcmc.fsf@oldenburg.str.redhat.com>
Date:   Wed, 15 Jun 2022 10:14:19 +0200
From:   Florian Weimer <fweimer@...hat.com>
To:     Christian Brauner <brauner@...nel.org>
Cc:     Kees Cook <keescook@...omium.org>, Andrei Vagin <avagin@...il.com>,
        linux-kernel@...r.kernel.org,
        Dmitry Safonov <0x7f454c46@...il.com>, linux-mm@...ck.org,
        Eric Biederman <ebiederm@...ssion.com>
Subject: Re: [PATCH 1/2] fs/exec: allow to unshare a time namespace on
 vfork+exec

* Christian Brauner:

> For pid namespaces one problem would be that it could end up confusing a
> process about its own pid. This was a more serious problem when the pid
> cache was still active in glibc; but fwiw systemd still has a pid cache
> afair.

Right.  glibc still has a TID cache, mainly for use with recursive
mutexes (where we need a 32-bit thread identifier and can't perform a
system call on every locking operation for performance reasons).
Assuming that a non-delayed CLONE_NEWPID would also change the TID
underneath us, we'd have subtly broken recursive mutexes.
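
To illustrate why a cached TID matters here, the following is a minimal sketch of a recursive lock that identifies its owner by a per-thread cached TID. It is not glibc's implementation; `cached_tid`, `my_tid`, `struct rec_mutex`, `rec_trylock` and `rec_unlock` are hypothetical names chosen for the example. The point is only that the ownership check compares against the cached value and never asks the kernel again, so a TID that changed underneath the thread would no longer match what the kernel reports.

```c
#define _GNU_SOURCE
#include <stdatomic.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-thread TID cache, filled on first use. */
static _Thread_local pid_t cached_tid;

static pid_t my_tid(void)
{
    /* Only the first call per thread enters the kernel. */
    if (cached_tid == 0)
        cached_tid = (pid_t)syscall(SYS_gettid);
    return cached_tid;
}

/* Simplified recursive lock: owner holds a TID, 0 means unlocked. */
struct rec_mutex {
    atomic_int owner;
    unsigned count;
};

/* Returns 0 on success, -1 if another thread owns the lock. */
static int rec_trylock(struct rec_mutex *m)
{
    int self = (int)my_tid();

    /* Recursive acquisition: decided purely from the cached TID. */
    if (atomic_load(&m->owner) == self) {
        m->count++;
        return 0;
    }

    int expected = 0;
    if (atomic_compare_exchange_strong(&m->owner, &expected, self)) {
        m->count = 1;
        return 0;
    }
    return -1;
}

/* Returns 0 on success, -1 if the caller is not the owner. */
static int rec_unlock(struct rec_mutex *m)
{
    if (atomic_load(&m->owner) != (int)my_tid())
        return -1;
    if (--m->count == 0)
        atomic_store(&m->owner, 0);
    return 0;
}
```

In this sketch the fast path of `rec_trylock` costs one atomic load and one comparison, which is the performance argument for caching the TID instead of calling `gettid` on every lock operation.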

vfork gets away with not updating the TID cache (which is shared with
the parent process) because the parent process is suspended while the
new subprocess is still running and has not execve'ed yet.
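
A minimal sketch of the vfork+exec pattern described above, assuming a Linux system; `run_via_vfork` is a hypothetical helper name. The child shares the parent's address space (including any cached TID) but only calls an exec function or `_exit`, and the parent does not resume until one of those happens.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Runs argv[0] with the given arguments and returns its exit status,
 * 127 if the exec failed, or -1 on other errors.
 */
static int run_via_vfork(char *const argv[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the parent is suspended and its memory is shared,
         * so do nothing here except exec or _exit. */
        execvp(argv[0], argv);
        _exit(127);
    }

    /* Parent: resumes only after the child has exec'ed or exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the child never touches mutexes or other state keyed on the cached TID before the exec, the stale value it inherits is never observed.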

Now one could argue that calling unshare automatically means that you
must not call any glibc functions afterwards (similar to thread-creating
clone), or at least that you cannot call any functions which are not
async-signal-safe, but that does not match existing application
practice.  And I think we actually prefer that file servers call chroot
after unshare(CLONE_FS), rather than trying to reimplement restricted
pathname lookup in userspace.
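
As a sketch of the file-server pattern mentioned above, assuming Linux and a caller that holds CAP_SYS_CHROOT; `confine_thread_to` is a hypothetical helper name. The calling thread first gets a private copy of its filesystem attributes with `unshare(CLONE_FS)`, so that the following `chroot` affects only that thread and not the rest of the process.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <unistd.h>

/*
 * Gives the calling thread its own root and working directory, then
 * confines it to `root`. Returns 0 on success or a negative errno value.
 */
static int confine_thread_to(const char *root)
{
    /* Stop sharing root directory, cwd and umask with other threads. */
    if (unshare(CLONE_FS) != 0)
        return -errno;

    /* The kernel now performs the restricted pathname lookup. */
    if (chroot(root) != 0)
        return -errno;

    /* Move the working directory inside the new root. */
    if (chdir("/") != 0)
        return -errno;

    return 0;
}
```

A worker thread would call this once and then keep using ordinary libc functions such as `open` relative to the new root, which is the existing application practice the paragraph above refers to.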

Thanks,
Florian