[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <202505140822.6AB755B6@keescook>
Date: Wed, 14 May 2025 08:42:22 -0700
From: Kees Cook <kees@...nel.org>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: "Eric W. Biederman" <ebiederm@...ssion.com>,
Jann Horn <jannh@...gle.com>,
Christian Brauner <brauner@...nel.org>,
Jorge Merlino <jorge.merlino@...onical.com>,
Alexander Viro <viro@...iv.linux.org.uk>,
Thomas Gleixner <tglx@...utronix.de>,
Andy Lutomirski <luto@...nel.org>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
linux-fsdevel@...r.kernel.org,
John Johansen <john.johansen@...onical.com>,
Paul Moore <paul@...l-moore.com>, James Morris <jmorris@...ei.org>,
"Serge E. Hallyn" <serge@...lyn.com>,
Stephen Smalley <stephen.smalley.work@...il.com>,
Eric Paris <eparis@...isplace.org>,
Richard Haines <richard_c_haines@...nternet.com>,
Casey Schaufler <casey@...aufler-ca.com>,
Xin Long <lucien.xin@...il.com>,
"David S. Miller" <davem@...emloft.net>,
Todd Kjos <tkjos@...gle.com>, Ondrej Mosnacek <omosnace@...hat.com>,
Prashanth Prahlad <pprahlad@...hat.com>,
Micah Morton <mortonm@...omium.org>,
Fenghua Yu <fenghua.yu@...el.com>, Andrei Vagin <avagin@...il.com>,
linux-kernel@...r.kernel.org, apparmor@...ts.ubuntu.com,
linux-security-module@...r.kernel.org, selinux@...r.kernel.org,
linux-hardening@...r.kernel.org, oleg@...hat.com
Subject: Re: [PATCH 1/2] fs/exec: Explicitly unshare fs_struct on exec
On Wed, May 14, 2025 at 02:03:31AM +0200, Mateusz Guzik wrote:
> On Wed, May 14, 2025 at 12:17 AM Eric W. Biederman
> <ebiederm@...ssion.com> wrote:
> >
> > Jann Horn <jannh@...gle.com> writes:
> >
> > > On Tue, May 13, 2025 at 10:57 PM Kees Cook <kees@...nel.org> wrote:
> > >> On May 13, 2025 6:05:45 AM PDT, Mateusz Guzik <mjguzik@...il.com> wrote:
> > >> >Here is my proposal: *deny* exec of suid/sgid binaries if fs_struct is
> > >> >shared. This will have to be checked for after the execing proc becomes
> > >> >single-threaded ofc.
> > >>
> > >> Unfortunately the above Chrome helper is setuid and uses CLONE_FS.
> > >
> > > Chrome first launches a setuid helper, and then the setuid helper does
> > > CLONE_FS. Mateusz's proposal would not impact this usecase.
> > >
> > > Mateusz is proposing to block the case where a process first does
> > > CLONE_FS, and *then* one of the processes sharing the fs_struct does a
> > > setuid execve(). Linux already downgrades such an execve() to be
> > > non-setuid, which probably means anyone trying to do this will get
> > > hard-to-understand problems. Mateusz' proposal would just turn this
> > > hard-to-debug edgecase, which already doesn't really work, into a
> > > clean error; I think that is a nice improvement even just from the
> > > UAPI standpoint.
> > >
> > > If this change makes it possible to clean up the kernel code a bit, even better.
> >
> > What has brought this to everyone's attention just now? This is
> > the second mention of this code path I have seen this week.
> >
>
> There is a syzkaller report concerning ->in_exec handling, for example:
> https://lore.kernel.org/all/67dc67f0.050a0220.25ae54.001f.GAE@google.com/#t
>
> [...]
> > It looks like most of the lsm's also test bprm->unsafe.
> >
> > So I imagine someone could very carefully separate the non-ptrace case
> > from the ptrace case but *shrug*.
> >
> > Perhaps:
> >
> > if ((is_setid || __cap_gained(permitted, new_old)) &&
> > ((bprm->unsafe & ~LSM_UNSAFE_PTRACE) ||
> > !ptracer_capable(current, new->user_ns))) {
> > + if (!(bprm->unsafe & LSM_UNSAFE_PTRACE)) {
> > + return -EPERM;
> > + }
> > /* downgrade; they get no more than they had, and maybe less */
> > if (!ns_capable(new->user_ns, CAP_SETUID) ||
> > (bprm->unsafe & LSM_UNSAFE_NO_NEW_PRIVS)) {
> > new->euid = new->uid;
> > new->egid = new->gid;
> > }
> > new->cap_permitted = cap_intersect(new->cap_permitted,
> > old->cap_permitted);
> > }
> >
> > If that is what you want that doesn't look to scary. I don't think
> > it simplifies anything about fs->in_exec. As fs->in_exec is set when
> > the processing calling exec is the only process that owns the fs_struct.
> > With fs->in_exec just being a flag that doesn't allow another thread
> > to call fork and start sharing the fs_struct during exec.
> >
> > *Shrug*
> >
> > I don't see why anyone would care. It is just a very silly corner case.
>
> Well I don't see how ptrace factors into any of this, apart from being
> a different case of ignoring suid/sgid.
I actually think we might want to expand the above bit of logic to use
an explicit tests of each LSM_UNSAFE case -- the merged
logic is very difficult to read currently. Totally untested expansion,
if I'm reading everything correctly:
if (bprm->unsafe &&
(is_setid || __cap_gained(permitted, new_old))) {
bool limit_caps = false;
bool strip_eid = false;
unsigned int unsafe = bprm->unsafe;
/* Check each bit */
if (unsafe & LSM_UNSAFE_PTRACE) {
if (!ptracer_capable(current, new->user_ns))
limit_caps = true;
unsafe &= ~LSM_UNSAFE_PTRACE;
}
if (unsafe & LSM_UNSAFE_SHARE) {
limit_caps = true;
if (!ns_capable(new->user_ns, CAP_SETUID))
strip_eid = true;
unsafe &= ~LSM_UNSAFE_SHARE;
}
if (unsafe & LSM_UNSAFE_NO_NEW_PRIVS) {
limit_caps = true;
if (!ns_capable(new->user_ns, CAP_SETUID))
strip_eid = true;
unsafe &= ~LSM_UNSAFE_NO_NEW_PRIVS;
}
if (WARN(unsafe, "Unhandled LSM_UNSAFE flag: %u?!\n", unsafe))
return -EINVAL;
if (limit_caps) {
new->cap_permitted = cap_intersect(new->cap_permitted,
old->cap_permitted);
}
if (strip_eid) {
new->euid = new->uid;
new->egid = new->gid;
}
}
> I can agree the suid/sgid situation vs CLONE_FS is a silly corner
> case, but one which needs to be handled for security reasons and which
> currently has weirdly convoluted code to do it.
>
> The intent behind my proposal is very much to get the crapper out of
> the way in a future-proof and simple manner.
>
> In check_unsafe_exec() you can find a nasty loop over threads in the
> group to find out if the fs struct is used by anyone outside of said
> group. Since fs struct users are not explicitly tracked and any of
> them can have different creds than the current thread, the kernel opts
> to ignore suid/sgid if there are extra users found (for security
> reasons). The loop depends on no new threads showing up as the list is
> being walked, to that end copy_fs() can transiently return an error if
> it spots ->in_exec.
>
> The >in_exec field is used as a boolean/flag, but parallel execs using
> the same fs struct from different thread groups don't look serialized.
> This is supposed to be fine as in this case ->in_exec is not getting
> set to begin with, but it gets unconditionally unset on all execs.
>
> And so on. It's all weird af.
100% :)
--
Kees Cook
Powered by blists - more mailing lists