[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <202411211302.08EEE6D395@keescook>
Date: Thu, 21 Nov 2024 14:06:00 -0800
From: Kees Cook <kees@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Zbigniew Jędrzejewski-Szmek <zbyszek@...waw.pl>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
linux-kernel@...r.kernel.org,
Alexander Viro <viro@...iv.linux.org.uk>,
Christophe JAILLET <christophe.jaillet@...adoo.fr>,
Dan Carpenter <dan.carpenter@...aro.org>,
Nir Lichtman <nir@...htman.org>,
syzbot+03e1af5c332f7e0eb84b@...kaller.appspotmail.com,
Tycho Andersen <tandersen@...flix.com>,
Vegard Nossum <vegard.nossum@...cle.com>
Subject: Re: [GIT PULL] execve updates for v6.13-rc1
On Thu, Nov 21, 2024 at 11:23:46AM -0800, Linus Torvalds wrote:
> I'm done with this discussion that apparently was brought on by people
> not knowing what the hell they were doing.
This is disrespectful. If you're frustrated you can just say so. I'm
certainly frustrated.
> For user space, comm[] is basically the fallback for when cmdline
> fails for some reason (for example, /proc/*/cmdline will be empty for
> kworkers, but there are other situations too)
>
> The reason? comm[] has *always* been much too limited for 'ps' output. ALWAYS.
>
> Yes, you can literally *force* ps to not do that (eg "ps -eo comm")
> but if you do that, you get the very limited comm[] output that nobody
> has ever wanted ps to give exactly because it's so limited.
I think I finally figured out why you keep saying this. I think you mean
to imply "ps -e" (or similar), not "ps". Asking for more process details
("ps a", "ps -f", "ps -e", etc) uses cmdline. Without options that turn
on more details, the default is comm. If you run just "ps", it shows comm:
$ strace ps 2>&1 | grep /cmdline | wc -l
0
If you run with detail options it shows cmdline:
$ strace ps a 2>&1 | grep /cmdline | wc -l
1266
$ strace ps -f 2>&1 | grep /cmdline | wc -l
1266
This is procps-ng found on all Debian and Ubuntu systems. AFAICT
procps-ng is used on Fedora too.
Note I'm not saying comm is GOOD or anything. I'm just saying that it
IS regularly visible.
> And yes, 'top' will give comm[] output because it's so much faster.
Exactly. By default, top and ps both show comm, which in the vast
majority of cases is identical to argv[0]. I don't understand why this
is causing such angst -- it's just the observable facts: many things in
userspace expose comm and many also expose cmdline. Having them be
mismatched due to fexecve() is the whole issue.
Nobody blinks at:
$ ps
PID TTY TIME CMD
4125309 pts/1 00:00:47 bash
4171960 pts/1 00:00:00 bash
4171962 pts/1 00:00:00 vim
4171997 pts/1 00:00:00 ps
vs
$ ps -f
UID PID PPID C STIME TTY TIME CMD
kees 4125309 3947 0 Jul11 pts/1 00:00:47 -bash
kees 4171960 4125309 0 13:30 pts/1 00:00:00 -bash
kees 4171962 4171960 0 13:30 pts/1 00:00:00 vim
kees 4172004 4125309 0 13:30 pts/1 00:00:00 ps -f
But if fexecve() were used now, "ps" would show:
$ ps
PID TTY TIME CMD
4125309 pts/1 00:00:47 3
4171960 pts/1 00:00:00 3
4171962 pts/1 00:00:00 3
4171997 pts/1 00:00:00 3
This is obviously horrible.
Using f_path, we'd get close, but symlink destinations (or weird stuff
like "memfd:name-here") are shown:
$ realpath $(which vim)
/usr/bin/vim.basic
$ ps
PID TTY TIME CMD
4125309 pts/1 00:00:47 bash
4171960 pts/1 00:00:00 bash
4171962 pts/1 00:00:00 vim.basic
4171997 pts/1 00:00:00 ps
Using argv[0], we'd get the original output:
$ ps
PID TTY TIME CMD
4125309 pts/1 00:00:47 bash
4171960 pts/1 00:00:00 bash
4171962 pts/1 00:00:00 vim
4171997 pts/1 00:00:00 ps
IMO, switching to fexecve() shouldn't regress this basic piece of
information.
Now, I think we have three choices:
1) Leave it as-is. (comm is useless)
2) Use argv[0]. (comm matches what would show with execve() in most cases)
3) Use f_path (comm exposes f_path dentry name, which matches
basename(readlink(/proc/*/exe)), but doesn't always match what execve()
would show).
I think everyone agrees "1" should go away.
So it comes down to trying to stay looking more like execve()'s comm, or
more like /proc/*/exe's value.
Since comm is mutable anyway, I feel like the "friendlier" default for
userspace would be option 2.
If you still conclude differently, I guess the discussion is over, and
we go with 3:
diff --git a/fs/exec.c b/fs/exec.c
index e0435b31a811..8688bbbaf4af 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1347,7 +1347,21 @@ int begin_new_exec(struct linux_binprm * bprm)
set_dumpable(current->mm, SUID_DUMP_USER);
perf_event_exec();
- __set_task_comm(me, kbasename(bprm->filename), true);
+
+ /*
+ * If fdpath was set, alloc_bprm() made up a path that will
+ * probably not be useful to admins running ps or similar.
+ * Let's fix it up to be something reasonable.
+ */
+ if (bprm->fdpath) {
+ rcu_read_lock();
+ __set_task_comm(me, smp_load_acquire(&bprm->file->f_path.dentry->d_name.name),
+ true);
+ rcu_read_unlock();
+ }
+ else {
+ __set_task_comm(me, kbasename(bprm->filename), true);
+ }
/* An exec changes our domain. We are no longer part of the thread
group */
I've minimally tested this.
-Kees
--
Kees Cook
Powered by blists - more mailing lists