lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <202411211302.08EEE6D395@keescook>
Date: Thu, 21 Nov 2024 14:06:00 -0800
From: Kees Cook <kees@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Zbigniew Jędrzejewski-Szmek <zbyszek@...waw.pl>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	linux-kernel@...r.kernel.org,
	Alexander Viro <viro@...iv.linux.org.uk>,
	Christophe JAILLET <christophe.jaillet@...adoo.fr>,
	Dan Carpenter <dan.carpenter@...aro.org>,
	Nir Lichtman <nir@...htman.org>,
	syzbot+03e1af5c332f7e0eb84b@...kaller.appspotmail.com,
	Tycho Andersen <tandersen@...flix.com>,
	Vegard Nossum <vegard.nossum@...cle.com>
Subject: Re: [GIT PULL] execve updates for v6.13-rc1

On Thu, Nov 21, 2024 at 11:23:46AM -0800, Linus Torvalds wrote:
> I'm done with this discussion that apparently was brought on by people
> not knowing what the hell they were doing.

This is disrespectful. If you're frustrated you can just say so. I'm
certainly frustrated.

> For user space, comm[] is basically the fallback for when cmdline
> fails for some reason (for example, /proc/*/cmdline will be empty for
> kworkers, but there are other situations too)
> 
> The reason? comm[] has *always* been much too limited for 'ps' output. ALWAYS.
> 
> Yes, you can literally *force* ps to not do that (eg "ps -eo comm")
> but if you do that, you get the very limited comm[] output that nobody
> has ever wanted ps to give exactly because it's so limited.

I think I finally figured out why you keep saying this. I think you mean
to imply "ps -e" (or similar), not "ps". Asking for more process details
("ps a", "ps -f", "ps -e", etc) uses cmdline. Without options that turn
on more details, the default is comm. If you run just "ps", it shows comm:

$ strace ps 2>&1 | grep /cmdline | wc -l
0

If you run with detail options it shows cmdline:

$ strace ps a 2>&1 | grep /cmdline | wc -l
1266
$ strace ps -f 2>&1 | grep /cmdline | wc -l
1266

This is procps-ng found on all Debian and Ubuntu systems. AFAICT
procps-ng is used on Fedora too.

Note I'm not saying comm is GOOD or anything. I'm just saying that it
IS regularly visible.

> And yes, 'top' will give comm[] output because it's so much faster.

Exactly. By default, top and ps both show comm, which in the vast
majority of cases is identical to argv[0]. I don't understand why this
is causing such angst -- it's just the observable facts: many things in
userspace expose comm and many also expose cmdline. Having them be
mismatched due to fexecve() is the whole issue.

Nobody blinks at:

    $ ps
        PID TTY          TIME CMD
    4125309 pts/1    00:00:47 bash
    4171960 pts/1    00:00:00 bash
    4171962 pts/1    00:00:00 vim
    4171997 pts/1    00:00:00 ps

vs

    $ ps -f
    UID          PID    PPID  C STIME TTY          TIME CMD
    kees     4125309    3947  0 Jul11 pts/1    00:00:47 -bash
    kees     4171960 4125309  0 13:30 pts/1    00:00:00 -bash
    kees     4171962 4171960  0 13:30 pts/1    00:00:00 vim
    kees     4172004 4125309  0 13:30 pts/1    00:00:00 ps -f

But if fexecve() were used now, "ps" would show:

    $ ps
        PID TTY          TIME CMD
    4125309 pts/1    00:00:47 3
    4171960 pts/1    00:00:00 3
    4171962 pts/1    00:00:00 3
    4171997 pts/1    00:00:00 3

This is obviously horrible.

Using f_path, we'd get close, but symlink destinations (or weird stuff
like "memfd:name-here") are shown:

    $ realpath $(which vim)
    /usr/bin/vim.basic

    $ ps
        PID TTY          TIME CMD
    4125309 pts/1    00:00:47 bash
    4171960 pts/1    00:00:00 bash
    4171962 pts/1    00:00:00 vim.basic
    4171997 pts/1    00:00:00 ps

Using argv[0], we'd get the original output:

    $ ps
        PID TTY          TIME CMD
    4125309 pts/1    00:00:47 bash
    4171960 pts/1    00:00:00 bash
    4171962 pts/1    00:00:00 vim
    4171997 pts/1    00:00:00 ps

IMO, switching to fexecve() shouldn't regress this basic piece of
information.

Now, I think we have three choices:

1) Leave it as-is. (comm is useless)

2) Use argv[0]. (comm matches what would show with execve() in most cases)

3) Use f_path (comm exposes f_path dentry name, which matches
   basename(readlink(/proc/*/exe)), but doesn't always match what execve()
   would show).

I think everyone agrees "1" should go away.

So it comes down to trying to stay looking more like execve()'s comm, or
more like /proc/*/exe's value.

Since comm is mutable anyway, I feel like the "friendlier" default for
userspace would be option 2.

If you still conclude differently, I guess the discussion is over, and
we go with 3:

diff --git a/fs/exec.c b/fs/exec.c
index e0435b31a811..8688bbbaf4af 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1347,7 +1347,21 @@ int begin_new_exec(struct linux_binprm * bprm)
 		set_dumpable(current->mm, SUID_DUMP_USER);
 
 	perf_event_exec();
-	__set_task_comm(me, kbasename(bprm->filename), true);
+
+	/*
+	 * If fdpath was set, alloc_bprm() made up a path that will
+	 * probably not be useful to admins running ps or similar.
+	 * Let's fix it up to be something reasonable.
+	 */
+	if (bprm->fdpath) {
+		rcu_read_lock();
+		__set_task_comm(me, smp_load_acquire(&bprm->file->f_path.dentry->d_name.name),
+				true);
+		rcu_read_unlock();
+	}
+	else {
+		__set_task_comm(me, kbasename(bprm->filename), true);
+	}
 
 	/* An exec changes our domain. We are no longer part of the thread
 	   group */


I've minimally tested this.

-Kees

-- 
Kees Cook

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ