lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z0A3EkxZZg19Dp9Q@kawka3.in.waw.pl>
Date: Fri, 22 Nov 2024 07:47:30 +0000
From: Zbigniew Jędrzejewski-Szmek <zbyszek@...waw.pl>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
	Kees Cook <kees@...nel.org>, linux-kernel@...r.kernel.org,
	Alexander Viro <viro@...iv.linux.org.uk>,
	Christophe JAILLET <christophe.jaillet@...adoo.fr>,
	Dan Carpenter <dan.carpenter@...aro.org>,
	Nir Lichtman <nir@...htman.org>,
	syzbot+03e1af5c332f7e0eb84b@...kaller.appspotmail.com,
	Tycho Andersen <tandersen@...flix.com>,
	Vegard Nossum <vegard.nossum@...cle.com>
Subject: Re: [GIT PULL] execve updates for v6.13-rc1

On Thu, Nov 21, 2024 at 10:59:59PM -0600, Eric W. Biederman wrote:
> Zbigniew Jędrzejewski-Szmek <zbyszek@...waw.pl> writes:
> 
> > On Wed, Nov 20, 2024 at 06:23:55PM -0800, Linus Torvalds wrote:
> >> On Wed, 20 Nov 2024 at 16:55, Eric W. Biederman <ebiederm@...ssion.com> wrote:
> >> >
> >> > __set_task_comm cannot be called with bprm->file->f_dentry
> >> > unconditionally.
> >> 
> >> No, no. Only for the "no path" case.
> >> 
> >> > The reason bprm->file->f_dentry.dentry was abandoned were concerns
> >> > about breaking userspace.
> >> 
> >> There's no way it can break user space considering that right now
> >> comm[] ends up being just garbage.
> >
> > It'll "break userspace" in the sense the the resulting program name
> > visible in /proc/self/{comm,stat,status} would be different than the
> > expected value.
> 
> This actually has not been shown.
> 
> In the general case of programs on linux it is indeed the case
> we have multi-call binaries and symlinks to binaries.
> 
> Limiting things to just fexecve reduces the scope,
> I didn't think about the scope reduction when you made this argument the
> first time.
> 
> I do not see any evidence that there are daemons started by systemd
> where systemd follows the name in /etc/alternatives on debian, or winds
> up following a symlink on a multicall binary.  The way I understand
> /etc/alternatives I don't think you would ever want to use it for the
> name of a daemon that is put in a unit file.

systemd-udevd is one example that ~everybody has installed.
(Though it doesn't use comm, it uses argv[0] to decide behaviour.)

We can certainly find more, on my Fedora system, 337/2252 files
in /usr/sbin/ are symlinks, quite a few candidates.
But even if we went through every one of those, it's not enough,
because people have custom unit files and there's also systemd-run
and run0.

> All of these cases where you get a task->comm that would be different
> with fexecve are rare oddball cases today.  I think it is quite likely
> nothing systemd wants to start with fexecve will have this problem.
> 
> Further you can detect this problem before you get as far as starting
> the application.  Just pass O_NOFOLLOW to open and you can refuse to
> follow the potentially confusing symlinks.

We'd have two choices: refuse the command, which is not really a great
option, or fall back to execve(), an approach which I find really
unappealing.

> So short of finding real programs in the real world that actually care
> it seems perfectly reasonable to do the cheap and easy thing.
>
> Right now it feels like you are adopting a very cautious approach and
> arguing for an expensive implementation simply because it is a lot of
> work to figure out what programs you actually care about and see what
> those programs are actually doing.

So it's not really "programs" per se. For example, systemd itself
doesn't care at all, because it uses cgroups to manage processes. I've
been using systemd-with-fexecve for two years and apart from strange 'ps'
listings, it's all fine. But we have administrators and scripts.
For example, 'pgrep systemd-udevd' and 'pkill -HUP httpd' and a
bazillion other calls that we cannot ever find or change.

> On the system I am writing this on right now there are about 300
> processes running and only about 17 whose parent is pid 1.  Most of
> those process whose parent is pid 1 run as root.  Which means extra care
> needs to be taken with them anyway.

This would also affect user process started by 'systemd --user',
i.e. most of the graphical user session nowadays (under GNOME, KDE,
etc.)

> > Even if we end up copying a string from userspace unnecessarilly,
> > does this matter? execve is a heavyweight operation and copying a
> > a dozen bytes extra hardly matters.
> 
> Generally it is on the person arguing for making the kernel more
> expensive to find a compelling case.  In this instance my imagination
> fails me in finding a binary that systemd will start that will be
> affected.  So I simply don't see the point in making the code
> more expensive.
> 
> I was really hoping we could use the cheap bprm->file->f_dentry
> to set task->comm in all cases as that would make the kernel
> simpler and faster.
> 
> To me using bprm->file->f_dentry still seems to make more sense than a
> field whose value (in the case of login shells) is documented as being
> different from what you are looking for.
> 
> There is no solution that doesn't have down sides.  As such the kernel
> might as well use a solution that is cheap and as close to how
> task->comm has been calculated historically as possible.

Right. It comes down to judgement / guesswork about the downsides of
each approach.

As I wrote before, the approach with bprm->file->f_dentry is _much_
better than what we have now. So I'll take that over status quo.
But from the POV of a distro maintainer, it's not ideal. Basically,
with the patch that uses argv[0], I expect almost no user-visible
change, so I can rebuild systemd with -Dfexecve=true in Fedora and
maybe send a mail to fedora-devel with "hey, watch out, the way
systemd starts processes has been changed, selinux could be affected,
please report any errors".

With the f_dentry version, we'll still do the transition, but we'll
need to make an effort to warn users and do it much more visibly, and
based on past experience with such things, there'll still inevitably
be users who report that their favourite backup script that they call
from the git checkout now doesn't terminate properly and stuff like
that. Every kernel maintainer knows that any aspect of behaviour that
is visibile to users starts being relied on…

I think fexecve is a nice feature (as all the fd-based syscalls
in general), but it need to be drop-in replacement if it's supposed
to be adopted quickly.

Zbyszek

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ