linux-kernel - Re: [PATCH v9] pidfd: add ioctl to retrieve pid info

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20241008-kruste-aufguss-bd03e60997ab@brauner>
Date: Tue, 8 Oct 2024 15:23:06 +0200
From: Christian Brauner <brauner@...nel.org>
To: luca.boccassi@...il.com, oleg@...hat.com
Cc: linux-fsdevel@...r.kernel.org, christian@...uner.io, 
	linux-kernel@...r.kernel.org, oleg@...hat.com
Subject: Re: [PATCH v9] pidfd: add ioctl to retrieve pid info

> > +#ifdef CONFIG_CGROUPS
> > +	if (request_mask & PIDFD_INFO_CGROUPID) {
> > +		struct cgroup *cgrp;
> > +
> > +		guard(rcu)();
> > +		cgrp = task_cgroup(task, pids_cgrp_id);
> > +		if (!cgrp)
> > +			return -ENODEV;
> 
> Afaict this means that the task has already exited. In other words, the
> cgroup id cannot be retrieved anymore for a task that has exited but not
> been reaped. Frankly, I would have expected the cgroup id to be
> retrievable until the task has been reaped but that's another
> discussion.
> 
> My point is if you contrast this with the other information in here: If
> the task has exited but hasn't been reaped then you can still get
> credentials such as *uid/*gid, and pid namespace relative information
> such as pid/tgid/ppid.

Related to this and I just want to dump this idea somewhere:

I'm aware that it is often desirable or useful to have information about
a task around even after the task has exited and been reaped.

The exit status comes to mind but maybe there's other stuff that would
be useful to have.

Since we changed to pidfs we know that all pidfds no matter if they
point to the same struct file (e.g., dup()) or to multiple struct files
(e.g., multiple pidfd_open() on the same pid) all point to the same
dentry and inode. Which is why we switched to stashing struct pid in
inode->i_private.

So we could easily do something like this:

// SKETCH SKETCH SKETCH
diff --git a/fs/pidfs.c b/fs/pidfs.c
index 7ffdc88dfb52..eeeb907f4889 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -344,9 +344,24 @@ static const struct dentry_operations pidfs_dentry_operations = {
        .d_prune        = stashed_dentry_prune,
 };

+struct pidfd_kinfo {
+       // We could even move this back to file->private_data to avoid the
+       // additional pointer deref though I doubt it matters.
+       struct pid *pid;
+       int exit_status;
+       // other stuff;
+};
+
 static int pidfs_init_inode(struct inode *inode, void *data)
 {
-       inode->i_private = data;
+       struct pidfd_kinfo *kinfo;
+
+       kinfo = kzalloc(sizeof(*info), GFP_KERNEL_ACCOUNT);
+       if (!kinfo)
+               return -ENOMEM;
+
+       kinfo->pid = data;
+       inode->i_private = kinfo;
        inode->i_flags |= S_PRIVATE;
        inode->i_mode |= S_IRWXU;
        inode->i_op = &pidfs_inode_operations;

and that enables us to persist information past task exit so that as
long as you hold the pidfd you can e.g., query for the exit state of the
task or something.

I'm mostly thinking out loud but I think that could be potentially
interesting.