lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20120120150108.GA9081@redhat.com>
Date:	Fri, 20 Jan 2012 16:01:08 +0100
From:	Oleg Nesterov <oleg@...hat.com>
To:	Cyrill Gorcunov <gorcunov@...il.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	KOSAKI Motohiro <kosaki.motohiro@...il.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Pavel Emelyanov <xemul@...allels.com>,
	Serge Hallyn <serge.hallyn@...onical.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Tejun Heo <tj@...nel.org>, Andrew Vagin <avagin@...nvz.org>,
	Vasiliy Kulikov <segoon@...nwall.com>
Subject: Re: [PATCH] fs, proc: Introduce /proc/<pid>/task/<tid>/children
	entry v7

On 01/19, Cyrill Gorcunov wrote:
>
> Here is refreshed one for sure.


Reviewed-by: Oleg Nesterov <oleg@...hat.com>


> From: Cyrill Gorcunov <gorcunov@...nvz.org>
> Subject: fs, proc: Introduce /proc/<pid>/task/<tid>/children entry v8
> 
> When we do checkpoint of a task we need to know the list of children
> the task, has but there is no easy and fast way to generate reverse
> parent->children chain from arbitrary <pid> (while a parent pid is
> provided in "PPid" field of /proc/<pid>/status).
> 
> So instead of walking over all pids in the system (creating one big process
> tree in memory, just to figure out which children a task has) -- we add
> explicit /proc/<pid>/task/<tid>/children entry, because the kernel already has
> this kind of information but it is not yet exported.
> 
> This is a first level children, not the whole process tree.
> 
> v2:
>  - Kame suggested to use a separated /proc/<pid>/children entry
>    instead of poking /proc/<pid>/status
>  - Andew suggested to use rcu facility instead of locking
>    tasklist_lock
>  - Tejun pointed that non-seekable seq file might not be
>    enough for tasks with large number of children
> 
> v3:
>  - To be on a safe side use %lu format for pid_t printing
> 
> v4:
>  - New line get printed when sequence ends not at seq->stop,
>    a nit pointed by Tejun
>  - Documentation update
>  - tasklist_lock is back, Oleg pointed that ->children list
>    is actually not rcu-safe
> 
> v5:
>  - Oleg suggested to make /proc/<pid>/task/<tid>/children
>    instead of global /proc/<pid>/children, which eliminates
>    hardness related to threads and children migration, and
>    allows patch to be a way simplier.
> 
> v6:
>  - Drop ptrace_may_access tests, pids are can be found anyway
>    so nothing to protect here.
>  - Update comments and docs, pointed by Oleg.
> 
> v7:
>  - Use get_pid over proc-pid directly, to simplify
>    code, pointed by Oleg.
> 
> v8:
>  - Obtain a starting pid from the proc's inode directly.
> 
> Signed-off-by: Cyrill Gorcunov <gorcunov@...nvz.org>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: Pavel Emelyanov <xemul@...allels.com>
> Cc: Serge Hallyn <serge.hallyn@...onical.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
> Cc: Oleg Nesterov <oleg@...hat.com>
> ---
>  Documentation/filesystems/proc.txt |   18 +++++
>  fs/proc/array.c                    |  123 +++++++++++++++++++++++++++++++++++++
>  fs/proc/base.c                     |    1 
>  fs/proc/internal.h                 |    1 
>  4 files changed, 143 insertions(+)
> 
> Index: linux-2.6.git/Documentation/filesystems/proc.txt
> ===================================================================
> --- linux-2.6.git.orig/Documentation/filesystems/proc.txt
> +++ linux-2.6.git/Documentation/filesystems/proc.txt
> @@ -40,6 +40,7 @@ Table of Contents
>    3.4	/proc/<pid>/coredump_filter - Core dump filtering settings
>    3.5	/proc/<pid>/mountinfo - Information about mounts
>    3.6	/proc/<pid>/comm  & /proc/<pid>/task/<tid>/comm
> +  3.7   /proc/<pid>/task/<tid>/children - Information about task children
>  
>    4	Configuring procfs
>    4.1	Mount options
> @@ -1549,6 +1550,23 @@ then the kernel's TASK_COMM_LEN (current
>  comm value.
>  
>  
> +3.7	/proc/<pid>/task/<tid>/children - Information about task children
> +-------------------------------------------------------------------------
> +This file provides a fast way to retrieve first level children pids
> +of a task pointed by <pid>/<tid> pair. The format is a space separated
> +stream of pids.
> +
> +Note the "first level" here -- if a child has own children they will
> +not be listed here, one needs to read /proc/<children-pid>/task/<tid>/children
> +to obtain the descendants.
> +
> +Since this interface is intended to be fast and cheap it doesn't
> +guarantee to provide precise results and some children might be
> +skipped, especially if they've exited right after we printed their
> +pids, so one need to either stop or freeze processes being inspected
> +if precise results are needed.
> +
> +
>  ------------------------------------------------------------------------------
>  Configuring procfs
>  ------------------------------------------------------------------------------
> Index: linux-2.6.git/fs/proc/array.c
> ===================================================================
> --- linux-2.6.git.orig/fs/proc/array.c
> +++ linux-2.6.git/fs/proc/array.c
> @@ -547,3 +547,126 @@ int proc_pid_statm(struct seq_file *m, s
>  
>  	return 0;
>  }
> +
> +static struct pid *
> +get_children_pid(struct inode *inode, struct pid *pid_prev, loff_t pos)
> +{
> +	struct task_struct *start, *task;
> +	struct pid *pid = NULL;
> +
> +	read_lock(&tasklist_lock);
> +
> +	start = pid_task(proc_pid(inode), PIDTYPE_PID);
> +	if (!start)
> +		goto out;
> +
> +	/*
> +	 * Lets try to continue searching first, this gives
> +	 * us significant speedup on children-rich processes.
> +	 */
> +	if (pid_prev) {
> +		task = pid_task(pid_prev, PIDTYPE_PID);
> +		if (task && task->real_parent == start &&
> +		    !(list_empty(&task->sibling))) {
> +			if (list_is_last(&task->sibling, &start->children))
> +				goto out;
> +			task = list_first_entry(&task->sibling,
> +						struct task_struct, sibling);
> +			pid = get_pid(task_pid(task));
> +			goto out;
> +		}
> +	}
> +
> +	/*
> +	 * Slow search case.
> +	 *
> +	 * We might miss some children here if children
> +	 * are exited while we were not holding the lock,
> +	 * but it was never promised to be accurate that
> +	 * much.
> +	 *
> +	 * "Just suppose that the parent sleeps, but N children
> +	 *  exit after we printed their tids. Now the slow paths
> +	 *  skips N extra children, we miss N tasks." (c)
> +	 *
> +	 * So one need to stop or freeze the leader and all
> +	 * its children to get a precise result.
> +	 */
> +	list_for_each_entry(task, &start->children, sibling) {
> +		if (pos-- == 0) {
> +			pid = get_pid(task_pid(task));
> +			break;
> +		}
> +	}
> +
> +out:
> +	read_unlock(&tasklist_lock);
> +	return pid;
> +}
> +
> +static int children_seq_show(struct seq_file *seq, void *v)
> +{
> +	struct inode *inode = seq->private;
> +	pid_t pid;
> +
> +	pid = pid_nr_ns(v, inode->i_sb->s_fs_info);
> +	return seq_printf(seq, " %d", pid);
> +}
> +
> +static void *children_seq_start(struct seq_file *seq, loff_t *pos)
> +{
> +	return get_children_pid(seq->private, NULL, *pos);
> +}
> +
> +static void *children_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> +{
> +	struct pid *pid = NULL;
> +
> +	pid = get_children_pid(seq->private, v, *pos + 1);
> +	if (!pid)
> +		seq_printf(seq, "\n");
> +	put_pid(v);
> +
> +	++*pos;
> +	return pid;
> +}
> +
> +static void children_seq_stop(struct seq_file *seq, void *v)
> +{
> +	put_pid(v);
> +}
> +
> +static const struct seq_operations children_seq_ops = {
> +	.start	= children_seq_start,
> +	.next	= children_seq_next,
> +	.stop	= children_seq_stop,
> +	.show	= children_seq_show,
> +};
> +
> +static int children_seq_open(struct inode *inode, struct file *file)
> +{
> +	struct seq_file *m;
> +	int ret;
> +
> +	ret = seq_open(file, &children_seq_ops);
> +	if (ret)
> +		return ret;
> +
> +	m = file->private_data;
> +	m->private = inode;
> +
> +	return ret;
> +}
> +
> +int children_seq_release(struct inode *inode, struct file *file)
> +{
> +	seq_release(inode, file);
> +	return 0;
> +}
> +
> +const struct file_operations proc_tid_children_operations = {
> +	.open    = children_seq_open,
> +	.read    = seq_read,
> +	.llseek  = seq_lseek,
> +	.release = children_seq_release,
> +};
> Index: linux-2.6.git/fs/proc/base.c
> ===================================================================
> --- linux-2.6.git.orig/fs/proc/base.c
> +++ linux-2.6.git/fs/proc/base.c
> @@ -3454,6 +3454,7 @@ static const struct pid_entry tid_base_s
>  	ONE("stat",      S_IRUGO, proc_tid_stat),
>  	ONE("statm",     S_IRUGO, proc_pid_statm),
>  	REG("maps",      S_IRUGO, proc_maps_operations),
> +	REG("children",  S_IRUGO, proc_tid_children_operations),
>  #ifdef CONFIG_NUMA
>  	REG("numa_maps", S_IRUGO, proc_numa_maps_operations),
>  #endif
> Index: linux-2.6.git/fs/proc/internal.h
> ===================================================================
> --- linux-2.6.git.orig/fs/proc/internal.h
> +++ linux-2.6.git/fs/proc/internal.h
> @@ -53,6 +53,7 @@ extern int proc_pid_statm(struct seq_fil
>  				struct pid *pid, struct task_struct *task);
>  extern loff_t mem_lseek(struct file *file, loff_t offset, int orig);
>  
> +extern const struct file_operations proc_tid_children_operations;
>  extern const struct file_operations proc_maps_operations;
>  extern const struct file_operations proc_numa_maps_operations;
>  extern const struct file_operations proc_smaps_operations;

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ