lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 6 Dec 2011 14:33:13 -0800
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Cyrill Gorcunov <gorcunov@...nvz.org>
Cc:	Pavel Emelyanov <xemul@...allels.com>,
	Serge Hallyn <serge.hallyn@...onical.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Tejun Heo <tj@...nel.org>,
	Vasiliy Kulikov <segoon@...nwall.com>,
	Andrew Vagin <avagin@...nvz.org>,
	Oleg Nesterov <oleg@...hat.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] fs, proc: Introduce the /proc/<pid>/children entry v2

On Tue, 6 Dec 2011 22:10:26 +0400
Cyrill Gorcunov <gorcunov@...nvz.org> wrote:

> There is no easy way to make a reverse parent->children chain
> from the task status, in turn children->parent provided with "PPid"
> field.
> 
> So instead of walking over all pids in system to figure out what
> children the task have -- we add explicit /proc/<pid>/children entry,
> since kernel already knows this kind of information but it was not
> yet exported.
> 
>
> ...
>
> +static int children_seq_show(struct seq_file *seq, void *v)
> +{
> +	struct task_struct *task = container_of(v, struct task_struct, sibling);
> +	return seq_printf(seq, " %d", pid_vnr(task_pid(task)));
> +}

Printing a pid_t with %d is a bit risky.  At present all architectures
appear to use int for pid_t so it's not broken (yet!).  I suppose we
can leave this code as-is for now - if some 32-bit arch turns up with a
64-bit pid_t, we'll want that warning to come out.

> +static void *children_seq_start(struct seq_file *seq, loff_t *pos)
> +{
> +	struct task_struct *task;
> +
> +	rcu_read_lock();
> +	task = seq->private;
> +	if (task)
> +		return seq_list_start(&task->children, *pos);
> +
> +	return NULL;
> +}
> +
> +static void *children_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> +{
> +	struct task_struct *task = seq->private;
> +	if (task)
> +		return seq_list_next(v, &task->children, pos);
> +	return NULL;
> +}
> +
> +static void children_seq_stop(struct seq_file *seq, void *v)
> +{
> +	rcu_read_unlock();
> +	seq_printf(seq, "\n");
> +}

So as I understand it, they way this works is that when the seqfile
buffer fills up we will drop the rcu lock, will flush the buffer and
then we reacquire the lock and the seq_file core will iterate over the
children list until we end up the same number of items from the head,
and populating the output buffer will resume.

This means that if the children list changes during the flush, the data
which userspace reads will be wrong.  Of course, inaccuracy is
unavoidable when returning kernel-internal data such as this to
userspace.

I suggest that the changelog should explain these races and should also
explain how you propose that the client software will be robust against
such races.

It's also worth adding a warning about this into the documentation for
this procfs file.  Documentation which this patch forgot to provide,
btw!


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ