Message-Id: <20181121141220.0e533c1dcb4792480efbf3ff@linux-foundation.org>
Date: Wed, 21 Nov 2018 14:12:20 -0800
From: Andrew Morton <akpm@...ux-foundation.org>
To: Daniel Colascione <dancol@...gle.com>
Cc: linux-kernel@...r.kernel.org, linux-api@...r.kernel.org,
timmurray@...gle.com, primiano@...gle.com, joelaf@...gle.com,
Jonathan Corbet <corbet@....net>,
Mike Rapoport <rppt@...ux.vnet.ibm.com>,
Vlastimil Babka <vbabka@...e.cz>, Roman Gushchin <guro@...com>,
Prashant Dhamdhere <pdhamdhe@...hat.com>,
"Dennis Zhou (Facebook)" <dennisszhou@...il.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
"Steven Rostedt (VMware)" <rostedt@...dmis.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>,
Dominik Brodowski <linux@...inikbrodowski.net>,
Josh Poimboeuf <jpoimboe@...hat.com>,
Ard Biesheuvel <ard.biesheuvel@...aro.org>,
Michal Hocko <mhocko@...e.com>,
Stephen Rothwell <sfr@...b.auug.org.au>,
KJ Tsanaktsidis <ktsanaktsidis@...desk.com>,
David Howells <dhowells@...hat.com>,
"open list:DOCUMENTATION" <linux-doc@...r.kernel.org>
Subject: Re: [PATCH v2] Add /proc/pid_gen
On Wed, 21 Nov 2018 12:54:20 -0800 Daniel Colascione <dancol@...gle.com> wrote:
> Trace analysis code needs a coherent picture of the set of processes
> and threads running on a system. While it's possible to enumerate all
> tasks via /proc, this enumeration is not atomic. If PID numbering
> rolls over during snapshot collection, the resulting snapshot of the
> process and thread state of the system may be incoherent, confusing
> trace analysis tools. The fundamental problem is that if a PID is
> reused during a userspace scan of /proc, it's impossible to tell, in
> post-processing, whether a fact that the userspace /proc scanner
> reports regarding a given PID refers to the old or new task named by
> that PID, as the scan of that PID may or may not have occurred before
> the PID reuse, and there's no way to "stamp" a fact read from the
> kernel with a trace timestamp.
>
> This change adds a per-pid-namespace 64-bit generation number,
> incremented on PID rollover, and exposes it via a new proc file
> /proc/pid_gen. By examining this file before and after /proc
> enumeration, user code can detect the potential reuse of a PID and
> restart the task enumeration process, repeating until it gets a
> coherent snapshot.
>
> PID rollover ought to be rare, so in practice, scan repetitions will
> be rare.
In general, tracing is a rather specialized thing. Why is this very
occasional confusion a sufficiently serious problem to warrant addition
of this code?

Which userspace tools will be using pid_gen? Are the developers of
those tools signed up to use pid_gen?
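
For reference, the before/after check that the changelog describes could
look roughly like the sketch below on the userspace side. This is only an
illustration under assumptions: /proc/pid_gen is the file this patch
proposes and does not exist on current kernels, so the path is passed in,
and scan_fn, count_pass and coherent_scan are hypothetical names standing
in for whatever a real tool uses to walk /proc.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Read one decimal u64 from a file such as the proposed /proc/pid_gen. */
static bool read_generation(const char *path, uint64_t *out)
{
	FILE *f = fopen(path, "r");
	bool ok;

	if (!f)
		return false;
	ok = fscanf(f, "%" SCNu64, out) == 1;
	fclose(f);
	return ok;
}

/* Hypothetical callback type: one full enumeration pass over /proc. */
typedef void (*scan_fn)(void *ctx);

/* Trivial example callback that only counts how often it was called. */
static void count_pass(void *ctx)
{
	++*(int *)ctx;
}

/*
 * Repeat the scan until the generation is unchanged across it.
 * Returns the number of passes made, or -1 if the generation file
 * cannot be read or max_passes is exhausted.
 */
static int coherent_scan(const char *gen_path, scan_fn scan, void *ctx,
			 int max_passes)
{
	uint64_t before, after;
	int pass;

	for (pass = 1; pass <= max_passes; pass++) {
		if (!read_generation(gen_path, &before))
			return -1;
		scan(ctx);
		if (!read_generation(gen_path, &after))
			return -1;
		if (before == after)
			return pass;	/* no rollover seen during the scan */
	}
	return -1;
}
```

A caller would pass the generation file path and its own scan routine, for
example coherent_scan("/proc/pid_gen", count_pass, &passes, 5), and discard
the partial results of any pass that had to be repeated.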
> --- a/include/linux/pid.h
> +++ b/include/linux/pid.h
> @@ -112,6 +112,7 @@ extern struct pid *find_ge_pid(int nr, struct pid_namespace *);
> int next_pidmap(struct pid_namespace *pid_ns, unsigned int last);
>
> extern struct pid *alloc_pid(struct pid_namespace *ns);
> +extern u64 read_pid_generation(struct pid_namespace *ns);
pid_generation_read() would be a better (and more idiomatic) name.
> extern void free_pid(struct pid *pid);
> extern void disable_pid_allocation(struct pid_namespace *ns);
>
> ...
>
> +u64 read_pid_generation(struct pid_namespace *ns)
> +{
> + u64 generation;
> +
> +
> + spin_lock_irq(&pidmap_lock);
> + generation = ns->generation;
> + spin_unlock_irq(&pidmap_lock);
> + return generation;
> +}
What is the spinlocking in here for? afaict the only purpose it serves
is to make the 64-bit read atomic, so it isn't needed on 64-bit?
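
To make the atomicity concern concrete: on a machine with 32-bit loads, an
unlocked read of a u64 is two separate word loads, and an increment landing
between them can produce a value the counter never held. The sketch below
is a userspace model of that interleaving, not kernel code; split_u64,
split, join and torn_read_across_increment are names made up for the
illustration.

```c
#include <stdint.h>

/* A 64-bit counter as a 32-bit CPU sees it: two separate words. */
struct split_u64 {
	uint32_t lo;
	uint32_t hi;
};

static struct split_u64 split(uint64_t v)
{
	struct split_u64 s = { (uint32_t)v, (uint32_t)(v >> 32) };

	return s;
}

static uint64_t join(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/*
 * Model an unlocked reader that loads the low word, is then preempted
 * by one increment of the counter, and finally loads the high word.
 */
static uint64_t torn_read_across_increment(uint64_t before)
{
	struct split_u64 old = split(before);
	struct split_u64 cur = split(before + 1);

	return join(cur.hi, old.lo);
}
```

With the counter at 0xFFFFFFFF, the modelled reader returns 0x1FFFFFFFF,
which is neither the old value nor the new one (0x100000000). When the
increment does not carry into the high word the read comes out as the old
value, so the problem only shows up once every 2^32 increments.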
> void disable_pid_allocation(struct pid_namespace *ns)
> {
> spin_lock_irq(&pidmap_lock);
> @@ -449,6 +463,17 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
> return idr_get_next(&ns->idr, &nr);
> }
>
> +#ifdef CONFIG_PROC_FS
> +static int pid_generation_show(struct seq_file *m, void *v)
> +{
> + u64 generation =
> + read_pid_generation(proc_pid_ns(file_inode(m->file)));

	u64 generation;

	generation = read_pid_generation(proc_pid_ns(file_inode(m->file)));

is a nicer way of avoiding column wrap.
> + seq_printf(m, "%llu\n", generation);
> + return 0;
> +
> +};
> +#endif
> +
> void __init pid_idr_init(void)
> {
> /* Verify no one has done anything silly: */
> @@ -465,4 +490,13 @@ void __init pid_idr_init(void)
>
> init_pid_ns.pid_cachep = KMEM_CACHE(pid,
> SLAB_HWCACHE_ALIGN | SLAB_PANIC | SLAB_ACCOUNT);
> +
> +}
> +
> +void __init pid_proc_init(void)
> +{
> + /* pid_idr_init is too early, so get a separate init function. */
s/get a/use a/
> +#ifdef CONFIG_PROC_FS
> + WARN_ON(!proc_create_single("pid_gen", 0, NULL, pid_generation_show));
> +#endif
> }
This whole function could vanish if !CONFIG_PROC_FS. Doesn't matter
much with __init code though.
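
One common way to make the function vanish is to compile the real
definition only under the config symbol and provide an empty inline stub
otherwise, so callers need no #ifdef of their own. The sketch below is a
standalone model of that pattern rather than a proposed hunk: the __init
annotation is dropped so it builds outside the kernel tree, and the
"registered" counter is a hypothetical stand-in for the
proc_create_single() call.

```c
/* Uncomment to model a kernel built with procfs support. */
/* #define CONFIG_PROC_FS 1 */

/* Hypothetical stand-in for the side effect of creating the proc entry. */
static int registered;

#ifdef CONFIG_PROC_FS
void pid_proc_init(void)
{
	/* The real function would call proc_create_single() here. */
	registered++;
}
#else
/* No procfs: the call compiles away to nothing. */
static inline void pid_proc_init(void) { }
#endif
```

In the kernel the stub would normally live in the header next to the
declaration, and the caller would invoke pid_proc_init() unconditionally.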