Message-ID: <20141125165726.GA28913@redhat.com>
Date: Tue, 25 Nov 2014 17:57:26 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Aaron Tomlin <atomlin@...hat.com>,
Pavel Emelyanov <xemul@...allels.com>,
Serge Hallyn <serge.hallyn@...ntu.com>,
Sterling Alexander <stalexan@...hat.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] exit/pid_ns: comments + simple fix

On 11/24, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@...hat.com> writes:
>
> > Eric, Pavel, could you review 1/2 ? (documentation only). It is based on the
> > code inspection, I didn't bother to verify that my understanding matches the
> > reality ;)
> >
> > On 11/20, Oleg Nesterov wrote:
> >>
> >>
> >> Probably this is not the last series... in particular it seems that we
> >> have some problems with sys_setns() in this area, but I need to recheck.
> >
> > So far only the documentation fix. I'll write another email (hopefully with the
> > patch), afaics at least setns() doesn't play well with PR_SET_CHILD_SUBREAPER.
> >
> > Contrary to what I thought zap_pid_ns_processes() looks fine, but it seems only
> > by accident. Unless I am totally confused, wait for "nr_hashed == init_pids"
> > could be removed after 0a01f2cc390e10633a "pidns: Make the pidns proc mount/
> > umount logic obvious". However, now that setns() + fork() can inject a task
> > into a child namespace, we need this code again for another reason.
> >
> > I _think_ we can actually remove it and simplify free_pid() as well, but let's
> > discuss this later and fix the wrong/confusing documentation first.
>
> At the very least there is the issue of rusage being wrong if we allow
> the init process to be reaped before all of its children are reaped.

Do you mean cstime/cutime/c* accounting?

Firstly, it is not clear what makes child_reaper special in _this_ sense, but
this doesn't matter at all.

The autoreaping/EXIT_DEAD children are not accounted; only wait_task_zombie()
accumulates these counters. (Just in case: accounting in __exit_signal() is
another thing.)
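
The same rule can be observed from userspace. What follows is only a rough
sketch, not part of the patch: it assumes Linux 2.6.9 or later and Python 3,
and the helper names (burn_cpu, children_cpu, run_child) are made up for this
example. A child that is waited for should show up in RUSAGE_CHILDREN, while
a child that is autoreaped because SIGCHLD is ignored should not.

```python
import os
import resource
import signal
import time


def burn_cpu(seconds):
    """Spin until this process has consumed roughly `seconds` of CPU time."""
    start = time.process_time()
    while time.process_time() - start < seconds:
        pass


def children_cpu():
    """User + system CPU time accumulated so far from reaped children."""
    ru = resource.getrusage(resource.RUSAGE_CHILDREN)
    return ru.ru_utime + ru.ru_stime


def run_child(autoreap, seconds=0.2):
    """Fork a CPU-burning child and return how much children_cpu() grew.

    autoreap=True ignores SIGCHLD, so the child is never turned into a
    zombie that the parent has to wait for.
    """
    disposition = signal.SIG_IGN if autoreap else signal.SIG_DFL
    old = signal.signal(signal.SIGCHLD, disposition)
    before = children_cpu()
    try:
        pid = os.fork()
        if pid == 0:
            # Child: consume some CPU time, then exit without cleanup.
            burn_cpu(seconds)
            os._exit(0)
        try:
            # With SIGCHLD ignored this blocks until the child is gone
            # and then fails with ECHILD instead of returning a status.
            os.waitpid(pid, 0)
        except ChildProcessError:
            pass
    finally:
        # Restore the previous disposition if Python knows what it was.
        if old is not None:
            signal.signal(signal.SIGCHLD, old)
    return children_cpu() - before


if __name__ == "__main__":
    print("waited-for child added:", run_child(autoreap=False))
    print("autoreaped child added:", run_child(autoreap=True))
```

On such a system the first number should be close to the CPU time the child
burned and the second should stay at about zero.
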

> There is also a huge level of weird non-intuitive behavior that would
> require some substantial benefits to justify an optimization of letting
> a child exist longer than init.

Sure. That is why I said "let's discuss this later". This patch doesn't try
to change the rules. It only tries to document the current code.

Oleg.