linux-kernel - Re: Testing lxc 0.6.5 in Fedora 13

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <m1vdc48vhy.fsf@fess.ebiederm.org>
Date:	Tue, 06 Apr 2010 08:13:13 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Matt Helsley <matthltc@...ibm.com>
Cc:	Roland McGrath <roland@...hat.com>,
	Oleg Nesterov <oleg@...hat.com>,
	Grzegorz Nosek <root@...aldomain.pl>,
	Sukadev Bhattiprolu <sukadev@...ibm.com>,
	containers@...ts.linux-foundation.org, linux-kernel@...r.kernel.org
Subject: Re: Testing lxc 0.6.5 in Fedora 13

Matt Helsley <matthltc@...ibm.com> writes:

> On Mon, Apr 05, 2010 at 08:44:43PM -0700, Roland McGrath wrote:
>> (I've been away for a couple of weeks.)
>> I concur with the things Oleg's said in this thread.
>> 
>> As to what's "correct" for the kernel in theory, it would certainly make
>> sense to clean up the ptrace cases to use the tracer (parent) pid_ns when
>> reporting any PID as such.  The wait and SIGCHLD code already does this, so
>> that would be consistent.  Off hand I don't see anything other than
>> tracehook_report_clone{,_complete}() that sees the wrong value now.
>
> Yup.
>
>> Fixing that requires a bit of hair.  The simple and clean approach is to
>> just have the tracehook calls (i.e. ptrace layer) extract the PID from the
>> task_struct using the desired pid_ns.  The trouble there is that the
>
> It's also possible to take an extra reference to the struct pid and pass
> that to the tracehook. That and the pid_ns of the tracer receiving the pid
> is all we'll ever need inside the tracehook layer. The only advantage, I
> think, is we wouldn't pin the task struct while holding the pid reference.
>
>> tracehook_report_clone_complete() call is made when that task_struct is no
>> longer guaranteed to be valid.  The contrary approach of extracting the
>> appropriate value for the tracer earlier breaks the clean layering because
>> the fork.c code really should not know at all that ->parent->nsproxy is the
>> place to look for what values to pass to tracehook calls.  I guess the
>> simple and clean fix is to get_task_struct() before wake_up_new_task()
>> and put_task_struct() after tracehook_report_clone_complete().  That does
>> add some gratuitous atomic incr/decr overhead, though.
>
> Also true.
>
> Of course my suggestion of holding the pid reference won't avoid adding
> atomic ops -- just changes which refcount they work on.
>
>> 
>> None of this has much of anything to do with strace, of course.  As I've
>> said, I don't see anything other than the PTRACE_GETEVENTMSG value for
>> PTRACE_EVENT_{CLONE,FORK,VFORK} reports that is wrong in the kernel.  As
>> Oleg said, strace doesn't use that at all.  (This is not the place to
>> discuss the details of strace further.)
>
> Also, looking at proposed changes (utrace and Eric Biederman's setns())
> storing a pid nr rather than a reference to a task struct or struct pid
> probably won't be correct.

My setns work has demonstrated that even for entering a namespace we
never ever need to change the struct pid of a task.  setns has no other
bearing on this problem then to say there is no foreseeable reason to
change the rules.

> In the case of Eric Biederman's setns(), if capable of changing pid namespace,
> we could have:
>
> 	Traced				Tracer
> 	fork()
> 					... (an arbitrary amount of time passes)
> 					setns()
> 					ptrace(GETEVENTMSG)

Forget that.  The pid namespace was architected so that we can ptrace a process
in another pid namespace.

> At which point returning a static pid number held in the message field
> produces the wrong pid.

No.  A processes always sees pids from the context of it's original pid
namespace.  All setns does is affect which pid namespace children will
be native in.


Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/