Message-ID: <20090414190032.GA28267@x200.localdomain>
Date:	Tue, 14 Apr 2009 23:00:32 +0400
From:	Alexey Dobriyan <adobriyan@...il.com>
To:	Oren Laadan <orenl@...columbia.edu>
Cc:	akpm@...ux-foundation.org, containers@...ts.linux-foundation.org,
	xemul@...allels.com, serue@...ibm.com, dave@...ux.vnet.ibm.com,
	mingo@...e.hu, hch@...radead.org, torvalds@...ux-foundation.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 10/30] cr: core stuff

> >> The ability to streamline the checkpoint image IMHO is invaluable.
> >> It's the unix way (TM) of doing things; it makes the process pipe-able.
> >>
> >> You can do many nice things when the checkpoint can be streamed: you
> >> can compress, sign, encrypt etc on the fly without taking additional
> >> diskspace. You can transfer over the network (e.g. for migration),
> >> or store remotely without explicit file system support. You can easily
> >> transform the stream from one c/r version to another etc.
> >>
> >> This should be a design principle. In my experience I never hit a wall
> >> that forced me to "sacrifice" this decision.
> >>
> >>>   sacrificed (read: child can ptrace parent)
> >> Hmmm... if all tasks are created in user space, then this specific issue
> >> becomes a no-brainer!
> > 
> > No!
> 
> Actually yes :)
> 
> > 
> > A ptraces B. Container is checkpointed.
> > 
> > The kernel realizes ptrace is going on. In theory, A and B can have any
> > relationship.
> > 
> > Consequently, the kernel doesn't know in which order to dump A and B.
> > 
> > And there is no such order:
> > *) A can be parent of B (you dump A, B),
> > *) A can be child of B (you want to dump B, A, but this conflicts with
> >    ->real_parent order)
> > *) A and B are just unrelated tasks (any order).
> 
> Current code does not support ptrace(), which has a multitude
> of tidy-bits issues to solve during restart regardless.
> 
> However, creating tasks in userspace uses (and will use) only
> "real" process relationships, not ptrace-relationships, when it
> comes to decide on the fork/clone order.
> 
> Technically, that can be done in checkpoint (dumping the task tree)
> or in restart-user-space (rearranging the data before fork/clone).
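The "rearranging the data before fork/clone" step can be sketched as below. This is a minimal sketch. It assumes each task record carries only its own pid and the pid of its ->real_parent. The struct task_rec type and the fork_order() helper are hypothetical and do not come from the patch set. The sketch consults only real_parent links, so ptrace relationships never constrain the order. The quadratic loop is there for clarity, not speed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task record read from the checkpoint image. */
struct task_rec {
	int pid;          /* pid recorded at checkpoint time */
	int real_parent;  /* pid of ->real_parent, or 0 for the container init */
};

/* Is 'pid' among the first 'n' entries of 'order'? */
static bool placed(const int *order, size_t n, int pid)
{
	for (size_t i = 0; i < n; i++)
		if (order[i] == pid)
			return true;
	return false;
}

/*
 * Fill 'order' (room for n pids) so that every task comes after its
 * ->real_parent. ptrace (->parent) links are deliberately ignored.
 * Returns how many tasks were placed; less than n means the real_parent
 * links contain a cycle or point outside the set.
 */
size_t fork_order(const struct task_rec *t, size_t n, int *order)
{
	size_t done = 0;
	bool progress = true;

	assert(n == 0 || (t != NULL && order != NULL));
	while (done < n && progress) {
		progress = false;
		for (size_t i = 0; i < n; i++) {
			if (placed(order, done, t[i].pid))
				continue;
			if (t[i].real_parent == 0 ||
			    placed(order, done, t[i].real_parent)) {
				order[done++] = t[i].pid;
				progress = true;
			}
		}
	}
	return done;
}
```

If fork_order() returns fewer than n entries, no fork order exists for the recorded real_parent links.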
> 
> > 
> > I'm showing that the whole issue can be avoided:
> 
> If the issue can be avoided, then why would you need to sacrifice
> the stream-ability of the checkpoint image?
> 
> > *) all tasks are simply created regardless of who is parent of whom
> >    (see kernel_thread())
> > *) Every task_struct image among other things contains references to
> >    ->real_parent and ->parent.
> > *) After every task is created it's time to change references:
> > 	**) lookup who is ->real_parent, change ->real_parent _by hand_
> > 		not with some "correct clone(2)" order.
> > 	**) lookup who is ->parent, change ->parent.
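The create-everything-then-fix-up approach above can be sketched in C as follows. The struct task and struct task_img types and the restore_tasks()/free_tasks() helpers are hypothetical simplifications, not kernel code. The point is that pass 1 needs no ordering at all. A loop such as "A is a child of B and ptraces B" therefore causes no trouble, because pointers are only rewritten once every task exists.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for task_struct. */
struct task {
	int id;
	struct task *real_parent;  /* who forked us */
	struct task *parent;       /* differs from real_parent while ptraced */
};

/* Hypothetical image record: pointers are stored as ids (0 = none). */
struct task_img {
	int id;                    /* 1..n, unique */
	int real_parent_id;
	int parent_id;
};

static struct task *lookup(struct task **by_id, size_t n, int id)
{
	return (id >= 1 && (size_t)id <= n) ? by_id[id] : NULL;
}

void free_tasks(struct task **by_id, size_t n)
{
	if (!by_id)
		return;
	for (size_t i = 1; i <= n; i++)
		free(by_id[i]);
	free(by_id);
}

/*
 * Pass 1 creates every task in image order, regardless of who is whose
 * parent. Pass 2 rewrites ->real_parent and ->parent by hand.
 * Returns a table indexed by id (slot 0 unused), or NULL on bad input.
 */
struct task **restore_tasks(const struct task_img *img, size_t n)
{
	struct task **by_id = calloc(n + 1, sizeof(*by_id));

	if (!by_id)
		return NULL;
	for (size_t i = 0; i < n; i++) {
		int id = img[i].id;

		if (id < 1 || (size_t)id > n || by_id[id]) {
			free_tasks(by_id, n);   /* bad or duplicate id */
			return NULL;
		}
		by_id[id] = calloc(1, sizeof(struct task));
		if (!by_id[id]) {
			free_tasks(by_id, n);
			return NULL;
		}
		by_id[id]->id = id;
	}
	for (size_t i = 0; i < n; i++) {
		struct task *t = lookup(by_id, n, img[i].id);

		assert(t != NULL);          /* created in pass 1 */
		t->real_parent = lookup(by_id, n, img[i].real_parent_id);
		t->parent = lookup(by_id, n, img[i].parent_id);
	}
	return by_id;
}
```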
> > 
> > You're probably escaping all of this with object numbers?
> 
> (Will be) escaping this by arranging to fork/clone in the proper order.

task_struct and reparenting are just one example.

There is another loop:

	struct user_struct => struct user_namespace => struct user_namespace::creator

Before the actual dump, each struct user_struct gets a unique id (objref,
whatever) and is simply dumped regardless of order.

The image of a struct user_namespace contains the id of its creator user
and is dumped as well.

On restart:
	restart user_ns
	restart user
	lookup object by creator id
	if found, rewrite ->creator
	if not found, restore creator user, and rewrite ->creator.
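The restart steps above can be sketched as follows. This is a minimal sketch under several assumptions. The user_struct/user_namespace stand-ins are stripped down to the two loop-forming pointers. A fixed-size array plays the objref table, where a real implementation would use a hash. An in-memory array of user images replaces "find the creator in the image". All names here (objhash, obj_lookup(), restore_user(), restore_userns(), the *_img records) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, stripped-down objects: only the loop-forming fields. */
struct user_namespace;
struct user_struct {
	int uid;
	struct user_namespace *user_ns;   /* namespace this user lives in */
};
struct user_namespace {
	struct user_struct *creator;      /* closes the loop */
};

/* Hypothetical image records: pointers replaced by objrefs (0 = none). */
struct user_img   { int objref; int uid; int user_ns_ref; };
struct userns_img { int objref; int creator_ref; };

#define MAX_OBJREF 64
static void *objhash[MAX_OBJREF];     /* objref -> restored object */

static void *obj_lookup(int ref)
{
	return (ref > 0 && ref < MAX_OBJREF) ? objhash[ref] : NULL;
}

/* Restore one user; ->user_ns is resolved if that objref is known. */
static struct user_struct *restore_user(const struct user_img *ui)
{
	struct user_struct *u;

	if (ui->objref <= 0 || ui->objref >= MAX_OBJREF)
		return NULL;
	u = calloc(1, sizeof(*u));
	if (!u)
		return NULL;
	u->uid = ui->uid;
	u->user_ns = obj_lookup(ui->user_ns_ref);
	objhash[ui->objref] = u;
	return u;
}

/*
 * Restore a user_namespace: if the creator is already in objhash,
 * point ->creator at it; otherwise restore the creator from its image
 * first, then point ->creator at it.
 */
struct user_namespace *restore_userns(const struct userns_img *ni,
				      const struct user_img *users,
				      size_t nusers)
{
	struct user_namespace *ns;
	struct user_struct *creator;

	assert(ni != NULL && (users != NULL || nusers == 0));
	if (ni->objref <= 0 || ni->objref >= MAX_OBJREF)
		return NULL;
	ns = calloc(1, sizeof(*ns));
	if (!ns)
		return NULL;
	objhash[ni->objref] = ns;  /* visible before its creator exists */

	creator = obj_lookup(ni->creator_ref);
	for (size_t i = 0; i < nusers && !creator; i++)
		if (users[i].objref == ni->creator_ref)
			creator = restore_user(&users[i]);
	if (!creator) {
		objhash[ni->objref] = NULL;
		free(ns);
		return NULL;
	}
	ns->creator = creator;
	return ns;
}
```

The sketch registers the namespace in objhash before it looks for the creator. Any user restored later whose ->user_ns refers to that namespace can then be resolved.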

So, yes, if the object number is dumped to disk, you get streamability
even in the presence of loops.

Clever. It just needs a way to quickly look up the file position by object id.
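The sketch below shows one way to provide that lookup. It assumes the image is a seekable file, not a pipe. The dumper records where each object starts, sorts the (objref, offset) pairs with qsort(), and the loader uses bsearch() and fseek() to reach a record on demand. The obj_hdr/obj_index layout and the dump_obj()/finish_index()/load_obj() helpers are hypothetical. Seeking gives up the pure streaming property argued for at the top of the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical on-disk record header; 'len' payload bytes follow it. */
struct obj_hdr {
	int objref;
	int len;
};

/* One index entry: where the record for 'objref' starts in the image. */
struct obj_index {
	int objref;
	long pos;
};

static int cmp_index(const void *a, const void *b)
{
	const struct obj_index *x = a, *y = b;

	return (x->objref > y->objref) - (x->objref < y->objref);
}

/* Append one record and note its offset in idx[*n] (caller sizes idx). */
int dump_obj(FILE *f, struct obj_index *idx, size_t *n,
	     int objref, const void *payload, int len)
{
	struct obj_hdr h = { objref, len };
	long pos = ftell(f);

	assert(len >= 0);
	if (pos < 0 || fwrite(&h, sizeof(h), 1, f) != 1 ||
	    fwrite(payload, 1, (size_t)len, f) != (size_t)len)
		return -1;
	idx[*n].objref = objref;
	idx[*n].pos = pos;
	(*n)++;
	return 0;
}

/* Sort the index once dumping is done so lookups can use bsearch(). */
void finish_index(struct obj_index *idx, size_t n)
{
	qsort(idx, n, sizeof(*idx), cmp_index);
}

/* Seek to the record for 'objref'; returns payload length or -1. */
int load_obj(FILE *f, const struct obj_index *idx, size_t n,
	     int objref, void *buf, int bufsz)
{
	struct obj_index key = { objref, 0 };
	const struct obj_index *e =
		bsearch(&key, idx, n, sizeof(*idx), cmp_index);
	struct obj_hdr h;

	if (!e || fseek(f, e->pos, SEEK_SET) != 0 ||
	    fread(&h, sizeof(h), 1, f) != 1 || h.objref != objref ||
	    h.len < 0 || h.len > bufsz ||
	    fread(buf, 1, (size_t)h.len, f) != (size_t)h.len)
		return -1;
	return h.len;
}
```

The header is written as a raw struct, so the image is only readable on a machine with the same int size and byte order. That is acceptable for a sketch but not for a real image format.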

BTW, this is why the OpenVZ code has the "section" concept.
I had hoped it wouldn't be needed.