linux-kernel - Re: Ensuring c/r maintainability (WAS Re: [RFC][PATCH 00/11] track files for checkpointability)


Message-ID: <20090313175301.GA13050@us.ibm.com>
Date:	Fri, 13 Mar 2009 12:53:01 -0500
From:	"Serge E. Hallyn" <serue@...ibm.com>
To:	Matt Helsley <matthltc@...ibm.com>
Cc:	Serge Hallyn <serue@...ux.vnet.ibm.com>,
	Containers <containers@...ts.osdl.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Oren Laadan <orenl@...columbia.edu>,
	Dave Hansen <dave@...ux.vnet.ibm.com>,
	Ingo Molnar <mingo@...e.hu>,
	Christoph Hellwig <hch@...radead.org>,
	Alexey Dobriyan <adobriyan@...il.com>
Subject: Re: Ensuring c/r maintainability (WAS Re: [RFC][PATCH 00/11] track
	files for checkpointability)

Quoting Matt Helsley (matthltc@...ibm.com):
> On Thu, Mar 12, 2009 at 10:30:48AM -0500, Serge E. Hallyn wrote:
> > Quoting Cedric Le Goater (legoater@...e.fr):
> > > >> And if Ingo's requirement is fulfilled, would any C/R patchset be acceptable ?
> > > > 
> > > > Yup, no matter how hideous  :)  Ok not really.
> > > > 
> > > > But the point was that it wasn't Dave not understanding Alexey's
> > > > suggestion, but Greg not understanding Ingo's.  If you think Ingo's
> > > > goal isn't worthwhile or achievable, then argue that (as I am), don't
> > > > keep elaborating on something we all agree will be needed (Alexey's
> > > > suggestion or some other way of doing a true may-be-checkpointed test).
> > > 
> > > I rather spend my time on enabling things rather than forbid them. 
> > 
> > That sure sounds productive.  How could I argue with that.
> > 
> > But wait, haven't several teams been doing that for years?  So why is
> > c/r not in the upstream kernel?  Could it be that ignoring the
> > upstream maintainers' concerns about (a) treating the feature as a
> > toy, (b) long-term maintainability, and (c) c/r becoming an impediment
> > to future features, and instead hacking away at our toy feature, is
> > *not* always the best course?
> 
> I've been thinking about how we could make checkpoint/restart (c/r) more
> maintainable in the long-term. I've only come up with two ideas:
> 
> I. Implement sparse-like __cr struct annotations for some compile-time checking.
> 
> First we annotate structures which c/r needs to save. For example we might have:
> 
> 	struct mm_struct {
> 		__cr struct vm_area_struct * mmap;
> 		struct rb_root mm_rb;
> 		struct vm_area_struct *mmap_cache;
> 		...
> 		__cr unsigned long mmap_base;
> 		__cr unsigned long task_size;
> 		..
> 	};
> 
> The __cr annotations indicate fields of the mm_struct which must be
> saved during checkpoint restart. In fact, for non-pointer fields these
> annotations would be sufficient to generate c/r code.
> 
> Next we would need a __cr_root annotation. These mark structures which
> the c/r code visits that determine the scope of c/r. If there is no path from a
> __cr annotation to a __cr_root annotation then we would conclude that c/r of
> this struct is broken. These path constraint checks could be done at compile
> time.
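
The path constraint could be prototyped outside the compiler first.
Below is a minimal sketch under one reading of it: every struct with
__cr fields must be reachable from some __cr_root by following __cr
pointers.  Everything in it is hypothetical: the type table, the
cr_edge[] matrix, orphan_struct, marking task_struct as the root, and
cr_count_unreachable() itself.  They stand in for whatever a sparse-like
tool would extract from the annotations.

```c
#include <stdbool.h>

/* Hypothetical struct types; T_ORPHAN has __cr fields but no root reaches it. */
enum cr_type { T_TASK, T_MM, T_VMA, T_ORPHAN, T_COUNT };

struct cr_type_info {
	const char *name;
	bool is_cr_root;	/* stands in for a __cr_root annotation */
	bool has_cr_fields;	/* struct carries at least one __cr field */
};

/* Assumed annotation data, written by hand for illustration only. */
static const struct cr_type_info cr_types[T_COUNT] = {
	[T_TASK]   = { "task_struct",    true,  true },
	[T_MM]     = { "mm_struct",      false, true },
	[T_VMA]    = { "vm_area_struct", false, true },
	[T_ORPHAN] = { "orphan_struct",  false, true },
};

/* cr_edge[a][b] is true when struct a has a __cr pointer to struct b. */
static const bool cr_edge[T_COUNT][T_COUNT] = {
	[T_TASK][T_MM] = true,
	[T_MM][T_VMA]  = true,
};

/* Depth-first walk from every __cr_root along __cr pointer edges. */
static void cr_mark_reachable(bool reached[T_COUNT])
{
	int stack[T_COUNT];	/* each type is pushed at most once */
	int top = 0;

	for (int i = 0; i < T_COUNT; i++) {
		reached[i] = cr_types[i].is_cr_root;
		if (reached[i])
			stack[top++] = i;
	}
	while (top > 0) {
		int cur = stack[--top];

		for (int next = 0; next < T_COUNT; next++) {
			if (cr_edge[cur][next] && !reached[next]) {
				reached[next] = true;
				stack[top++] = next;
			}
		}
	}
}

/* Number of structs with __cr fields that no __cr_root can reach. */
int cr_count_unreachable(void)
{
	bool reached[T_COUNT];
	int broken = 0;

	cr_mark_reachable(reached);
	for (int i = 0; i < T_COUNT; i++)
		if (cr_types[i].has_cr_fields && !reached[i])
			broken++;
	return broken;
}
```

With the table above, cr_count_unreachable() returns 1, because only
orphan_struct has __cr fields and no path from a root.  A real checker
would build the table from the annotated headers rather than by hand.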

Hi Matt,

Is what you're detecting here really something we're worried about?

Maybe that's something we should be doing - coming up with a list of
the things we are trying to detect or prevent.  I can only think of
a few offhand:

1. Checkpoint (and restart) a task which has used a resource which we
cannot (yet, or ever) safely checkpoint/restart.

2. The kernel has a new feature for which we have not considered
checkpoint/restart.  Not only is it not safe to c/r a task using it,
but we haven't even implemented a check for tasks using it.

3. Some new kernel feature has an attribute which simply must be
stored away.  An example would be the vdso_base in s390 as of
recent 2.6.29 rc's, which was not present in 2.6.28.  So there are
two things to worry about in this one:

	a. detect that this happened and handle it, so c/r continues
	   to work.
	b. figure out a way to restart an older c/r image on a newer
	   kernel - or simply detect older images and call them
	   incompatible.
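
For 3b, the simplest option (detect older images and call them
incompatible) might look roughly like the sketch below.  It is only an
illustration: struct cr_image_header, the magic value, the version
number and cr_check_image() are all made-up names, not anything from
the existing patches.  It assumes the image starts with a small header
whose version is bumped whenever a new attribute such as vdso_base has
to be saved.

```c
#include <stdint.h>

/* Hypothetical values; bump CR_IMAGE_VERSION when the saved state changes. */
#define CR_IMAGE_MAGIC		0x43525f49u
#define CR_IMAGE_VERSION	2u

/* Hypothetical header assumed to sit at the start of every c/r image. */
struct cr_image_header {
	uint32_t magic;
	uint32_t version;
};

enum cr_compat {
	CR_COMPAT_OK,
	CR_COMPAT_BAD_MAGIC,	/* not a c/r image at all */
	CR_COMPAT_TOO_OLD,	/* written before the current format */
	CR_COMPAT_TOO_NEW,	/* written by a newer kernel */
};

/* Refuse any image whose format version differs from the running kernel's. */
enum cr_compat cr_check_image(const struct cr_image_header *hdr)
{
	if (hdr->magic != CR_IMAGE_MAGIC)
		return CR_COMPAT_BAD_MAGIC;
	if (hdr->version < CR_IMAGE_VERSION)
		return CR_COMPAT_TOO_OLD;
	if (hdr->version > CR_IMAGE_VERSION)
		return CR_COMPAT_TOO_NEW;
	return CR_COMPAT_OK;
}
```

Restart would call this before touching the rest of the image and bail
out on anything other than CR_COMPAT_OK.  Actually restarting an older
image on a newer kernel would need per-version conversion on top of
this, which the sketch does not attempt.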

-serge