linux-kernel - Re: [PATCH 0/2] Kernel Live Patching

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20141108034553.GA11663@treble.redhat.com>
Date:	Fri, 7 Nov 2014 21:45:53 -0600
From:	Josh Poimboeuf <jpoimboe@...hat.com>
To:	Vojtech Pavlik <vojtech@...e.cz>
Cc:	Christoph Hellwig <hch@...radead.org>,
	Seth Jennings <sjenning@...hat.com>,
	Jiri Kosina <jkosina@...e.cz>,
	Steven Rostedt <rostedt@...dmis.org>,
	live-patching@...r.kernel.org, kpatch@...hat.com,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] Kernel Live Patching

On Fri, Nov 07, 2014 at 10:27:35PM +0100, Vojtech Pavlik wrote:
> On Fri, Nov 07, 2014 at 09:45:00AM -0600, Josh Poimboeuf wrote:
> 
> > > 	LEAVE_FUNCTION
> > > 	LEAVE_PATCHED_SET
> > > 	LEAVE_KERNEL
> > > 
> > > 	SWITCH_FUNCTION
> > > 	SWITCH_THREAD
> > > 	SWITCH_KERNEL
> > > 
> > > Now with those definitions:
> > > 
> > > 	livepatch (null model), as is, is LEAVE_FUNCTION and SWITCH_FUNCTION
> > > 
> > > 	kpatch, masami-refcounting and Ksplice are LEAVE_PATCHED_SET and SWITCH_KERNEL
> > > 
> > > 	kGraft is LEAVE_KERNEL and SWITCH_THREAD
> > > 
> > > 	CRIU/kexec is LEAVE_KERNEL and SWITCH_KERNEL
> > 
> > Thanks, nice analysis!
> > 
> > > By blending kGraft and masami-refcounting, we could create a consistency
> > > engine capable of almost any combination of these properties and thus
> > > all the consistency models.
> > 
> > Can you elaborate on what this would look like?
> 
> There would be the refcounting engine, counting entries/exits of the
> area of interest (nothing for LEAVE_FUNCTION, patched functions for
> LEAVE_PATCHED_SET - same as Masami's work now, or syscall entry/exit for
> LEAVE_KERNEL), and it'd do the counting either per thread, flagging a
> thread as 'new universe' when the count goes to zero, or flipping a
> 'new universe' switch for the whole kernel when the count goes down to zero.
> 
> A patch would have flags which specify a combination of the above
> properties that are needed for successful patching of that specific
> patch.

Would it really be necessary to support all possible permutations?  I
think that would add a lot of complexity to the code.  Especially if we
have to support LEAVE_KERNEL, which adds a lot of interdependencies with
the rest of the kernel (kthreads, syscall, irq, etc).

It seems to me that the two most interesting options are:
- LEAVE_PATCHED_SET + SWITCH_THREAD (Masami-kGraft)
- LEAVE_PATCHED_SET + SWITCH_KERNEL (kpatch and/or Masami-kpatch)

I do see some appeal for the choose-your-own-consistency-model idea.
But it wouldn't be practical for cumulative patch upgrades, which I
think we both agreed at LPC seems to be safer than incremental patching.
If you only ever have one patch module loaded at any given time, you
only get one consistency model anyway.

In order for multiple consistency models to be practical, I think we'd
need to figure out how to make incremental patching safe.

> > The big problem with SWITCH_THREAD is that it adds the possibility that
> > old functions can run simultaneously with new ones.  When you change
> > data or data semantics, which is roughly 10% of security patches, it
> > creates some serious headaches:
> > 
> > - It makes patch safety analysis much harder by doubling the number of
> >   permutations of scenarios you have to consider.  In addition to
> >   considering newfunc/olddata and newfunc/newdata, you also have to
> >   consider oldfunc/olddata and oldfunc/newdata.
> > 
> > - It requires two patches instead of one.  The first patch is needed to
> >   modify the old functions to be able to deal with new data.  After the
> >   first patch has been fully applied, then you apply the second patch
> >   which can start creating new versions of data.
> 
> For data layout an semantic changes, there are two approaches:
> 
> 	1) TRANSFORM_WORLD
> 
> 	Stop the world, transform everything, resume. This is what Ksplice does
> 	and what could work for kpatch, would be rather interesting (but
> 	possible) for masami-refcounting and doesn't work at all for the
> 	per-thread kGraft.
> 
> 	It allows to deallocate structures, allocate new ones, basically
> 	rebuild the data structures of the kernel. No shadowing or using
> 	of padding is needed.
> 
> 	The nice part is that the patch can stay pretty much the original patch
> 	that fixes the bug when applied to normal kernel sources.
> 
> 	The most tricky part with this approach is writing the
> 	additional transformation code. Finding all instances of a
> 	changed data structure. It fails if only semantics are changed,
> 	but that is easily fixed by making sure there is always a layout
> 	change for any semantic change. All instances of a specific data
> 	structure can be found, worst case with some compiler help: No
> 	function can have pointers or instances of the structure on the
> 	stack, or registers, as that would include it in the patched
> 	set. So all have to be either global, or referenced by a
> 	globally-rooted tree, linked list or any other structure.
> 
> 	This one is also possible to revert, if a reverse-transforming function
> 	is provided.
> 
> 	masami-refcounting can be made to work with this by spinning in every
> 	function entry ftrace/kprobe callback after a universe flip and calling
> 	stop_kernel from the function exit callback that flipped the switch.

I'm kind of surprised to hear that Ksplice does this.  I had considered
this approach, but it sounds really tricky, if not impossible in many
cases.

Ahem, this would be an opportune time for a Ksplice person to chime in
with their experiences...

> 	2) TRANSFORM_ON_ACCESS
> 
> 	This requires structure versioning and/or shadowing. All 'new' functions
> 	are written with this in mind and can both handle the old and new data formats
> 	and transform the data to the new format. When universe transition is
> 	completed for the whole system, a single flag is flipped for the
> 	functions to start transforming.

This is a variation on what we've been doing with kpatch, using shadow
data fields to add data and/or version metadata to structures, with a
few differences:

First, we haven't been transforming existing data.  All existing data
structures stay at v1.  Only new data structures are created as v2.  I
suppose it depends on the nature of the patch as to whether it's safe to
convert existing data.

Second, we don't need to flip a flag, because with SWITCH_KERNEL the
system universe transition happens instantly.

> 	The advantage is to not have to look up every single instance of the
> 	structure and not having to make sure you found them all.
>
> 	The disadvantages are that the patch now looks very different to what
> 	goes into the kernel sources,

In my experience, s/very different/slightly different/.

>	that you never know whether the conversion
> 	is complete and reverting the patch is tough, although can be helped by
> 	keeping track of transformed functions at a cost of maintaining another
> 	data structure for that.

True, and we might even want to prevent or discourage such patches from
being disabled somehow.

> 	It works with any of the approaches (except null model) and while it
> 	needs two steps (patch, then enable conversion), it doesn't require two
> 	rounds of patching. Also, you don't have to consider oldfunc/newdata as
> 	that will never happen. oldfunc/olddata obviously works, so you only
> 	have to look at newfunc/olddata and newfunc/newdata as the
> 	transformation goes on.

Makes sense, I hadn't thought of the flag flipping approach for
SWITCH_THREAD.  It's definitely a lot better than requiring an
intermediate patch.

> I don't see either of these as really that much simpler. But I do see value
> in offering both.
> 
> > On the other hand, SWITCH_KERNEL doesn't have those problems.  It does
> > have the problem you mentioned, roughly 2% of the time, where it can't
> > patch functions which are always in use.  But in that case we can skip
> > the backtrace check ~90% of the time.  
> 
> An interesting bit is that when you skip the backtrace check you're
> actually reverting to LEAVE_FUNCION SWITCH_FUNCTION, forfeiting all
> consistency and not LEAVE_FUNCTION SWITCH_KERNEL as one would expect.

Hm, if we used stop machine (or ref counting), but without the backtrace
check, wouldn't it be LEAVE_FUNCTION SWITCH_KERNEL?

> Hence for those 2% of cases (going with your number, because it's a
> guess anyway) LEAVE_PATCHED_SET SWITCH_THREAD would in fact be a safer
> option.
> 
> > So it's really maybe something
> > like 0.2% of patches which can't be patched with SWITCH_KERNEL.  But
> > even then I think we could overcome that by getting creative, e.g. using
> > the multiple patch approach.
> > 
> > So my perspective is that SWITCH_THREAD causes big headaches 10% of the
> > time, whereas SWITCH_KERNEL causes small headaches 1.8% of the time, and
> > big headaches 0.2% of the time :-)
> 
> My preferred way would be to go with SWITCH_THREAD for the simpler stuff
> and do a SWITCH_KERNEL for the 10% of complex patches.

You said above that neither SWITCH_KERNEL nor SWITCH_THREAD is much
simpler than the other for the 10% case.  So why would you use
SWITCH_KERNEL here?

> This because (LEAVE_PATCHED_SET) SWITCH_THREAD finishes much quicker.
> But I'm biased there. ;)

Why would LEAVE_PATCHED_SET SWITCH_THREAD finish much quicker than
LEAVE_PATCHED_SET SWITCH_KERNEL?  Wouldn't they be about the same?

-- 
Josh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/