Message-ID: <20150224123601.GC3081@suse.cz>
Date: Tue, 24 Feb 2015 13:36:01 +0100
From: Vojtech Pavlik <vojtech@...e.com>
To: Ingo Molnar <mingo@...nel.org>
Cc: Josh Poimboeuf <jpoimboe@...hat.com>,
Jiri Kosina <jkosina@...e.cz>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Ingo Molnar <mingo@...hat.com>,
Seth Jennings <sjenning@...hat.com>,
linux-kernel@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Arjan van de Ven <arjan@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Borislav Petkov <bp@...en8.de>, live-patching@...r.kernel.org
Subject: Re: live kernel upgrades (was: live kernel patching design)
On Tue, Feb 24, 2015 at 11:23:29AM +0100, Ingo Molnar wrote:
> > Your upgrade proposal is an *enormous* disruption to the
> > system:
> >
> > - a latency of "well below 10" seconds is completely
> > unacceptable to most users who want to patch the kernel
> > of a production system _while_ it's in production.
>
> I think this statement is false for the following reasons.
The statement is very true.
> - I'd say the majority of system operators of production
> systems can live with a couple of seconds of delay at a
> well defined moment of the day or week - with gradual,
> pretty much open ended improvements in that latency
> down the line.
In the most common corporate setting, any noticeable outage, even outside
business hours, requires advance notice and the agreement of all
stakeholders - the teams that depend on the system.
If a live patching technology introduces an outage, it isn't "live", and
for these bureaucratic reasons it will not be used; a regular reboot will
be scheduled instead.
> - I think your argument ignores the fact that live
> upgrades would extend the scope of 'users willing to
> patch the kernel of a production system' _enormously_.
>
> For example, I have a production system with this much
> uptime:
>
> 10:50:09 up 153 days, 3:58, 34 users, load average: 0.00, 0.02, 0.05
>
> While I'm currently reluctant to reboot the system to
> upgrade the kernel (due to a reboot's intrusiveness),
> which is why it has achieved a relatively high
> uptime, I'd definitely allow the kernel to upgrade
> at 0:00am just fine. (I'd even give it up to a few
> minutes, as long as TCP connections don't time out.)
>
> And I don't think my usecase is special.
I agree that this is useful. But it is a different problem, one that only
partially overlaps with what we're trying to achieve with live patching.

If you can make full kernel upgrades work this way - which I doubt is
achievable in the next 10 years, given all the research and
infrastructure needed - then you certainly gain an additional group of
users. And a great tool. A large portion of those who ask for live
patching won't use it, though.
But honestly, I prefer a solution that works for small patches now over a
solution for unlimited patches sometime in the next decade.
> What gradual improvements in live upgrade latency am I
> talking about?
>
> - For example the majority of pure user-space process
> pages in RAM could be saved from the old kernel over
> into the new kernel - i.e. they'd stay in place in RAM,
> but they'd be re-hashed for the new data structures.
> This avoids a big chunk of checkpointing overhead.
I'd have hoped this would be a given. If you can't preserve memory
contents and have to re-load everything from disk, you might as well
reboot entirely; the time needed would not be much greater.
> - Likewise, most of the page cache could be saved from an
> old kernel to a new kernel as well - further reducing
> checkpointing overhead.
>
> - The PROT_NONE mechanism of the current NUMA balancing
> code could be used to transparently mark user-space
> pages as 'checkpointed'. This would reduce system
> interruption as only 'newly modified' pages would have
> to be checkpointed when the upgrade happens.
>
> - Hardware devices could be marked as 'already in well
> defined state', skipping the more expensive steps of
> driver initialization.
>
> - Possibly full user-space page tables could be preserved
> over an upgrade: this way user-space execution would be
> unaffected even in the micro level: cache layout, TLB
> patterns, etc.
>
> There's lots of gradual speedups possible with such a model
> IMO.
Yes, as I say above, guaranteeing decades of employment. ;)
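
For reference, the PROT_NONE marking mentioned above is the same trick
the current NUMA balancing code uses to detect page accesses. A minimal
user-space analogy of the "revoke access, take a fault, record the page,
restore access" idea - everything here is invented for illustration; the
real mechanism would sit in the kernel's fault path:

#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_PAGES 16

static long page_size;
static char *region;
static int page_dirty[REGION_PAGES];    /* set when a page is touched */

static void fault_handler(int sig, siginfo_t *si, void *uc)
{
        (void)sig; (void)uc;
        char *page = (char *)((unsigned long)si->si_addr
                              & ~(unsigned long)(page_size - 1));
        page_dirty[(page - region) / page_size] = 1;
        /* Restore access so the faulting access can be retried.
         * (Calling mprotect() from a signal handler is a sketch-level
         * simplification, not production practice.) */
        mprotect(page, page_size, PROT_READ | PROT_WRITE);
}

int main(void)
{
        page_size = sysconf(_SC_PAGESIZE);
        region = mmap(NULL, REGION_PAGES * page_size,
                      PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        struct sigaction sa = { 0 };
        sa.sa_sigaction = fault_handler;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        /* "Checkpoint": revoke access; any later touch faults and is
         * recorded, so only modified pages need to be saved again. */
        mprotect(region, REGION_PAGES * page_size, PROT_NONE);

        region[0] = 1;                  /* dirties page 0 */
        region[5 * page_size] = 1;      /* dirties page 5 */

        for (int i = 0; i < REGION_PAGES; i++)
                if (page_dirty[i])
                        printf("page %d modified since checkpoint\n", i);
        return 0;
}
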
> With live kernel patching we run into a brick wall of
> complexity straight away: we have to analyze the nature of
> the kernel modification, in the context of live patching,
> and that only works for the simplest of kernel
> modifications.
But you're able to _use_ it.
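
For reference, "using it" amounts to roughly this much code per replaced
function - a sketch along the lines of the sample module shipped with the
livepatch code (samples/livepatch/), written from memory, so details of
the API may be slightly off:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/livepatch.h>
#include <linux/seq_file.h>

/* Replacement for cmdline_proc_show() in fs/proc/cmdline.c. */
static int livepatch_cmdline_proc_show(struct seq_file *m, void *v)
{
        seq_printf(m, "%s\n", "this has been live patched");
        return 0;
}

static struct klp_func funcs[] = {
        {
                .old_name = "cmdline_proc_show",
                .new_func = livepatch_cmdline_proc_show,
        }, { }
};

static struct klp_object objs[] = {
        {
                /* NULL name means the object is vmlinux itself */
                .funcs = funcs,
        }, { }
};

static struct klp_patch patch = {
        .mod = THIS_MODULE,
        .objs = objs,
};

static int livepatch_init(void)
{
        int ret;

        ret = klp_register_patch(&patch);
        if (ret)
                return ret;
        ret = klp_enable_patch(&patch);
        if (ret) {
                WARN_ON(klp_unregister_patch(&patch));
                return ret;
        }
        return 0;
}

static void livepatch_exit(void)
{
        WARN_ON(klp_disable_patch(&patch));
        WARN_ON(klp_unregister_patch(&patch));
}

module_init(livepatch_init);
module_exit(livepatch_exit);
MODULE_LICENSE("GPL");
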
> With live kernel upgrades no such brick wall exists, just
> about any transition between kernel versions is possible.
The brick wall you run into is "I need to implement full kernel state
serialization before I can do anything at all." It isn't even clear
_how_ to do that. Particularly with the Linux kernel's development
model, where internal ABIs and data structures are always in flux, it
may not even be realistic.
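
To make the magnitude concrete: even a single field inserted into one
structure shifts every subsequent offset, so a state blob saved by
kernel N is meaningless to kernel N+1 without hand-written, per-version
converters for every affected structure. A toy illustration - both
structures below are invented for the example:

#include <stdio.h>
#include <stddef.h>

struct task_state_v1 {                  /* layout in kernel N */
        unsigned long flags;
        int prio;
        void *stack;
};

struct task_state_v2 {                  /* kernel N+1 inserted a field */
        unsigned long flags;
        unsigned long atomic_flags;     /* new in N+1 */
        int prio;
        void *stack;
};

int main(void)
{
        /* The same logical field lands at a different offset, so a
         * memory image from one version cannot simply be reinterpreted
         * by the other. */
        printf("prio offset: v1=%zu v2=%zu\n",
               offsetof(struct task_state_v1, prio),
               offsetof(struct task_state_v2, prio));
        return 0;
}
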
> Granted, with live kernel upgrades it's much more complex
> to get the 'simple' case into an even rudimentarily working
> fashion (full userspace state has to be enumerated, saved
> and restored), but once we are there, it's a whole new
> category of goodness and it probably covers 90%+ of the
> live kernel patching usecases on day 1 already ...
Feel free to start working on it. I'll stick with live patching.
--
Vojtech Pavlik
Director SUSE Labs