linux-kernel - Re: live kernel upgrades (was: live kernel patching design)

Message-ID: <54EC6A8B.2050106@suse.cz>
Date:	Tue, 24 Feb 2015 13:11:55 +0100
From:	Jiri Slaby <jslaby@...e.cz>
To:	Ingo Molnar <mingo@...nel.org>, Jiri Kosina <jkosina@...e.cz>
CC:	Vojtech Pavlik <vojtech@...e.com>,
	Josh Poimboeuf <jpoimboe@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...hat.com>,
	Seth Jennings <sjenning@...hat.com>,
	linux-kernel@...r.kernel.org,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Arjan van de Ven <arjan@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Borislav Petkov <bp@...en8.de>,
	Pavel Emelyanov <xemul@...nvz.org>
Subject: Re: live kernel upgrades (was: live kernel patching design)

On 02/22/2015, 10:46 AM, Ingo Molnar wrote:
> Arbitrary live kernel upgrades could be achieved by 
> starting with the 'simple method' I outlined in earlier 
> mails, using some of the methods that kpatch and kGraft are 
> both utilizing or planning to utilize:
> 
>   - implement user task and kthread parking to get the 
>     kernel into quiescent state.
> 
>   - implement (optional, thus ABI-compatible) 
>     system call interruptability and restartability 
>     support.
> 
>   - implement task state and (limited) device state
>     snapshotting support
> 
>   - implement live kernel upgrades by:
> 
>       - snapshotting all system state transparently
> 
>       - fast-rebooting into the new kernel image without 
>         shutting down and rebooting user-space, i.e. _much_ 
>         faster than a regular reboot.
> 
>       - restoring system state transparently within the new 
>         kernel image and resuming system workloads where 
>         they were left.
> 
> Even complex external state like TCP socket state and 
> graphics state can be preserved over an upgrade. As far as 
> the user is concerned, nothing happened but a brief pause - 
> and he's now running a v3.21 kernel, not v3.20.
> 
> Obviously one of the simplest utilizations of live kernel 
> upgrades would be to apply simple security fixes to 
> production systems. But that's just a very simple 
> application of a much broader capability.
> 
> Note that if done right, then the time to perform a live 
> kernel upgrade on a typical system could be brought to well 
> below 10 seconds system stoppage time: adequate to the vast 
> majority of installations.
> 
> For special installations or well optimized hardware the 
> latency could possibly be brought below 1 second stoppage 
> time.

Hello,

IMNSHO, you cannot.

The criu-based approach you have just described already exists as an
external project at Parallels. It is of course a perfect solution for
some use cases, but those use cases are distinct from ours. It is not
our competitor; it complements what we do. I will try to explain why.

It is highly dependent on HW. Kexec (or any other kernel-exchange
mechanism) is not supported by all HW, nor by all drivers. For some
devices there is not even a way to implement snapshotting, which is
obviously a real issue.

Downtime is highly dependent on the scenario. If you have plenty of
dirty memory, you have to flush it first. This might take minutes,
especially when using a network FS. Or you skip the flush, but then a
failure to replace the kernel is lethal. If you have a heap of open
FDs, restoring them will take ages. You cannot cheat on any of those;
it is pure I/O. You cannot estimate the downtime, and that is a real
downside.
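
To give a feeling for the flush cost mentioned above, here is a rough
sketch that sums the Dirty and Writeback fields of /proc/meminfo and
divides them by a write bandwidth. The function names, the sample
numbers and the bandwidth figure are assumptions made up for this
illustration; real writeback behaviour is far less linear than this.

```python
def parse_meminfo_kb(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; the unit suffix is ignored here.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def estimate_flush_seconds(meminfo_text, write_mb_per_s):
    """Lower-bound estimate of the time needed to flush dirty memory.

    write_mb_per_s is an assumed sustained write bandwidth, not
    something the kernel reports.
    """
    fields = parse_meminfo_kb(meminfo_text)
    pending_kb = fields.get("Dirty", 0) + fields.get("Writeback", 0)
    return (pending_kb / 1024.0) / write_mb_per_s


# Made-up sample: 2 GiB of dirty pages, nothing under writeback yet.
SAMPLE = """MemTotal:       16384000 kB
Dirty:           2097152 kB
Writeback:             0 kB
"""

# Assumed 20 MB/s to a slow network FS.
seconds = estimate_flush_seconds(SAMPLE, 20.0)
```

With these assumed numbers the flush alone takes more than a minute and
a half. On a live Linux box, the same function can be fed the contents
of /proc/meminfo instead of SAMPLE.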

Even if you can get the criu time under one second, this is still
unacceptable for live patching. Live patching should be three orders of
magnitude faster than that, otherwise it makes no sense. If you can
afford a second, you probably already have a large enough window or
failure handling to perform a full, and above all safer, reboot/kexec
anyway.

You cannot restore everything.
* TCP is one of the real beasts here. There are indeed plenty of
theoretical papers on this, explaining what can and cannot be done.
* NFS is another one.
* Xorg. Today, we cannot even switch smoothly between the discrete and
the native GFX chip. No go.
* There are indeed situations where NP-hard problems need to be solved
upon restoration. No way, if you want to finish restoring within this
century.

It is true that you cannot live-patch everything using KLP, but that
limitation is patch-dependent. Failure of restoration is
condition-dependent, and the condition is really fuzzy. That is a huge
difference.

Although you present the criu-based approach as provably safe and
correct, in many cases it is not, and by definition it cannot be.

That said, we are not going to start moving in that direction, apart
from adopting the many good points which emerged during the discussion
(fake signals, to pick one).

> This 'live kernel upgrades' approach would have various 
> advantages:
> 
>   - it brings together various principles working towards 
>     shared goals:
> 
>       - the boot time reduction folks
>       - the checkpoint/restore folks
>       - the hibernation folks
>       - the suspend/resume and power management folks
>       - the live patching folks (you)
>       - the syscall latency reduction folks
> 
>     if so many disciplines are working together then maybe 
>     something really good and long term maintainable can 
>     crystallize out of that effort.

I must admit, whenever I implemented something in the kernel, nobody did
any work for me. So the above will only result in the live patching
teams doing all the work. I am not saying we do not want to do the work.
I am only pointing out that there is no such thing as "working together
with other teams" (unless we are the ones paying them).

>   - it ignores the security theater that treats security
>     fixes as a separate, disproportionally more important
>     class of fixes and instead allows arbitrary complex 
>     changes over live kernel upgrades.

Hmm, more changes, more regressions. Complex changes, even more
regressions. No customer wants complex changes in such updates.

>   - there's no need to 'engineer' live patches separately, 
>     there's no need to review them and their usage sites 
>     for live patching relevant side effects. Just create a 
>     'better' kernel as defined by users of that kernel:

Review is a basic process which has to be done in any case.

The ABI is stable in theory, not so much in reality. criu has the same
deficiency as KLP here:
* One example is the file size of entries in /sys or /proc. That can
change and you have to take care of it, as processes "assume" something.
* Return values of syscalls are standardized, but nothing prevents
anybody from changing them in subsequent kernels. State machines in
processes might then be confused by different retvals from two
subsequent syscalls (served by two different kernels).
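
The retval problem can be shown with a toy model. Everything below is
hypothetical: the two "kernels" are plain Python functions, and the
assumption that a newer kernel reports EBUSY where an older one
reported EAGAIN is invented for the example, not taken from any real
syscall.

```python
import errno


def old_kernel_call():
    """Hypothetical old kernel: reports a busy resource as -EAGAIN."""
    return -errno.EAGAIN


def new_kernel_call():
    """Hypothetical new kernel: reports the same condition as -EBUSY."""
    return -errno.EBUSY


def client_step(retval):
    """State machine of a process written against the old kernel.

    It knows only success and -EAGAIN; anything else is treated as a
    fatal error.
    """
    if retval == 0:
        return "done"
    if retval == -errno.EAGAIN:
        return "retry"
    return "abort"


def run_across_upgrade():
    """Issue the same call twice, with a kernel swap in between."""
    first = client_step(old_kernel_call())   # before the upgrade
    second = client_step(new_kernel_call())  # after the upgrade
    return [first, second]
```

The process retries after the first call, as designed, and then aborts
on the second one although nothing changed from its point of view. It
does not matter whether the kernel was swapped by criu or patched by
KLP; the process sees the same inconsistency.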

>       - in the enterprise distro space create a more stable 
>         kernel and allow transparent upgrades into it.

This is IMHO unsupportable.

> We have many of the building blocks in place and have them 
> available:
> 
>   - the freezer code already attempts at parking/unparking
>     threads transparently, that could be fixed/extended.

That is broken in many funny ways. It needs to be fixed in any case:
nothing defines a good freezing point, and something of course should.
And do we want to use those well-defined points? No doubt. The freezer
will of course benefit too.

>   - hibernation, regular suspend/resume and in general
>     power management has in essence already implemented
>     most building blocks needed to enumerate and
>     checkpoint/restore device state that otherwise gets
>     lost in a shutdown/reboot cycle.

Not at all. A lot of suspend/resume hooks result only in
shutdown/reset. That is not what criu wants. And in many cases,
implementing c/r is not feasible (see above).

> A feature like arbitrary live kernel upgrades would be well 
> worth the pain and would be worth the complications, and 
> it's actually very feasible technically.

Yes, I like criu very much, but it is not going to save the Universe,
as you seem to believe. Neither is KLP. Remember, they are
complementary. Maybe Pavel can comment on this too.

> The goals of the current live kernel patching projects, 
> "being able to apply only the simplest of live patches", 
> which would in my opinion mostly serve the security 
> theater?

No, we all pray to the KISS principle. Having something basic which
works and can be built upon is all we want as the starting point.
Extending the functionality from there is the right way. This is not
our idea; it is what recent SW management thinking says. User-driven
development is considered a successful approach.

> They are not forward looking enough, and in that 
> sense they could even be counterproductive.

Being able to apply fixes for over 90% of the CVEs in 3.20 does not
sound bad or counterproductive to me at all, sorry.

thanks,
-- 
js
suse labs