Message-ID: <20150224123601.GC3081@suse.cz>
Date: Tue, 24 Feb 2015 13:36:01 +0100
From: Vojtech Pavlik <vojtech@...e.com>
To: Ingo Molnar <mingo@...nel.org>
Cc: Josh Poimboeuf <jpoimboe@...hat.com>,
Jiri Kosina <jkosina@...e.cz>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Ingo Molnar <mingo@...hat.com>,
Seth Jennings <sjenning@...hat.com>,
linux-kernel@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Arjan van de Ven <arjan@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Borislav Petkov <bp@...en8.de>, live-patching@...r.kernel.org
Subject: Re: live kernel upgrades (was: live kernel patching design)
On Tue, Feb 24, 2015 at 11:23:29AM +0100, Ingo Molnar wrote:
> > Your upgrade proposal is an *enormous* disruption to the
> > system:
> >
> > - a latency of "well below 10" seconds is completely
> > unacceptable to most users who want to patch the kernel
> > of a production system _while_ it's in production.
>
> I think this statement is false for the following reasons.
The statement is very true.
> - I'd say the majority of system operators of production
> systems can live with a couple of seconds of delay at a
> well defined moment of the day or week - with gradual,
> pretty much open ended improvements in that latency
> down the line.
In the most common corporate setting, any noticeable outage, even outside
business hours, requires advance notice and the agreement of all
stakeholders - the teams that depend on the system.
If a live patching technology introduces an outage, it isn't "live", and
for these bureaucratic reasons it will not be used; a regular reboot will
be scheduled instead.
> - I think your argument ignores the fact that live
> upgrades would extend the scope of 'users willing to
> patch the kernel of a production system' _enormously_.
>
> For example, I have a production system with this much
> uptime:
>
> 10:50:09 up 153 days, 3:58, 34 users, load average: 0.00, 0.02, 0.05
>
> While I'm currently reluctant to reboot the system to
> upgrade the kernel (due to a reboot's intrusiveness),
> which is why it has achieved a relatively high
> uptime, I'd definitely allow the kernel to upgrade
> at 0:00am just fine. (I'd even give it up to a few
> minutes, as long as TCP connections don't time out.)
>
> And I don't think my usecase is special.
I agree that this is useful. But it is a different problem, one that only
partially overlaps with what we're trying to achieve with live patching.

If you can make full kernel upgrades work this way - which I doubt is
achievable in the next 10 years, given all the research and
infrastructure needed - then you certainly gain an additional group of
users. And a great tool. A large portion of those who ask for live
patching won't use it, though.
But honestly, I prefer a solution that works for small patches now over a
solution for unlimited patches sometime in the next decade.
> What gradual improvements in live upgrade latency am I
> talking about?
>
> - For example the majority of pure user-space process
> pages in RAM could be saved from the old kernel over
> into the new kernel - i.e. they'd stay in place in RAM,
> but they'd be re-hashed for the new data structures.
> This avoids a big chunk of checkpointing overhead.
I'd have hoped this would be a given. If you can't preserve memory
contents and have to re-load everything from disk, you might as well
reboot entirely; the time needed would not be much greater.
> - Likewise, most of the page cache could be saved from an
> old kernel to a new kernel as well - further reducing
> checkpointing overhead.
>
> - The PROT_NONE mechanism of the current NUMA balancing
> code could be used to transparently mark user-space
> pages as 'checkpointed'. This would reduce system
> interruption as only 'newly modified' pages would have
> to be checkpointed when the upgrade happens.
>
> - Hardware devices could be marked as 'already in well
> defined state', skipping the more expensive steps of
> driver initialization.
>
> - Possibly full user-space page tables could be preserved
> over an upgrade: this way user-space execution would be
> unaffected even in the micro level: cache layout, TLB
> patterns, etc.
>
> There's lots of gradual speedups possible with such a model
> IMO.
Yes, as I say above, guaranteeing decades of employment. ;)
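
For reference, the PROT_NONE marking mentioned above is the same trick
the current NUMA balancing code uses to detect page accesses. A minimal
user-space analogy of the "revoke access, take a fault, record the page,
restore access" idea - everything here is invented for illustration; the
real mechanism would sit in the kernel's fault path:

#include <signal.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_PAGES 16

static long page_size;
static char *region;
static int page_dirty[REGION_PAGES];    /* set when a page is touched */

static void fault_handler(int sig, siginfo_t *si, void *uc)
{
        (void)sig; (void)uc;
        char *page = (char *)((unsigned long)si->si_addr
                              & ~(unsigned long)(page_size - 1));
        page_dirty[(page - region) / page_size] = 1;
        /* Restore access so the faulting access can be retried.
         * (Calling mprotect() from a signal handler is a sketch-level
         * simplification, not production practice.) */
        mprotect(page, page_size, PROT_READ | PROT_WRITE);
}

int main(void)
{
        page_size = sysconf(_SC_PAGESIZE);
        region = mmap(NULL, REGION_PAGES * page_size,
                      PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        struct sigaction sa = { 0 };
        sa.sa_sigaction = fault_handler;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        /* "Checkpoint": revoke access; any later touch faults and is
         * recorded, so only modified pages need to be saved again. */
        mprotect(region, REGION_PAGES * page_size, PROT_NONE);

        region[0] = 1;                  /* dirties page 0 */
        region[5 * page_size] = 1;      /* dirties page 5 */

        for (int i = 0; i < REGION_PAGES; i++)
                if (page_dirty[i])
                        printf("page %d modified since checkpoint\n", i);
        return 0;
}
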
> With live kernel patching we run into a brick wall of
> complexity straight away: we have to analyze the nature of
> the kernel modification, in the context of live patching,
> and that only works for the simplest of kernel
> modifications.
But you're able to _use_ it.
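
For reference, "using it" amounts to roughly this much code per replaced
function - a sketch along the lines of the sample module shipped with the
livepatch code (samples/livepatch/), written from memory, so details of
the API may be slightly off:

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/livepatch.h>
#include <linux/seq_file.h>

/* Replacement for cmdline_proc_show() in fs/proc/cmdline.c. */
static int livepatch_cmdline_proc_show(struct seq_file *m, void *v)
{
        seq_printf(m, "%s\n", "this has been live patched");
        return 0;
}

static struct klp_func funcs[] = {
        {
                .old_name = "cmdline_proc_show",
                .new_func = livepatch_cmdline_proc_show,
        }, { }
};

static struct klp_object objs[] = {
        {
                /* NULL name means the object is vmlinux itself */
                .funcs = funcs,
        }, { }
};

static struct klp_patch patch = {
        .mod = THIS_MODULE,
        .objs = objs,
};

static int livepatch_init(void)
{
        int ret;

        ret = klp_register_patch(&patch);
        if (ret)
                return ret;
        ret = klp_enable_patch(&patch);
        if (ret) {
                WARN_ON(klp_unregister_patch(&patch));
                return ret;
        }
        return 0;
}

static void livepatch_exit(void)
{
        WARN_ON(klp_disable_patch(&patch));
        WARN_ON(klp_unregister_patch(&patch));
}

module_init(livepatch_init);
module_exit(livepatch_exit);
MODULE_LICENSE("GPL");
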
> With live kernel upgrades no such brick wall exists, just
> about any transition between kernel versions is possible.
The brick wall you run into is "I need to implement full kernel state
serialization before I can do anything at all." It isn't even clear
_how_ to do that. Particularly with the Linux kernel's development
model, where internal ABIs and data structures are always in flux, it
may not even be realistic.
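
To make the magnitude concrete: even a single field inserted into one
structure shifts every subsequent offset, so a state blob saved by
kernel N is meaningless to kernel N+1 without hand-written, per-version
converters for every affected structure. A toy illustration - both
structures below are invented for the example:

#include <stdio.h>
#include <stddef.h>

struct task_state_v1 {                  /* layout in kernel N */
        unsigned long flags;
        int prio;
        void *stack;
};

struct task_state_v2 {                  /* kernel N+1 inserted a field */
        unsigned long flags;
        unsigned long atomic_flags;     /* new in N+1 */
        int prio;
        void *stack;
};

int main(void)
{
        /* The same logical field lands at a different offset, so a
         * memory image from one version cannot simply be reinterpreted
         * by the other. */
        printf("prio offset: v1=%zu v2=%zu\n",
               offsetof(struct task_state_v1, prio),
               offsetof(struct task_state_v2, prio));
        return 0;
}
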
> Granted, with live kernel upgrades it's much more complex
> to get the 'simple' case into an even rudimentarily working
> fashion (full userspace state has to be enumerated, saved
> and restored), but once we are there, it's a whole new
> category of goodness and it probably covers 90%+ of the
> live kernel patching usecases on day 1 already ...
Feel free to start working on it. I'll stick with live patching.
--
Vojtech Pavlik
Director SUSE Labs