linux-kernel - Re: live patching design (was: Re: [PATCH 1/3] sched: add sched_task

Message-ID: <20150221183005.GB8406@gmail.com>
Date:	Sat, 21 Feb 2015 19:30:05 +0100
From:	Ingo Molnar <mingo@...nel.org>
To:	Josh Poimboeuf <jpoimboe@...hat.com>
Cc:	Vojtech Pavlik <vojtech@...e.com>, Jiri Kosina <jkosina@...e.cz>,
	Peter Zijlstra <peterz@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...hat.com>,
	Seth Jennings <sjenning@...hat.com>,
	linux-kernel@...r.kernel.org,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: live patching design (was: Re: [PATCH 1/3] sched: add
 sched_task_call())

* Josh Poimboeuf <jpoimboe@...hat.com> wrote:

> On Fri, Feb 20, 2015 at 10:46:13PM +0100, Vojtech Pavlik wrote:
> > On Fri, Feb 20, 2015 at 08:49:01PM +0100, Ingo Molnar wrote:
> >
> > > I.e. it's in essence the strong stop-all atomic 
> > > patching model of 'kpatch', combined with the 
> > > reliable avoidance of kernel stacks that 'kgraft' 
> > > uses.
> > 
> > > That should be the starting point, because it's the 
> > > most reliable method.
> > 
> > In the consistency models discussion, this was marked 
> > the "LEAVE_KERNEL+SWITCH_KERNEL" model. It's indeed the 
> > strongest model of all, but also comes at the highest 
> > cost in terms of impact on running tasks. It's so high 
> > (the interruption may be seconds or more) that it was 
> > deemed not worth implementing.
> 
> Yeah, this is way too disruptive to the user.
> 
> Even the comparatively tiny latency caused by kpatch's 
> use of stop_machine() was considered unacceptable by 
> some.

Unreliable, unrobust patching is even more disruptive...

What I think makes it long term fragile is that we combine 
two unrobust, unlikely mechanisms: the chance that a task 
just happens to execute a patched function, with the chance 
that debug information is unreliable.

For example tracing patching got debugged to a fair degree 
because we rely on the patching for actual tracing 
functionality. Even with that relatively robust usage model 
we had our crises ...

I just don't see how a stack backtrace based live patching 
method can become robust in the long run.

> Plus a lot of processes would see EINTR, causing more 
> havoc.

Parking threads safely in user mode does not require the 
propagation of syscall interruption to user-space.

(It does have some other requirements, such as making all 
syscalls interruptible by a 'special' signalling method 
that only live patching triggers - even syscalls that are 
uninterruptible under the normal ABI, such as sys_sync().)
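
As a rough user-space analogy only (this is not the proposed live patching
mechanism), the existing SA_RESTART flag of sigaction() already shows a
blocking syscall being interrupted and transparently restarted without the
caller seeing EINTR. The sketch below is an assumption-laden illustration:
read_across_signal(), on_sigusr1(), sleep_ms() and the 200 ms delays are
hypothetical names and values chosen for the demo.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Counts handler invocations; sig_atomic_t is safe to modify in a handler. */
static volatile sig_atomic_t handler_runs;

static void on_sigusr1(int sig)
{
	(void)sig;
	handler_runs++;
}

static void sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

	nanosleep(&ts, NULL);
}

/*
 * Block in read() on an empty pipe while a child process first sends
 * SIGUSR1 to us and, a little later, writes two bytes. Returns whatever
 * read() returned; on failure returns -1 with errno set.
 *
 * use_restart != 0: the handler is installed with SA_RESTART, so the
 *                   interrupted read() is restarted and the caller only
 *                   ever sees the data.
 * use_restart == 0: the interruption is propagated and read() is
 *                   expected to fail with EINTR.
 */
static ssize_t read_across_signal(int use_restart)
{
	struct sigaction sa;
	int fds[2], saved_errno;
	char buf[8];
	ssize_t n;
	pid_t child;

	handler_runs = 0;
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigusr1;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = use_restart ? SA_RESTART : 0;
	if (sigaction(SIGUSR1, &sa, NULL) < 0 || pipe(fds) < 0)
		return -1;

	child = fork();
	if (child < 0) {
		saved_errno = errno;
		close(fds[0]);
		close(fds[1]);
		errno = saved_errno;
		return -1;
	}
	if (child == 0) {
		close(fds[0]);
		sleep_ms(200);			/* let the parent block in read() */
		kill(getppid(), SIGUSR1);	/* interrupt the blocked syscall */
		sleep_ms(200);
		_exit(write(fds[1], "ok", 2) == 2 ? 0 : 1);
	}

	close(fds[1]);
	n = read(fds[0], buf, sizeof(buf));
	saved_errno = errno;
	close(fds[0]);
	waitpid(child, NULL, 0);
	errno = saved_errno;
	return n;
}
```

Calling read_across_signal(1) is expected to return 2 with the handler 
having run once, while read_across_signal(0) should normally return -1 
with errno set to EINTR (that second case depends on the signal arriving 
while the parent is already blocked in read()).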

On the other hand, if it's too slow, people will work on 
improving signal propagation latencies: making syscalls 
more readily interruptible and more seamlessly restartable 
has various other advantages beyond live kernel patching.

I.e. it's a win-win scenario and will improve various areas 
of the kernel in terms of syscall interruptibility 
latencies.

Thanks,

	Ingo