linux-kernel - Re: [PATCH] livepatch/rcu: Fix stacking of patches when RCU infrastructure is patched

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20170608162643.fe3dpppypgxudeke@treble>
Date:   Thu, 8 Jun 2017 11:26:43 -0500
From:   Josh Poimboeuf <jpoimboe@...hat.com>
To:     Miroslav Benes <mbenes@...e.cz>
Cc:     Petr Mladek <pmladek@...e.com>, Jessica Yu <jeyu@...hat.com>,
        Jiri Kosina <jikos@...nel.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        live-patching@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] livepatch/rcu: Fix stacking of patches when RCU
 infrastructure is patched

On Tue, Jun 06, 2017 at 01:07:40PM +0200, Miroslav Benes wrote:
> On Tue, 23 May 2017, Petr Mladek wrote:
> 
> > rcu_read_(un)lock(), list_*_rcu(), and synchronize_rcu() are used for
> > a secure access and manipulation of the list of patches that modify
> > the same function. In particular, it is the variable func_stack that
> > is accessible from the ftrace handler via struct ftrace_ops and klp_ops.
> > 
> > Of course, it synchronizes also some states of the patch on the top
> > of the stack, e.g. func->transition in klp_ftrace_handler.
> > 
> > At the same time, this mechanism guards also the manipulation
> > of task->patch_state. It is modified according to the state of
> > the transition and the state of the process.
> > 
> > Now, all this works well as long as RCU works well. Sadly livepatching
> > might get into some corner cases when this is not true. For example,
> > RCU is not watching when rcu_read_lock() is taken in idle threads.
> > It is because they might sleep and prevent reaching the grace period
> > for too long.
> > 
> > There are ways how to make RCU watching even in idle threads,
> > see rcu_irq_enter(). But there is a small location inside RCU
> > infrastructure when even this does not work.
> > 
> > This small problematic location can be detected either before
> > calling rcu_irq_enter() by rcu_irq_enter_disabled() or later by
> > rcu_is_watching(). Sadly, there is no safe way how to handle it.
> > Once we detect that RCU was not watching, we might see inconsistent
> > state of the function stack and the related variables in
> > klp_ftrace_handler(). Then we could do a wrong decision,
> > use an incompatible implementation of the function and
> > break the consistency of the system. We could warn but
> > we could not avoid the damage.
> > 
> > Fortunately, ftrace has similar problems and they seem to
> > be solved well there. It uses a heavy weight implementation
> > of some RCU operations. In particular, it replaces:
> > 
> >   + rcu_read_lock() with preempt_disable_notrace()
> >   + rcu_read_unlock() with preempt_enable_notrace()
> >   + synchronize_rcu() with schedule_on_each_cpu(sync_work)
> > 
> > My understanding is that this is RCU implementation from
> > a stone age. It meets the core RCU requirements but it is
> > rather ineffective. Especially, it does not allow to batch
> > or speed up the synchronize calls.
> > 
> > On the other hand, it is very trivial. It allows to safely
> > trace and/or livepatch even the RCU core infrastructure.
> > And the effectiveness is a not a big issue because using ftrace
> > or livepatches on productive systems is a rare operation.
> > The safety is much more important than a negligible extra
> > load.
> > 
> > Note that the alternative implementation follows the RCU
> > principles. Therefore, we could and actually must use
> > list_*_rcu() variants when manipulating the func_stack.
> > These functions allow to access the pointers in
> > the right order and with the right barriers. But they
> > do not use any other information that would be set
> > only by rcu_read_lock().
> > 
> > Signed-off-by: Petr Mladek <pmladek@...e.com>
> 
> I think this should work... if I understand the problem and how RCU works
> correctly.
> 
> > Second, ftrace needs to make sure that nobody is inside
> > the dynamic trampoline when it is being freed. For this, it also
> > calls synchronize_rcu_tasks() in preemptive kernel in
> > ftrace_shutdown().
> > 
> > Livepatch has similar problem but it should get solved by ftrace
> > for free. klp_ftrace_handler() is a good guy and newer sleeps.
> > In addition, it is registered with FTRACE_OPS_FL_DYNAMIC. It causes
> > that unregister_ftrace_function() calls:
> > 
> > 	* schedule_on_each_cpu(ftrace_sync) - always
> > 	* synchronize_rcu_tasks() - in preemptive kernel
> > 
> > The effect is that nobody is neither inside the dynamic trampoline
> > nor inside the ftrace handler. 
> 
> I would add something along these lines to the changelog. That we rely on 
> ftrace to do the right thing, because we are a good citizen.

I agree, most of the text below the '---' marker should be in the
changelog.  Otherwise the patch looks good to me, other than the comment
typos I mentioned earlier.

-- 
Josh