lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 8 May 2017 15:13:22 -0400
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Josh Poimboeuf <jpoimboe@...hat.com>
Cc:     Petr Mladek <pmladek@...e.com>, Jessica Yu <jeyu@...hat.com>,
        Jiri Kosina <jikos@...nel.org>,
        Miroslav Benes <mbenes@...e.cz>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        live-patching@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/3] livepatch/rcu: Warn when system consistency is
 broken in RCU code

On Mon, 8 May 2017 11:51:08 -0500
Josh Poimboeuf <jpoimboe@...hat.com> wrote:


> > --- a/kernel/livepatch/patch.c
> > +++ b/kernel/livepatch/patch.c
> > @@ -62,6 +62,14 @@ static void notrace klp_ftrace_handler(unsigned long ip,
> >  	/* RCU may not be watching, make it see us. */
> >  	rcu_irq_enter_irqson();
> >  
> > +	/*
> > +	 * RCU still might not see us if we patch a function inside
> > +	 * the RCU infrastructure. Then we might see wrong state of
> > +	 * func->stack and other flags.
> > +	 */
> > +	if (unlikely(!rcu_is_watching()))
> > +		WARN_ONCE(1, "Livepatch modified a function that can not be handled a safe way.!");
> > +
> >  	rcu_read_lock();
> >  
> >  	func = list_first_or_null_rcu(&ops->func_stack, struct klp_func,  
> 
> So the ftrace stack tracer seems to do this check a little differently.
> It calls rcu_irq_enter_disabled() first, and then calls rcu_irq_enter().
> Any reason we're not doing it that way?

I thought the same. Also can it not return, and not do anything.
Because continuing after the warning, is dangerous.

Not to mention, why the if statement in the first place? And then pass
a 1. Makes no sense.

Although you should have:

	if (WARN_ONCE(!rcu_is_watching,
			"Livepatch ..."))
		return;

or something to not cause any damage.

> 
> The warning would be more helpful if it printed a little more
> information, like the ip and the parent_ip.  And it should probably
> mention that RCU is broken.

Well, the warning would also print a stack trace. How is RCU broken? It
could simply be that you are patching a function that is in a place
that RCU doesn't "watch". Like going to idle or userspace. Or even in
RCU itself.

> 
> Also I wonder if we can constrain the warning somehow.  I think the
> warning only applies if a patch is in progress, right?  In that case, if
> RCU is broken, would it be feasible to mutex_trylock() the klp mutex to
> try to ensure that no patches are being applied while the ftrace handler
> is running?  Then it wouldn't matter if RCU were broken because the func
> stack wouldn't be changing anyway.  Then it could only warn if it failed
> to get the mutex.

How would RCU be broken?

> 
> Stepping back a bit, the documentation and comments describe patches to
> functions inside the RCU infrastructure.  As far as I can tell, only a
> single function would be affected: rcu_dynticks_eqs_enter().  Because
> it's the only function called after the "Breaks tracing momentarily"
> comment in rcu_eqs_enter_common().  Any reason why we couldn't just
> annotate rcu_dynticks_eqs_enter() with notrace?

Note, there's places in the kernel (on the way to idle and userspace)
that rcu is not watching. Ftrace handles this differently than most
places. But anything that requires calling rcu_read_lock(), well, you
need to beware.

> 
> Stepping back even further, if I'm understanding this issue correctly,
> this warning can also affect patches to functions which are called from
> NMI context.  If the NMI occurs in the part of rcu_eqs_enter_common()
> where rcu_irq_enter() doesn't work, then RCU won't work here, right?  If
> so, that worries me because there are a lot of functions which can be
> called (and patched) from NMI context.

Note, the "rcu_dynticks_eqs_enter()" is the only place that can't make
rcu "watch" again.

If rcu is not watching, calling rcu_enter_irq() will have it watch
again. Even in NMI context I believe.

> 
> I wonder if there's some way to solve this by changing RCU code, but I'm
> not familiar enough with RCU to have any ideas there.

You don't want to go there.

> 
> Another idea would be to figure out a way to stop using RCU in
> klp_ftrace_handler() altogether.
> 

That may work if rcu_enter_irq() doesn't. But that's how NMIs use rcu.

-- Steve

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ