Message-Id: <20170412151817.GG3956@linux.vnet.ibm.com>
Date:   Wed, 12 Apr 2017 08:18:17 -0700
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     linux-kernel@...r.kernel.org
Subject: Re: There is a Tasks RCU stall warning

On Wed, Apr 12, 2017 at 10:42:55AM -0400, Steven Rostedt wrote:
> On Wed, 12 Apr 2017 07:19:36 -0700
> "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com> wrote:
> 
> > On Wed, Apr 12, 2017 at 09:18:21AM -0400, Steven Rostedt wrote:
> > > On Tue, 11 Apr 2017 20:23:07 -0700
> > > "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com> wrote:
> > >   
> > > > But another question...
> > > > 
> > > > Suppose someone traced or probed or whatever a call to (say)
> > > > cond_resched_rcu_qs().  Wouldn't that put the call to this
> > > > function in the trampoline itself?  Of course, if this happened,
> > > > life would be hard when the trampoline was freed due to
> > > > cond_resched_rcu_qs() being a quiescent state.  
> > > 
> > > Not at all, because the trampoline happens at the beginning of the
> > > function. Not in the guts of it (unless something in the guts was
> > > traced). But even then, it should be fine as the change was already
> > > made.
> > > 
> > > 	/* unhook trampoline from function calls */
> > > 	unregister_ftrace_function(my_ops);
> > > 
> > > 	synchronize_rcu_tasks();
> > > 
> > > 	kfree(my_ops->trampoline);
> > > 
> > > 
> > > Thus, once the unregister_ftrace_function() is called, no new entries
> > > into the trampoline can happen. The synchronize_rcu_tasks() is to move
> > > those that are currently on a trampoline off.  
> > 
> > OK, good!  (I thought that these things could appear anywhere.)
> 
> Well the trampolines pretty much can, but they are removed before
> calling synchronize_rcu_tasks(), and nothing can enter the trampoline
> when that is called.
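
For concreteness, the ordering described above (unhook, wait for tasks
already on the trampoline, then free) can be pictured with a toy,
single-threaded user-space model.  This is only a sketch under stated
assumptions: every model_* name is hypothetical and is not a kernel or
ftrace API, and model_synchronize() merely stands in for the effect that
synchronize_rcu_tasks() is expected to have, not for how it works.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum { NTASKS = 4 };

struct model_ops {
	bool hooked;      /* can new callers still reach the trampoline? */
	int *trampoline;  /* stands in for the trampoline's memory */
};

struct model_task {
	bool on_trampoline;  /* task is currently executing trampoline code */
};

static struct model_task tasks[NTASKS];

/* A task calls a traced function: it enters the trampoline only if hooked. */
static bool model_enter(struct model_ops *ops, struct model_task *t)
{
	assert(ops != NULL && t != NULL);
	if (!ops->hooked || ops->trampoline == NULL)
		return false;
	t->on_trampoline = true;
	return true;
}

/* Step 1: unhook, so that no new entries into the trampoline can happen. */
static void model_unregister(struct model_ops *ops)
{
	ops->hooked = false;
}

/*
 * Step 2: stand-in for the grace-period wait.  Every task that was already
 * on the trampoline runs to the end of it and leaves.  Returns the number
 * of tasks that had to be waited for.
 */
static int model_synchronize(void)
{
	int waited = 0;

	for (int i = 0; i < NTASKS; i++) {
		if (tasks[i].on_trampoline) {
			tasks[i].on_trampoline = false;
			waited++;
		}
	}
	return waited;
}

/* Step 3: free the trampoline, refusing while any task is still on it. */
static bool model_free(struct model_ops *ops)
{
	for (int i = 0; i < NTASKS; i++) {
		if (tasks[i].on_trampoline)
			return false;
	}
	free(ops->trampoline);
	ops->trampoline = NULL;
	return true;
}
```

In this model, calling model_free() between steps 1 and 2 is refused
whenever some task entered before the unhook, which is exactly the window
the grace-period wait is meant to close.  The model deliberately says
nothing about what code running on the trampoline is allowed to do.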

Color me confused...

So you can have an arbitrary function call within a trampoline?

If not, agreed, no problem.  Otherwise, it seems like we have a big
problem remaining.  Unless the functions called from a trampoline are
guaranteed never to do a context switch.

So what exactly is the trampoline code allowed to do?  ;-)

> > If it ever becomes necessary, I suppose you could have a function
> > call as the very last thing on a trampoline.  Do the (off-trampoline)
> > return-address push, jump to the function, and that is the last need
> > for the trampoline.
> 
> The point of trampolines is to optimize the function hooks; added
> features will kill that optimization. But then it gets even more
> complex. The trampolines are written in assembly and do special reg
> savings in order to call C code. And it needs to restore back to the
> original state before calling back to the function being traced. Thus,
> anything at the end of the trampoline will need to be written in
> assembly. Not sure writing RCU code in assembly would be much fun.

Writing RCU code as assembly code would indeed not be my first choice!

> > Assuming that the called function doesn't try accessing the code
> > surrounding the call, but that would be a problem in any case.
> > 
> > > Is there a way that a task could be in the middle of
> > > cond_resched_rcu_qs() and get preempted by something while on the
> > > ftrace trampoline, then the above "unregister_ftrace_function()" and
> > > "synchronize_rcu_tasks()" can be called and finish, while the one task
> > > is still on the trampoline and never finished the cond_resched_rcu_qs()?  
> > 
> > Well, if the kernel being ftraced is a guest OS and the hypervisor
> > preempts it at just that point...
> 
> Not sure what you mean by the above. You mean the hypervisor running
> ftrace on the guest OS? Or just a long pause on the guest OS (could
> also be an NMI). But in any case, we don't care about long pauses. We
> care about tasks going to sleep while on the trampoline, and the ftrace
> code that does the schedule_on_each_cpu() missing that task, because it
> was preempted, and not affected by the schedule_on_each_cpu() call.

The guest doing ftrace and the hypervisor preempting it.  But yes,
same thing as NMI.

> > > > Or is there something that takes care to avoid putting calls to
> > > > this sort of function (and calls to any function calling this sort
> > > > of function, directly or indirectly) into a trampoline?  
> > > 
> > > The question is, if it's on the trampoline in one of these functions
> > > when synchronize_rcu_tasks() is called, will it still be on the
> > > trampoline when that returns?  
> > 
> > If the function's return address is within the trampoline, it seems to
> > me that bad things could happen.
> 
> Not sure what you mean by the above. One should never be tracing within
> a trampoline, or calling synchronize_rcu_tasks() in one. The trampoline
> could be called from any context, including NMI.

My problem is that I have no idea what can and cannot be included in
trampoline code.  In the absence of that information, my RCU-honed reflexes
jump immediately to the worst case that I can think of.  ;-)

							Thanx, Paul
