linux-kernel - Re: [PATCH v3 tip/core/rcu 3/9] rcu: Add synchronous grace-period waiting for RCU-tasks

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20140808151643.GD9918@twins.programming.kicks-ass.net>
Date:	Fri, 8 Aug 2014 17:16:43 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Steven Rostedt <rostedt@...dmis.org>
Cc:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Oleg Nesterov <oleg@...hat.com>, linux-kernel@...r.kernel.org,
	mingo@...nel.org, laijs@...fujitsu.com, dipankar@...ibm.com,
	akpm@...ux-foundation.org, mathieu.desnoyers@...icios.com,
	josh@...htriplett.org, tglx@...utronix.de, dhowells@...hat.com,
	edumazet@...gle.com, dvhart@...ux.intel.com, fweisbec@...il.com,
	bobby.prani@...il.com, masami.hiramatsu.pt@...achi.com
Subject: Re: [PATCH v3 tip/core/rcu 3/9] rcu: Add synchronous grace-period
 waiting for RCU-tasks

On Fri, Aug 08, 2014 at 10:58:58AM -0400, Steven Rostedt wrote:
> On Fri, 8 Aug 2014 16:34:13 +0200
> Peter Zijlstra <peterz@...radead.org> wrote:
> 
> > On Fri, Aug 08, 2014 at 10:12:21AM -0400, Steven Rostedt wrote:
> > > > Ok, so they're purely used in the function prologue/epilogue callchain.
> > > 
> > > No, they are also used by optimized kprobes. This is why optimized
> > > kprobes depend on !CONFIG_PREEMPT. [ added Masami to the discussion ].
> > 
> > How do those work? Is that one where the INT3 relocates the instruction
> > stream into an alternative 'text' and that JMPs back into the original
> > stream at the end?
> 
> No, it's where we replace the 'int3' with a jump to a trampoline that
> simulates an INT3. Speeds things up quite a bit.

Ah, ok. 

> > And what is there to make sure the kprobe itself doesn't do 'funny'?
> 
> Well, kprobes, like function callbacks are just restricted like
> interrupt handlers are. If they break, they break. They should know
> better ;-)

But is there debugging infrastructure to test they don't do funny? Or
are we just going to wait for some random runtime weirdness and then
pull our hair out?

So do we run these handlers with preempt_disable() or anything like
that? (maybe only as a debug option).

> > Sure, as long as you make absolutely sure none of that code ends up
> > calling cond_resched()/might_sleep() etc. Which I think you already said
> > was true, so no worries there.
> 
> Right. There's no guarantees that someone wont do such a stupid thing.
> But then, there's no guarantees that someone wont register an NMI
> callback with the same code too.

Well, kprobes is 'special' in that it it almost encourages random non
kernel devs to write modules for it, so it gets extra special creative
crap in.

I'm fairly sure you'll get your ftrace handler right, because that's
only you and maybe a few other people ever touching that code. Not so
with kprobes.

> > Sure, I get that part. What I was getting as is _WHY_ you need
> > call_rcu_task(), why isn't synchronize_tasks() good enough?
> 
> Oh, because that synchronize_tasks() may take minutes. And that means
> we wont be able to return for a long time. The only thing I can really
> see using call_rcu_task() is something that needs to free its data. Why
> wait around when all you're going to do is call free? It's basically
> just a garbage collector.

Well the waiting has the advantage of being a natural throttle on the
amount of memory tied up in the whole scheme.

> > > What we want to do today, is to create a dynamic trampoline for each of
> > > theses 1000 functions. Each function will call a separate trampoline
> > > that will only call the function that was registered to it. That way,
> > > we can have 1000 different ops registered to 1000 different functions
> > > and still have the same performance.
> > 
> > And how will you limit the amount of memory tied up in this? This looks
> > like a good way to tie up an immense amount of memory fast.
> 
> Well, these operations are currently only allowed by root. Thus, it's
> the thing that root should be careful about. The trampolines are small,
> and it will take a hell of a lot of callbacks to cause issues.

You said 10e3 order things, I then give you a small 32bit arm system.

The thing is, even root is a clueless idiot most of the times, yes we'll
let him shoot his foot off and give him enough rope to hang himself, and
we'll even show him how to tie the knot at times, but how is he to know
that this script that 'worked' now causes his machine to OOM and behave
brick like?

> The thing I'm worried about is to make sure they get freed. Otherwise a
> leak will cause more issues than anything else. Which also means we
> need to have a way to expedite call_rcu_tasks() if need be.

No.. for one, that doesn't follow. Things will get freed (eventually)
even without expedite, and secondly implementing expedite would mean
force scheduling tasks etc. And we're just so not going to do that.



Content of type "application/pgp-signature" skipped