Message-ID: <1269435604.5109.235.camel@twins>
Date:	Wed, 24 Mar 2010 14:00:04 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Cc:	Ingo Molnar <mingo@...e.hu>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Masami Hiramatsu <mhiramat@...hat.com>,
	Mel Gorman <mel@....ul.ie>,
	Ananth N Mavinakayanahalli <ananth@...ibm.com>,
	Jim Keniston <jkenisto@...ux.vnet.ibm.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	"Frank Ch. Eigler" <fche@...hat.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Roland McGrath <roland@...hat.com>,
	Oleg Nesterov <oleg@...hat.com>,
	Christoph Hellwig <hch@...radead.org>,
	Ulrich Drepper <drepper@...hat.com>,
	Tom Tromey <tromey@...hat.com>
Subject: Re: [PATCH v1 7/10] Uprobes Implementation

On Wed, 2010-03-24 at 13:28 +0530, Srikar Dronamraju wrote:
> Hi Peter, 
> 
> > > > I would still prefer to see something like:
> > > > 
> > > >  vma:offset, instead of tid:vaddr
> > > >  
> > > > You want to probe a symbol in a DSO, filtering per-task comes after that
> > > > if desired.
> > > > 
> > 
> > > Do you mean the user should be specifying 357c200000:74b80 to denote
> > > 000000357c274b80, or /lib64/libc.so.6:74b80?
> > > And we trace all the processes which have mapped this address?
> > 
> > Well userspace would simply specify something like: /lib/libc.so:malloc,
> > we'd probably communicate that to the kernel using a filedesc and
> > offset.
> > 
> > And yes, all processes that share that DSO, consumers can install
> > filters.
> > 
> 
> I think perf would be using uprobes in one of the four ways.
> - Trace a particular process.
> - Trace a particular session.
> - Trace all instances of an executable. 
> - Trace all programs in the system.
> 
> If we use global approach, filtering would still be part of the handler.
> So even if we want to probe just one process, we would still take hit
> for all processes that map the DSO and hit that vaddr.
> Other process could be hitting the probepoint more often while the
> probed process could rarely be hitting the probepoint. This could
> place significant overhead on the system.
> 
> Also with KSM, the page we are probing could be part of the stable tree
> and mapped by different virtual machines. Can this lead to interrupting
> work on an unrelated virtual machine? If yes, is it okay to interrupt an
> unrelated VM? If not, what measures need to be taken?
> 
> Currently perf can be used only by privileged users. However, when perf
> gets to trace user space programs, would it still be limited to privileged
> users? Do we have plans to allow users to trace their own
> applications through perf?

I'm not sure; currently all the tracing bits require root. One of the
complications is that dynamic trace events (kprobes and uprobes) share a
global namespace, so making that accessible to users might be
interesting.

So one thing we can do to avoid some of the trap overhead is to
de-couple the trace event creation from trace event enable (pretty much
already so for existing implementations), so while you define a dynamic
trace event as dso:sym, you provide ways to enable it globally and per
task.

We'd basically need a global and per-task refcount on enable and make
sure the breakpoint is installed properly for (global || task).

That way a perf per-cpu event will do the global enable, and a perf
per-task event will do the task enable.
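
As a rough userspace model of that (global || task) rule, something like
the sketch below. All names here (probe_event, task_state,
breakpoint_wanted and friends) are made up for illustration and are not
from the patch set; locking and the actual breakpoint insertion are left
out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: one dynamic trace event, defined as dso:sym. */
struct probe_event {
	unsigned int global_refs;	/* enables from per-cpu (system-wide) users */
};

/* Hypothetical: per-task enable state for that same event. */
struct task_state {
	unsigned int task_refs;		/* enables from per-task users */
};

static void enable_global(struct probe_event *ev)
{
	ev->global_refs++;
}

static void disable_global(struct probe_event *ev)
{
	assert(ev->global_refs > 0);
	ev->global_refs--;
}

static void enable_task(struct task_state *t)
{
	t->task_refs++;
}

static void disable_task(struct task_state *t)
{
	assert(t->task_refs > 0);
	t->task_refs--;
}

/* The breakpoint has to be in place for a task if (global || task). */
static bool breakpoint_wanted(const struct probe_event *ev,
			      const struct task_state *t)
{
	return ev->global_refs > 0 || t->task_refs > 0;
}
```

With only a task enable, breakpoint_wanted() is true for that task and
false for every other task sharing the DSO, which is what avoids the
trap overhead for unrelated processes.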

> > > > This should allow the handler to optimistically access memory from the
> > > > trap handler, but in case it does need to fault pages in we'll call it
> > > > from task context.
> > > 
> > > Okay but what if the handler is coded to sleep.
> > 
> > Don't do that ;-)
> > 
> > What reason would you have to sleep from a int3 anyway? You want to log
> > bits and get on with life, right? The only interesting case is faulting
> > when some memory references you want are not currently available, and
> > that can be done as suggested.
> > 
> 
> Though one of the USPs of uprobes is non-disruptive tracing, applications
> like debuggers that do disruptive tracing can benefit from uprobes. 
> 
> Debuggers could use uprobes as a feature to implement inserting/removing
> breakpoints and get the out of line single-stepping. In an earlier
> discussion http://lkml.org/lkml/2010/1/26/344 Tom Tromey did say that if
> a facility was given, it could be used in gdb.
> 
> What I expect is the tracee to inform the tracer that it has hit the
> breakpoint and "wait" for the tracer to give indication to continue.
> 
> Benefits could be 
> - Debuggers can benefit from execution out of line and can debug
>   multithread processes much better. 
> 
> - Two debuggers/tracers could trace the same process. One of the tracers
>   could be strace, while the other one could be gdb.
> 
> - perf and a debugger could be interested in the same vaddr for that
>   process and still continue to work. 
>   Let's say the debugger and perf are interested in a particular function,
>   for example malloc.
>   If perf uses uprobes and the debugger uses existing methods, then perf
>   measures of malloc may not be accurate as it misses those mallocs of the
>   process that's being debugged. However I agree that it's a very very very
>   minute case.

A double scribble will be an issue for the current generation of
debuggers anyway, right?

But yes, I suppose if you want to use uprobes for debuggers, then it
makes sense to allow putting the task to sleep. One way would be to
provide means for the handler to detect the context and simply always
return -EFAULT from the trap context, so that it gets called again from
task context.
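
A minimal sketch of that flow, as a plain userspace model. The names
(probe_ctx, probe_handler, handler, dispatch) are invented for the
example and do not correspond to any existing interface; the real thing
would defer the second call until the task is back in task context.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical: which context the handler is being invoked from. */
enum probe_ctx { CTX_TRAP, CTX_TASK };

/* Hypothetical handler state for the example. */
struct probe_handler {
	bool needs_sleep;	/* e.g. a debugger waiting for its tracer */
	int runs;		/* how many times the real work was done */
};

/* A handler that may need to sleep bails out of the trap context. */
static int handler(enum probe_ctx ctx, struct probe_handler *h)
{
	if (h->needs_sleep && ctx == CTX_TRAP)
		return -EFAULT;	/* cannot sleep here, ask to be re-run */

	h->runs++;		/* log bits, or block if ctx == CTX_TASK */
	return 0;
}

/* Try optimistically from the trap, fall back to task context. */
static int dispatch(struct probe_handler *h)
{
	int ret = handler(CTX_TRAP, h);

	if (ret == -EFAULT)
		ret = handler(CTX_TASK, h);
	return ret;
}
```

A handler that never sleeps completes on the first call and never pays
for the second one.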

> > > > Everybody else simply places callbacks in kernel/fork.c and
> > > > kernel/exit.c, but as it is I don't think you want per-task state like
> > > > this.
> > > > 
> > > > One thing I would like to see is a slot per task, that has a number of
> > > > advantages over the current patch-set in that it doesn't have one page
> > > > limit in number of probe sites, nor do you need to insert vmas into each
> > > > and every address space that happens to have your DSO mapped.
> > > > 
> > > 
> > > Where are the per-task slots stored?
> > > Or are you looking at an XOL vma area per DSO?
> > 
> > The per task slot (note the singular, each task needs only ever have a
> > single slot since a task can only ever hit one trap at a time) would
> > live in the task TLS or task stack.
> > 
> 
> Do we need a buy-in from glibc folks to do this?
> Also here is what Roland had once said about TLS.
> 
> "Next we come to the problem of where to store copied instructions for
> stepping.  The idea of stealing a stack page for this is a non-starter.
> For both security and robustness, it's never acceptable to introduce a
> user mapping that is both writable and executable, even temporarily.  We
> need to use an otherwise unused page in the address space, that will be
> read/execute only for the user, we can write to it only from kernel
> mode."

Before NX there simply was no option. Anyway, I guess the writable
requirement comes from it being stack. I'm not sure how TLS is done,
but I guess that has similar constraints on being writable, right?

I've heard from people that some other OS does indeed have the
trampoline in TLS.

