Message-ID: <20111219114023.GB29855@elte.hu>
Date: Mon, 19 Dec 2011 12:40:23 +0100
From: Ingo Molnar <mingo@...e.hu>
To: Avi Kivity <avi@...hat.com>
Cc: Robert Richter <robert.richter@....com>,
Benjamin Block <bebl@...eta.org>,
Hans Rosenfeld <hans.rosenfeld@....com>, hpa@...or.com,
tglx@...utronix.de, suresh.b.siddha@...el.com, eranian@...gle.com,
brgerst@...il.com, Andreas.Herrmann3@....com, x86@...nel.org,
linux-kernel@...r.kernel.org,
Benjamin Block <benjamin.block@....com>
Subject: Re: [RFC 4/5] x86, perf: implements lwp-perf-integration (rc1)

* Avi Kivity <avi@...hat.com> wrote:
> On 12/19/2011 12:54 PM, Ingo Molnar wrote:
> > * Robert Richter <robert.richter@....com> wrote:
> >
> > > On 19.12.11 00:43:10, Ingo Molnar wrote:
> > >
> > > > So the question becomes, how well is it integrated: can perf
> > > > 'record -a + perf report', or 'perf top' use LWP, to do
> > > > system-wide precise [user-space] profiling and such?
> > >
> > > There is only self-monitoring of a process possible, no
> > > kernel and system-wide profiling. This is because we can
> > > not allocate memory regions in the kernel for a thread
> > > other than the current. This would require a complete
> > > rework of mm code.
> >
> > Hm, i don't think a rework is needed: check the
> > vmalloc_to_page() code in kernel/events/ring_buffer.c. Right
> > now CONFIG_PERF_USE_VMALLOC is an ARM, MIPS, SH and Sparc
> > specific feature, on x86 it turns on if
> > CONFIG_DEBUG_PERF_USE_VMALLOC=y.
> >
> > That should be good enough for prototyping the kernel/user
> > shared buffering approach.
>
> LWP wants user memory, vmalloc is insufficient. You need
> do_mmap() with a different mm.

Take a look at PERF_USE_VMALLOC: it allows in-kernel allocated
memory to be mmap()ed to user-space. It is basically a
shared/dual user/kernel mode vmalloc implementation.

So all the conceptual pieces are there.
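
Roughly: under CONFIG_PERF_USE_VMALLOC the buffer is vmalloc()ed
and the fault handler of the mmap()ed perf fd resolves the backing
pages via vmalloc_to_page(). An untested sketch of the same
pattern - the lwp_buf_* names are made up:

#include <linux/mm.h>
#include <linux/vmalloc.h>

static void *lwp_buf;			/* vmalloc_user()ed buffer */
static unsigned long lwp_buf_pages;

static int lwp_buf_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
	struct page *page;

	if (vmf->pgoff >= lwp_buf_pages)
		return VM_FAULT_SIGBUS;

	/* translate the vmalloc address into its backing struct page: */
	page = vmalloc_to_page(lwp_buf + (vmf->pgoff << PAGE_SHIFT));
	if (!page)
		return VM_FAULT_SIGBUS;

	get_page(page);
	vmf->page = page;

	return 0;
}

static const struct vm_operations_struct lwp_buf_vm_ops = {
	.fault = lwp_buf_fault,
};

The ->mmap() handler would then only have to set
vma->vm_ops = &lwp_buf_vm_ops and let faults populate the user
mapping.
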
> You could let a workqueue call use_mm() and then do_mmap().
> Even then it is subject to disruption by the monitored thread
> (and may disrupt the monitored thread by playing with its
> address space). [...]

Injecting this into another thread's context is indeed advanced
stuff:
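a (completely untested) sketch of that direction - assuming
~v3.2 use_mm()/do_mmap() semantics, with made-up lwp_* names -
would be a workqueue callback along these lines:

#include <linux/workqueue.h>
#include <linux/mm.h>
#include <linux/mman.h>
#include <linux/mmu_context.h>

struct lwp_mmap_work {
	struct work_struct	work;
	struct mm_struct	*mm;	/* target mm, ref held by requester */
	unsigned long		size;
	unsigned long		addr;	/* result: user address or -errno */
};

static void lwp_mmap_worker(struct work_struct *work)
{
	struct lwp_mmap_work *w =
		container_of(work, struct lwp_mmap_work, work);

	use_mm(w->mm);			/* adopt the monitored task's mm */

	down_write(&w->mm->mmap_sem);
	w->addr = do_mmap(NULL, 0, w->size, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS, 0);
	up_write(&w->mm->mmap_sem);

	unuse_mm(w->mm);
}

(Plus a completion so the requesting context can pick up w->addr -
and yes, nothing stops the monitored task from munmap()ing it
again behind our back.)
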
> [...] This is for thread monitoring only, I don't think
> system-wide monitoring is possible with LWP.

That should be possible too, via two methods:

1) the easy hack: a (per cpu) vmalloc()ed buffer is made ring 3
   accessible (by setting the user bit in the PTEs) - and thus
   accessible to all user-space. (See the sketch at the end of
   this mail.)

   This is obviously globally writable/readable memory so only a
   debugging/prototyping hack - but it would be a great first
   step to prove the concept and see some nice perf top and perf
   record results ...

2) the proper solution: creating a 'user-space vmalloc()' that is
   per mm, gets inherited transparently across fork() and exec(),
   and lies outside the regular vma spaces. On 64-bit this should
   be straightforward. (A rough sketch of one building block is
   below.)

   These vmas are not normally 'known' to user-space - the kernel
   PMU code knows about them and does what we do with PEBS:
   flushes them when necessary and puts the data into the regular
   perf event channels.

   This solves the inherited perf record workflow immediately:
   the parent task just creates the buffer, which gets inherited
   across exec() and fork(), into every portion of the workload.

   System-wide profiling is a small additional variant of this:
   creating such a user-vmalloc() area for all tasks in the
   system so that the PMU code has them ready in the
   context-switch code.

Solution #2 has the additional advantage that we could migrate
PEBS to it and allow interested user-space access to the 'raw'
PEBS buffer as well. (Currently the PEBS buffer is only visible
to kernel-space.)
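
For #2, one (untested) building block could be
install_special_mapping(), the way the vDSO gets mapped: it gives
us a kernel-managed vma that is copied into children across
fork(). Re-instating it across exec() and keeping it outside the
regular vma ranges would need additional hooks - the lwp_* names
below are made up:

#include <linux/mm.h>
#include <linux/mman.h>
#include <linux/sched.h>
#include <linux/err.h>

#define LWP_BUF_PAGES	16

/* map @pages into current->mm as a kernel-managed special mapping: */
static unsigned long lwp_install_buffer(struct page **pages)
{
	struct mm_struct *mm = current->mm;
	unsigned long addr, len = LWP_BUF_PAGES * PAGE_SIZE;
	int ret;

	down_write(&mm->mmap_sem);

	addr = get_unmapped_area(NULL, 0, len, 0, 0);
	if (IS_ERR_VALUE(addr))
		goto out;

	/*
	 * Like the vDSO, the resulting vma survives fork() into
	 * every child, so an inherited 'perf record' workflow
	 * sees the buffer.
	 */
	ret = install_special_mapping(mm, addr, len,
				      VM_READ | VM_WRITE |
				      VM_MAYREAD | VM_MAYWRITE,
				      pages);
	if (ret)
		addr = ret;
out:
	up_write(&mm->mmap_sem);
	return addr;		/* user address, or -errno */
}
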
I'd suggest the easy hack first, to get things going - we can
then help out with the proper solution.
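
For the hack itself, the ugly core could be as small as this
(x86-only, totally untested, lwp_* name made up) - with the
caveat that the upper levels of the kernel page-table hierarchy
have to permit user access as well, which is part of what makes
it a debug-only trick:

#include <linux/mm.h>
#include <linux/vmalloc.h>
#include <asm/pgtable.h>
#include <asm/tlbflush.h>

/*
 * Debug-only: make a vmalloc()ed buffer accessible to ring 3
 * through its kernel virtual address, by setting the U/S bit on
 * the leaf PTEs.
 */
static void lwp_buf_expose_to_user(void *buf, unsigned long size)
{
	unsigned long addr = (unsigned long)buf;
	unsigned long end  = addr + size;
	unsigned int level;
	pte_t *ptep;

	for (; addr < end; addr += PAGE_SIZE) {
		ptep = lookup_address(addr, &level);
		if (!ptep || level != PG_LEVEL_4K)
			continue;
		set_pte(ptep, pte_set_flags(*ptep, _PAGE_USER));
	}
	__flush_tlb_all();
}
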
Thanks,
Ingo