Message-ID: <20111219114023.GB29855@elte.hu>
Date:	Mon, 19 Dec 2011 12:40:23 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Avi Kivity <avi@...hat.com>
Cc:	Robert Richter <robert.richter@....com>,
	Benjamin Block <bebl@...eta.org>,
	Hans Rosenfeld <hans.rosenfeld@....com>, hpa@...or.com,
	tglx@...utronix.de, suresh.b.siddha@...el.com, eranian@...gle.com,
	brgerst@...il.com, Andreas.Herrmann3@....com, x86@...nel.org,
	linux-kernel@...r.kernel.org,
	Benjamin Block <benjamin.block@....com>
Subject: Re: [RFC 4/5] x86, perf: implements lwp-perf-integration (rc1)


* Avi Kivity <avi@...hat.com> wrote:

> On 12/19/2011 12:54 PM, Ingo Molnar wrote:
> > * Robert Richter <robert.richter@....com> wrote:
> >
> > > On 19.12.11 00:43:10, Ingo Molnar wrote:
> > >
> > > > > So the question becomes, how well is it integrated: can 
> > > > > 'perf record -a' + 'perf report', or 'perf top' use LWP, to do 
> > > > > system-wide precise [user-space] profiling and such?
> > > 
> > > > Only self-monitoring of a process is possible, no 
> > > > kernel and system-wide profiling. This is because we can 
> > > > not allocate memory regions in the kernel for a thread 
> > > > other than the current one. This would require a complete 
> > > > rework of mm code.
> >
> > Hm, i don't think a rework is needed: check the 
> > vmalloc_to_page() code in kernel/events/ring_buffer.c. Right 
> > now CONFIG_PERF_USE_VMALLOC is an ARM, MIPS, SH and Sparc 
> > specific feature, on x86 it turns on if 
> > CONFIG_DEBUG_PERF_USE_VMALLOC=y.
> >
> > That should be good enough for prototyping the kernel/user 
> > shared buffering approach.
> 
> LWP wants user memory, vmalloc is insufficient.  You need 
> do_mmap() with a different mm.

Take a look at PERF_USE_VMALLOC, it allows in-kernel allocated 
memory to be mmap()ed to user-space. It is basically a 
shared/dual user/kernel mode vmalloc implementation.

So all the conceptual pieces are there.
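
As a minimal user-space sketch of that existing mechanism, the function 
below opens a self-monitoring software event and mmap()s its ring 
buffer, which the kernel allocates (vmalloc-backed when 
CONFIG_PERF_USE_VMALLOC is set, page-backed otherwise). The helper name 
map_perf_buffer(), the DATA_PAGES value and the choice of event are 
illustrative assumptions, not taken from the LWP patches:

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#define DATA_PAGES 8	/* illustrative; the data area must be 2^n pages */

/*
 * Open a self-monitoring software event and map its ring buffer.
 * Returns 0 on success, -1 if perf events are unavailable or not
 * permitted (e.g. restrictive perf_event_paranoid or a seccomp filter).
 */
static int map_perf_buffer(void)
{
	struct perf_event_attr attr;
	struct perf_event_mmap_page *meta;
	long page_size = sysconf(_SC_PAGESIZE);
	size_t len;
	void *base;
	int fd;

	if (page_size <= 0)
		return -1;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_SOFTWARE;
	attr.config = PERF_COUNT_SW_TASK_CLOCK;
	attr.sample_period = 1000000;
	attr.sample_type = PERF_SAMPLE_IP;
	attr.exclude_kernel = 1;
	attr.exclude_hv = 1;
	attr.disabled = 1;

	/* pid 0, cpu -1: monitor the calling thread on any CPU */
	fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
	if (fd < 0)
		return -1;

	/* One metadata page followed by the data pages. */
	len = (size_t)(1 + DATA_PAGES) * (size_t)page_size;
	base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (base == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/* The first page is the kernel-maintained control structure. */
	meta = base;
	printf("ring buffer mapped, data_head=%llu\n",
	       (unsigned long long)meta->data_head);

	munmap(base, len);
	close(fd);
	return 0;
}
```

Calling map_perf_buffer() from main() is enough to try it. Note that 
this maps the buffer into the caller's own mm only, which is the 
limitation under discussion here.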

> You could let a workqueue call use_mm() and then do_mmap().  
> Even then it is subject to disruption by the monitored thread 
> (and may disrupt the monitored thread by playing with its 
> address space). [...]

Injecting this into another thread's context is indeed advanced 
stuff:

> [...] This is for thread monitoring only, I don't think 
> system-wide monitoring is possible with LWP.

That should be possible too, via two methods:

1) the easy hack: a (per cpu) vmalloc()ed buffer is made ring 3 
   accessible (by clearing the system bit in the ptes) - and 
   thus accessible to all user-space.

   This is obviously globally writable/readable memory so only a 
   debugging/prototyping hack - but would be a great first step 
   to prove the concept and see some nice perf top and perf 
   record results ...

2) the proper solution: creating a 'user-space vmalloc()' that 
   is per mm and that gets inherited transparently, across 
   fork() and exec(), and which lies outside the regular vma
   spaces. On 64-bit this should be straightforward.

   These vmas are not normally 'known' to user-space -
   the kernel PMU code knows about them and does what we do with
   PEBS: it flushes them when necessary and puts the data into the
   regular perf event channels.

   This solves the inherited perf record workflow immediately:
   the parent task just creates the buffer, which gets inherited 
   across exec() and fork(), into every portion of the workload.

   System-wide profiling is a small additional variant of this: 
   creating such a user-vmalloc() area for all tasks in the
   system so that the PMU code has them ready in the 
   context-switch code.

Solution #2 has the additional advantage that we could migrate 
PEBS to it and could allow interested user-space access to the 
'raw' PEBS buffer as well. (currently the PEBS buffer is only 
visible to kernel-space.)
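
The inheritance part of solution #2 can be approximated from 
user-space today, as a rough analogy only: a MAP_SHARED anonymous 
mapping created by the parent is visible in the child after fork(), so 
samples written by the child land in the parent's buffer. The mapping 
does not survive exec(), and that is the part a per-mm 'user-space 
vmalloc()' would have to add. struct sample_buf and both helper names 
are hypothetical:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define BUF_SLOTS 64

/* Hypothetical stand-in for a per-workload sample buffer. */
struct sample_buf {
	volatile uint32_t head;		/* next free slot */
	uint64_t slot[BUF_SLOTS];	/* recorded samples */
};

/* Create a zero-filled buffer that stays shared across fork(). */
static struct sample_buf *create_inherited_buffer(void)
{
	void *p = mmap(NULL, sizeof(struct sample_buf),
		       PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * Fork a child that records n samples (its own pid) into buf.
 * Returns the number of samples visible to the parent, or -1 on error.
 */
static int run_child_workload(struct sample_buf *buf, uint32_t n)
{
	pid_t pid;
	int status;

	assert(buf->head + n <= BUF_SLOTS);

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: the buffer was inherited, no setup needed. */
		for (uint32_t i = 0; i < n; i++) {
			buf->slot[buf->head] = (uint64_t)getpid();
			buf->head++;
		}
		_exit(0);
	}

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;

	return (int)buf->head;
}
```

Here the parent reads the buffer only after waitpid(), so no locking is 
needed; a real PMU-driven buffer would instead be flushed by the kernel 
at context-switch or overflow time, as described above.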

I'd suggest the easy hack first, to get things going - we can 
then help out with the proper solution.

Thanks,

	Ingo
