linux-kernel - Re: [PATCH] perf record: Add snapshot mode support for perf's regular events

Message-ID: <20151126094057.GA7302@gmail.com>
Date:	Thu, 26 Nov 2015 10:40:57 +0100
From:	Ingo Molnar <mingo@...nel.org>
To:	"Wangnan (F)" <wangnan0@...wei.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Yunlong Song <yunlong.song@...wei.com>, paulus@...ba.org,
	mingo@...hat.com, acme@...nel.org, linux-kernel@...r.kernel.org,
	namhyung@...nel.org, ast@...nel.org,
	masami.hiramatsu.pt@...achi.com, kan.liang@...el.com,
	adrian.hunter@...el.com, jolsa@...nel.org, dsahern@...il.com,
	bp@...en8.de, jean.pihet@...aro.org, rric@...nel.org,
	xiakaixu@...wei.com, hekuang@...wei.com
Subject: Re: [PATCH] perf record: Add snapshot mode support for perf's
 regular events

* Ingo Molnar <mingo@...nel.org> wrote:

> 
> * Ingo Molnar <mingo@...nel.org> wrote:
> 
> > > But yes, we can do that userspace ring buffer when we really need it. At 
> > > very first we can start working on perf side and assume overwrite mode is 
> > > ready.
> > 
> > I don't think Peter asked for much: pick up the patch he has already written 
> > and use it, to have an even lower overhead always-enabled background tracing 
> > mode of perf.
> > 
> > Resizing shouldn't be much of an issue with existing features: if events start 
> > overflowing or some other threshold for dynamic increase of the ring-buffer is 
> > met then the daemon should open a new set of events with a larger ring-buffer, 
> > and close the old events once the new tracing ring-buffer is up and running.
> > 
> > Use event multiplexing to output all interesting events into the same single 
> > (per CPU) ring-buffer.
> 
> Btw., there's another trick we could use to support ftrace-alike workflows even 
> better: we could expose a task's active perf ring-buffers under /proc/<PID>/ and 
> could make it readable.
> 
> So if an overwrite-mode background tracing session is running, you don't even 
> have to signal it to capture the ring-buffer: just open the ring-buffer fd in 
> procfs, under /proc/XYZ/perf/ring-buffers/5.trace or so, and dump its current 
> contents, assuming the task doing that has sufficient permissions - i.e. 
> ptrace_may_access().
> 
> We could even pretty-print some very basic version of the records from the 
> kernel, via /proc/XYZ/perf/ring-buffers/5.txt, to support a tooling-less tracing 
> mode. This way perf based tracing could be supported even on systems that have 
> no writable filesystems.
> 
> I.e. in this regard perf can be made to match ftrace's tracing workflow as well 
> - in addition to the more traditional perf profiling workflow we all love and 
> know!

Also note that if we go in this direction then with some additional changes we 
could also support lightweight tracing with no tooling side at all on the traced 
system: a simple kernel feature with a kernel thread could be added that takes a 
list of events from sysfs or debugfs and opens them system-wide and exposes 
per-cpu overwrite mode ring-buffers.

Those ring-buffers can then be accessed via procfs (and/or also be exposed in 
parallel via debugfs). The kernel thread never actually does anything except set 
up the events - i.e. this is a very lightweight mode of always-on tracing.
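
As a rough illustration of the semantics only (not kernel code and not an existing perf interface), here is a small user-space model of per-CPU overwrite-mode buffers that are set up once from an event list and then merely read. The class name, its methods and the capacity parameter are hypothetical; the two tracepoint names are only example inputs.

```python
from collections import deque


class PerCpuOverwriteBuffers:
    """Toy model: one fixed-size overwrite-mode buffer per CPU."""

    def __init__(self, events, nr_cpus, capacity):
        # Setup happens once, mirroring "the thread only sets up the events".
        self.events = frozenset(events)
        self.buffers = [deque(maxlen=capacity) for _ in range(nr_cpus)]

    def record(self, cpu, event, payload):
        # Events that were not requested at setup time are ignored.
        if event not in self.events:
            return False
        # deque(maxlen=...) drops the oldest entry when full: overwrite mode.
        self.buffers[cpu].append((event, payload))
        return True

    def snapshot(self, cpu):
        # A reader copies the current contents; the writer is not stopped.
        return list(self.buffers[cpu])


bufs = PerCpuOverwriteBuffers(["sched:sched_switch"], nr_cpus=2, capacity=3)
for i in range(5):
    bufs.record(0, "sched:sched_switch", i)
bufs.record(1, "irq:irq_handler_entry", 0)  # not in the event list, dropped

print(bufs.snapshot(0))  # the three most recent records on CPU 0
print(bufs.snapshot(1))  # nothing was recorded on CPU 1
```

The point of the model is that the reader only ever sees the most recent window of records, and that nothing on the writer side has to run once the buffers exist.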

Additional debugfs toggles can be added to temporarily turn tracing on/off without 
closing the events - just like ftrace.

Other toggles could be added, such as: 'stop tracing when the kernel has crashed, 
or if a specific event has occurred or a condition has been met'.
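
To make the two kinds of toggle concrete, a minimal sketch in the same toy-model spirit: an on/off switch that leaves the buffer in place, plus an optional stop condition that freezes the buffer once a matching record arrives. The names ToggledTrace, set_enabled and stop_when are made up for illustration and do not correspond to any existing debugfs file or perf API.

```python
from collections import deque


class ToggledTrace:
    """Toy model of an overwrite buffer with on/off and stop-on-condition."""

    def __init__(self, capacity, stop_when=None):
        self.buf = deque(maxlen=capacity)
        self.enabled = True
        self.stop_when = stop_when  # optional predicate on a record

    def set_enabled(self, on):
        # Turning tracing off keeps the buffer and its contents intact.
        self.enabled = bool(on)

    def record(self, rec):
        if not self.enabled:
            return False
        self.buf.append(rec)
        # Freeze after the triggering record so it stays the newest entry.
        if self.stop_when is not None and self.stop_when(rec):
            self.enabled = False
        return True


trace = ToggledTrace(capacity=4, stop_when=lambda r: r == "oops")
for rec in ["a", "b", "c", "oops", "d", "e"]:
    trace.record(rec)

print(list(trace.buf))  # records up to and including the trigger
```

Records arriving after the trigger are dropped, so the events leading up to the condition are not overwritten before somebody gets around to reading them.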

That way we could, among other things, capture traces on embedded systems and copy 
the traces to another, larger system (or NFS-mount the target system), and run 
perf tooling to analyze the traces on that more powerful system.
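
A sketch of that capture-here, analyze-there split, assuming a deliberately simple fixed-size record (timestamp, CPU, event id). This layout is hypothetical and is not the perf.data format or the kernel's ring-buffer record format; it only shows why an explicit byte order helps when the target and the analysis host differ.

```python
import struct

# Hypothetical record layout: u64 timestamp, u32 cpu, u32 event id.
# "<" fixes little-endian byte order regardless of the machine running this.
RECORD = struct.Struct("<QII")


def dump(records):
    """Serialize records on the traced (small) system."""
    return b"".join(RECORD.pack(ts, cpu, ev) for ts, cpu, ev in records)


def load(blob):
    """Parse the copied bytes on the analysis system."""
    return [RECORD.unpack_from(blob, off)
            for off in range(0, len(blob), RECORD.size)]


captured = [(1000, 0, 7), (1005, 1, 7), (1010, 0, 9)]
blob = dump(captured)   # would be copied or read over NFS in practice
parsed = load(blob)

# Trivial "analysis": count records per event id.
per_event = {}
for ts, cpu, ev in parsed:
    per_event[ev] = per_event.get(ev, 0) + 1

print(per_event)
```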

But it all starts with making overwrite mode work well, and working with the 
kernel-visible ring-buffer. That can then be exposed to user-space in very 
expressive ways to turn perf into a flexible system tracing subsystem as well.

Thanks,

	Ingo