lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 24 Jul 2015 01:24:53 +0900
From:	Masami Hiramatsu <masami.hiramatsu.pt@...achi.com>
To:	Arnaldo Carvalho de Melo <acme@...nel.org>
CC:	Hemant Kumar <hemant@...ux.vnet.ibm.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	linux-kernel@...r.kernel.org,
	Adrian Hunter <adrian.hunter@...el.com>,
	Ingo Molnar <mingo@...hat.com>, Jiri Olsa <jolsa@...nel.org>,
	Namhyung Kim <namhyung@...nel.org>,
	Borislav Petkov <bp@...e.de>
Subject: Re: Re: Re: [RFC PATCH perf/core v2 00/16] perf-probe --cache
 and SDT support

On 2015/07/23 23:01, Arnaldo Carvalho de Melo wrote:
> Em Thu, Jul 23, 2015 at 10:13:22PM +0900, Masami Hiramatsu escreveu:
>> On 2015/07/22 23:12, Hemant Kumar wrote:
>>> On 07/17/2015 08:51 AM, Masami Hiramatsu wrote:
>>>> On 2015/07/16 12:13, Hemant Kumar wrote:
>>> The idea behind '%' was to identify the SDT events and take a different path
>>> to lookup through the cache, put a probe, record and then delete the probe.
>>> Or, do you want "perf record" to record any event this way (not just an sdt
>>> event).
>  
>> I see, but I think that is not good by following reasons,
>  
>> - when we record event with "-e %provider:event", it will be shown as
>>   "provider:event"
>> - if perf-list shows the SDT(cached) events as "%provider:event", that
>>   will not match the recorded result.
>> - it is somewhat fragile that we temporary add the SDT event and remove it
>>   after record, because the event will not hide from ftrace users (this
>>   means that we'll fail removing the event by -EBUSY if someone use it
>>   via ftrace)
> 
> We should avoid that, if we say record event "foo:bar", then we should
> consistently show "foo:bar" everywhere this is referenced.
> 
>> - if we set SDT events perf-probe, it will be shown as "provider:event" name
>>   because "%" will be rejected by ftrace. In that case, what the perf-list show
>>   those events, both of %provider:event and provider:event ?
>  
>> thus I pushed the "%" as a "special remembering mark" only for looking
>> up the event from cache by perf-probe.
>  
>> So I'd like to suggest that the following behavior
>  
>> 1) perf-list shows the cached-with-name and SDT events as Tracepoint events
>>   even if it is not yet probed.
> 
> I can agree with 'perf list' showing what can be used as events, and
> SDTs, AFAIK, match that definition, i.e. we have somewhere (in the DSOs,
> right?) information about where to ask for an event to be enabled.

SDTs are described in .note section, and is extracted by perf-buildid-cache
and stored into perf-probe's cache file (under buildid-cache). Perf-list
shows only the extracted one.

> If there are details on how that needs to be obtained, then passed to
> the kernel somehow to then become really, really accessible, then these
> details are completely internal.

agreed.

> 
>> # perf list
>>
>> List of pre-defined events (to be used in -e):
>> ...
>>   libc:memory_heap_new                             [Tracepoint event]
> 
> Is it like this or is it like [ku]probes where we already have a
> namespace qualifier, i.e.:
> 
> [root@zoo ~]# perf probe icmp_rcv
> Added new event:
>   probe:icmp_rcv       (on icmp_rcv)
> 
> You can now use it in all perf tools, such as:
> 
> 	perf record -e probe:icmp_rcv -aR sleep 1
> 
> [root@zoo ~]#
> 
> [root@zoo ~]# perf probe /lib64/libc-2.20.so malloc
> Added new events:
>   probe_libc:malloc    (on malloc in /lib64/libc-2.20.so)
>   probe_libc:malloc_1  (on malloc in /lib64/libc-2.20.so)
>   probe_libc:malloc_2  (on malloc in /lib64/libc-2.20.so)
>   probe_libc:malloc_3  (on malloc in /lib64/libc-2.20.so)
>   probe_libc:malloc_4  (on malloc in /lib64/libc-2.20.so)
>   probe_libc:malloc_5  (on malloc in /lib64/libc-2.20.so)
>   probe_libc:malloc_6  (on malloc in /lib64/libc-2.20.so)
>   probe_libc:malloc_7  (on malloc in /lib64/libc-2.20.so)
>   probe_libc:malloc_8  (on malloc in /lib64/libc-2.20.so)
>   probe_libc:malloc_9  (on malloc in /lib64/libc-2.20.so)
>   probe_libc:malloc_10 (on malloc in /lib64/libc-2.20.so)
>   probe_libc:malloc_11 (on malloc in /lib64/libc-2.20.so)
>   probe_libc:malloc_12 (on malloc in /lib64/libc-2.20.so)
>   probe_libc:malloc_13 (on malloc in /lib64/libc-2.20.so)
>   probe_libc:malloc_14 (on malloc in /lib64/libc-2.20.so)
> 
> You can now use it in all perf tools, such as:
> 
> 	perf record -e probe_libc:malloc_14 -aR sleep 1
> 
> [root@zoo ~]#
> 
> "probe" for kernel events, "probe_%s" % DSO basename for userspace
> events.
> 
> Why not continue with that and have SDTs use the probe_%s: namespace?
> Sorry if this was already discussed here...

:) We are discussing about that in another thread, anyway, probe_%s can
solve a little part of the clash of names.

> 
> If there is some ambiguity, that can be resolved by explicitely setting
> a new name, 'perf probe' has provision for that, right? I.e.:

Yes, but that means we'll have to give new names before using that.

Actually, SDT has "provider-name", "event-name" and "probe location" (also
have arguments, but not supported). And provider name is not always same
as the binary name. (actually, the application developers can use any
name for it...)
So adding something special prefix or detect clash before using will
be the option.

The following patterns we've discussed.

 - <provider>:<name>
	simple, but could easily clash with others.
 - probe_<provider>:<name>
 - sdt_<provider>:<name>
	also simple and similar to current solution. but fragile against
	clash among SDTs.
 - probe_<binary>:<provider>_<name>
	also simple, but if provider or/and name has '_', it is hard to
	split the provider and name. and fragile against clash among SDTs too.
 - <provider>_<buildid>/<name>
	possible, but ugly since buildid is a random long xdigits(maybe cut up
	to 8 or 12 bytes).

> [root@zoo ~]# perf probe /lib64/libc-2.20.so another_name=malloc
> Added new events:
> <SNIP>
>   probe_libc:another_name_14 (on malloc in /lib64/libc-2.20.so)
> 
> You can now use it in all perf tools, such as:
> 
> 	perf record -e probe_libc:another_name_14 -aR sleep 1
> 
> [root@zoo ~]#
> 
>> ...
>>   probes:myevent                                   [Tracepoint event]
>> ...
>  
>> 2) perf-record -e with no-probed event should try to set up the given probe
>>  by using perf-probe. It is possible to remove that the probe after recording,
>>  but also ignore if it fails by -EBUSY. (anyway, there is no difference for
>>  users)
> 
> Right, being able to add new probes _without_ calling 'perf probe', but
> instead functions used by 'perf probe' is something the eBPF does (I'm
> almost getting there... :) ) and that I want to do in other tools, like
> 'trace' as well...

Yes, it should be done internally. (of course, users also can use perf-probe
for activating SDT.)


> There are permission problems about how to add new probes and how to
> _enable_ them, i.e. I think it is ok to allow users to ask for
> probe:foo, if they are monitoring just their workloads...

I see... But that's another issue I think, since the issue is already there
for using uprobes. We'd better fork thread for that issue.

[...]
> This needs refining, i.e. I should just warn that albeit "probe:vfs_getname" is available,
> it can't be used, unless we open the doors wide.
> 
> Sorry for the digression, but these permission issues will hit us with SDT as well, no?

Yes, and we have another way to avoid this issue, by using setuid as systemtap does.

>> This rule will solve the contradiction between the event name on recorded
>> data and listed events. However, as we discussed there are other clashes.
> 
>> A) clash among binaries: Since the binary builders can freely use the
>> provider name, it is possible to clash to other binaries' SDTs.
> 
> Yes, one way to disambiguate is to use the buildid, i.e. content based, when/if
> the need arises. We should _always_ store the build-id, together with any probe
> used, so that we can bail out when trying to run those with a non matching DSO
> (kernel, module, library, whatever).

Store the build-id with what? SDTs and cached probes are already stored under
build-id directory(same location of buildid-cache). Would you mean perf.data?

And managing all probe events strictly by perftools is hard since someone can
set new probes via tracefs(debugfs) directly.

> 
> When specifying some SDT that is ambiguous, we should bail out and ask for further
> namespacing, like:
> 
> [acme@zoo linux]$ git show --oneline 25b6
> error: short SHA1 25b6 is ambiguous.
> error: short SHA1 25b6 is ambiguous.
> fatal: ambiguous argument '25b6': unknown revision or path not in the working tree.
> Use '--' to separate paths from revisions, like this:
> 'git <command> [<revision>...] -- [<file>...]'
> [acme@zoo linux]$ git show --oneline 25b67
> error: short SHA1 25b67 is ambiguous.
> error: short SHA1 25b67 is ambiguous.
> fatal: ambiguous argument '25b67': unknown revision or path not in the working tree.
> Use '--' to separate paths from revisions, like this:
> 'git <command> [<revision>...] -- [<file>...]'
> 
> It can be via build-id or by asking for the full pathname to the DSO, etc.
>  
>> B) clash among different versions: Of course the different versions of binaries
>> can be co-exist on the system. Those usually have the same SDTs and same
>> basename, just different build-ids.
> 
> Right, in that case the build-id or the full pathname needs to be specified somehow
> 
>> These issues are not solved by using "%" because it happens among SDTs.
>> So we need to find another way to distinguish the SDTs.
> 
>>>From what I've read so far (not much): agreed.

Thank you,


-- 
Masami HIRAMATSU
Linux Technology Research Center, System Productivity Research Dept.
Center for Technology Innovation - Systems Engineering
Hitachi, Ltd., Research & Development Group
E-mail: masami.hiramatsu.pt@...achi.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ