linux-kernel - Re: [PATCH -next v4 1/2] perf stat: Support inherit events during fork() for bperf

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <093e14a9-4008-4490-9946-5080449935c4@huaweicloud.com>
Date: Sat, 19 Oct 2024 16:27:38 +0800
From: Tengda Wu <wutengda@...weicloud.com>
To: Namhyung Kim <namhyung@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, song@...nel.org,
 Ingo Molnar <mingo@...hat.com>, Arnaldo Carvalho de Melo <acme@...nel.org>,
 Mark Rutland <mark.rutland@....com>,
 Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
 Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
 Adrian Hunter <adrian.hunter@...el.com>, kan.liang@...ux.intel.com,
 linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
 bpf@...r.kernel.org
Subject: Re: [PATCH -next v4 1/2] perf stat: Support inherit events during
 fork() for bperf



On 2024/10/19 1:12, Namhyung Kim wrote:
> On Thu, Oct 17, 2024 at 10:53:46AM +0800, Tengda Wu wrote:
>> Hi,
>>
>> On 2024/10/17 5:16, Namhyung Kim wrote:
>>> Hello,
>>>
>>> On Sat, Oct 12, 2024 at 02:32:24AM +0000, Tengda Wu wrote:
>>>> bperf has a nice ability to share PMUs, but it still does not support
>>>> inherit events during fork(), resulting in some deviations in its stat
>>>> results compared with perf.
>>>>
>>>> perf stat result:
>>>> $ ./perf stat -e cycles,instructions -- ./perf test -w sqrtloop
>>>>
>>>>    Performance counter stats for './perf test -w sqrtloop':
>>>>
>>>>        2,316,038,116      cycles
>>>>        2,859,350,725      instructions
>>>>
>>>>          1.009603637 seconds time elapsed
>>>>
>>>>          1.004196000 seconds user
>>>>          0.003950000 seconds sys
>>>>
>>>> bperf stat result:
>>>> $ ./perf stat --bpf-counters -e cycles,instructions -- \
>>>>       ./perf test -w sqrtloop
>>>>
>>>>    Performance counter stats for './perf test -w sqrtloop':
>>>>
>>>>           18,762,093      cycles
>>>>           23,487,766      instructions
>>>>
>>>>          1.008913769 seconds time elapsed
>>>>
>>>>          1.003248000 seconds user
>>>>          0.004069000 seconds sys
>>>>
>>>> In order to support event inheritance, two new bpf programs are added
>>>> to monitor the fork and exit of tasks respectively. When a task is
>>>> created, add it to the filter map to enable counting, and reuse the
>>>> `accum_key` of its parent task to count together with the parent task.
>>>> When a task exits, remove it from the filter map to disable counting.
>>>>
>>>> After support:
>>>> $ ./perf stat --bpf-counters -e cycles,instructions -- \
>>>>       ./perf test -w sqrtloop
>>>>
>>>>  Performance counter stats for './perf test -w sqrtloop':
>>>>
>>>>      2,316,252,189      cycles
>>>>      2,859,946,547      instructions
>>>>
>>>>        1.009422314 seconds time elapsed
>>>>
>>>>        1.003597000 seconds user
>>>>        0.004270000 seconds sys
>>>>
>>>> Signed-off-by: Tengda Wu <wutengda@...weicloud.com>
>>>> ---
>>>>  tools/perf/builtin-stat.c                     |  4 +-
>>>>  tools/perf/util/bpf_counter.c                 | 57 +++++++++---
>>>>  tools/perf/util/bpf_counter.h                 | 13 ++-
>>>>  tools/perf/util/bpf_counter_cgroup.c          |  3 +-
>>>>  tools/perf/util/bpf_skel/bperf_follower.bpf.c | 87 +++++++++++++++++--
>>>>  tools/perf/util/bpf_skel/bperf_u.h            |  5 ++
>>>>  6 files changed, 145 insertions(+), 24 deletions(-)
>>>>
>>>> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
>>>> index 3e6b9f216e80..c27b107c1985 100644
>>>> --- a/tools/perf/builtin-stat.c
>>>> +++ b/tools/perf/builtin-stat.c
>>>> @@ -698,6 +698,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>>>>  	char msg[BUFSIZ];
>>>>  	unsigned long long t0, t1;
>>>>  	struct evsel *counter;
>>>> +	struct bpf_stat_opts opts;
>>>>  	size_t l;
>>>>  	int status = 0;
>>>>  	const bool forks = (argc > 0);
>>>> @@ -725,7 +726,8 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>>>>  
>>>>  	evlist__for_each_entry(evsel_list, counter) {
>>>>  		counter->reset_group = false;
>>>> -		if (bpf_counter__load(counter, &target)) {
>>>> +		opts.inherit = !stat_config.no_inherit;
>>>> +		if (bpf_counter__load(counter, &target, &opts)) {
>>>
>>> Maybe you can just add a boolean member in the struct target.
>>
>> Yes，this approach would be more straightforward. 
>>
>> I had considered it before, but, as you see, considering that `inherit` does not
>> align well with the `target` semantics, I chose the another one.
> 
> Well, I think 'inherit' is well aligned with the target semantics.
> We want some processes as the targets of the event and we want to
> profile their children or not.
> 

Ok.

>>
>> Anyway, I'll try it. Code changes would be more clean. Thanks.
>>
>>>
>>>
>>>>  			err = -1;
>>>>  			goto err_out;
>>>>  		}
>>>> diff --git a/tools/perf/util/bpf_counter.c b/tools/perf/util/bpf_counter.c
>>>> index 7a8af60e0f51..00afea6bde63 100644
>>>> --- a/tools/perf/util/bpf_counter.c
>>>> +++ b/tools/perf/util/bpf_counter.c
>>>> @@ -166,7 +166,9 @@ static int bpf_program_profiler_load_one(struct evsel *evsel, u32 prog_id)
>>>>  	return -1;
>>>>  }
>>>>  
>>>> -static int bpf_program_profiler__load(struct evsel *evsel, struct target *target)
>>>> +static int bpf_program_profiler__load(struct evsel *evsel,
>>>> +				      struct target *target,
>>>> +				      struct bpf_stat_opts *opts __maybe_unused)
>>>>  {
>>>>  	char *bpf_str, *bpf_str_, *tok, *saveptr = NULL, *p;
>>>>  	u32 prog_id;
>>>> @@ -364,6 +366,7 @@ static int bperf_lock_attr_map(struct target *target)
>>>>  
>>>>  static int bperf_check_target(struct evsel *evsel,
>>>>  			      struct target *target,
>>>> +			      struct bpf_stat_opts *opts,
>>>>  			      enum bperf_filter_type *filter_type,
>>>>  			      __u32 *filter_entry_cnt)
>>>>  {
>>>> @@ -383,7 +386,12 @@ static int bperf_check_target(struct evsel *evsel,
>>>>  		*filter_type = BPERF_FILTER_PID;
>>>>  		*filter_entry_cnt = perf_thread_map__nr(evsel->core.threads);
>>>>  	} else if (target->pid || evsel->evlist->workload.pid != -1) {
>>>> -		*filter_type = BPERF_FILTER_TGID;
>>>> +		/*
>>>> +		 * unlike the PID type, the TGID type implicitly enables
>>>> +		 * event inheritance within a single process.
>>>> +		 */
>>>> +		*filter_type = opts->inherit ?
>>>> +				BPERF_FILTER_TGID : BPERF_FILTER_PID;
>>>
>>> I'm not sure if it's right.  You should be able to use PID type with
>>> inheritance.  In this case child processes or threads from the selected
>>> thread would be counted only.
>>
>> Sorry, don't quite understand. TGID type counts together with all sub-threads within
>> the same process, which is what inheritance needs to do; while PID type only counts
>> for a single thread and should be used when inheritance is turned off. This is equivalent
>> to the code above.
> 
> Let me be clear:
> 
>  * PID w/o inherit : specified threads only
>  * PID w/ inherit  : specified threads + all threads or child process from the threads
>  * TGID w/o inherit: specified process (all threads in the process) only
>  * TGID w/ inherit : specified process + all children from the processes
> 
> For the TGID w/o inherit case, it's ok not to track new threads in the
> process because they will have the same tgid anyway.
> 
> So you cannot change the filter type using inherit value.  It should be
> used to control whether it tracks new task only.
> 

So changing 'TGID w/o inherit' to 'PID w/o inherit' will lose counts of all
threads in the process, right?

It's clear now. Thanks for the explanation.

Tengda