linux-kernel - Re: [PATCH -next v3 1/2] perf stat: Support inherit events during fork() for bperf

Message-ID: <3b45a6bc-f95a-4f87-b727-34ac7929c18b@huaweicloud.com>
Date: Fri, 11 Oct 2024 11:07:25 +0800
From: Tengda Wu <wutengda@...weicloud.com>
To: Namhyung Kim <namhyung@...nel.org>, Song Liu <song@...nel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
 Arnaldo Carvalho de Melo <acme@...nel.org>,
 Mark Rutland <mark.rutland@....com>,
 Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
 Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
 Adrian Hunter <adrian.hunter@...el.com>, kan.liang@...ux.intel.com,
 linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
 bpf@...r.kernel.org
Subject: Re: [PATCH -next v3 1/2] perf stat: Support inherit events during
 fork() for bperf



On 2024/10/10 12:53, Tengda Wu wrote:
> 
> 
> On 2024/10/10 8:31, Namhyung Kim wrote:
>> On Wed, Oct 09, 2024 at 10:18:44AM -0700, Song Liu wrote:
>>> On Sun, Sep 15, 2024 at 6:53 PM Tengda Wu <wutengda@...weicloud.com> wrote:
>>>>
>>>> bperf has a nice ability to share PMUs, but it still does not support
>>>> inherit events during fork(), resulting in some deviations in its stat
>>>> results compared with perf.
>>>>
>>>> perf stat result:
>>>> $ ./perf stat -e cycles,instructions -- ./perf test -w sqrtloop
>>>>
>>>>    Performance counter stats for './perf test -w sqrtloop':
>>>>
>>>>        2,316,038,116      cycles
>>>>        2,859,350,725      instructions
>>>>
>>>>          1.009603637 seconds time elapsed
>>>>
>>>>          1.004196000 seconds user
>>>>          0.003950000 seconds sys
>>>>
>>>> bperf stat result:
>>>> $ ./perf stat --bpf-counters -e cycles,instructions -- \
>>>>       ./perf test -w sqrtloop
>>>>
>>>>    Performance counter stats for './perf test -w sqrtloop':
>>>>
>>>>           18,762,093      cycles
>>>>           23,487,766      instructions
>>>>
>>>>          1.008913769 seconds time elapsed
>>>>
>>>>          1.003248000 seconds user
>>>>          0.004069000 seconds sys
>>>>
>>>> In order to support event inheritance, two new bpf programs are added
>>>> to monitor the fork and exit of tasks respectively. When a task is
>>>> created, add it to the filter map to enable counting, and reuse the
>>>> `accum_key` of its parent task to count together with the parent task.
>>>> When a task exits, remove it from the filter map to disable counting.
>>>>
>>>> After support:
>>>> $ ./perf stat --bpf-counters -e cycles,instructions -- \
>>>>       ./perf test -w sqrtloop
>>>>
>>>>  Performance counter stats for './perf test -w sqrtloop':
>>>>
>>>>      2,316,252,189      cycles
>>>>      2,859,946,547      instructions
>>>>
>>>>        1.009422314 seconds time elapsed
>>>>
>>>>        1.003597000 seconds user
>>>>        0.004270000 seconds sys
>>>>
>>>> Signed-off-by: Tengda Wu <wutengda@...weicloud.com>
>>>
>>> The solution looks good to me. Question on the UI: do we always
>>> want the inherit behavior from PID and TGID monitoring? If not,
>>> maybe we should add a flag for it. (I think we do need the flag).
>>
>> I think it should depend on the value of attr.inherit.  Maybe we can
>> disable the autoload for !inherit.
>>
> 
> Got it. The attr.inherit flag (which corresponds to --no-inherit in the perf
> command) is suitable for controlling the inherit behavior. I will fix it. Thanks!
> 
>>>
>>> One nitpick below.
>>>
>>> Thanks,
>>> Song
>>>
>>> [...]
>>>>
>>>> +struct bperf_filter_value {
>>>> +       __u32 accum_key;
>>>> +       __u8 exited;
>>>> +};
>>> nit:
>>> Can we use a special value of accum_key to replace exited==1
>>> case?
>>
>> I'm not sure.  I guess it still needs to use the accum_key to save the
>> final value when exited = 1.
> 
> In theory, it is possible. The accum_key is currently only used to index a value
> in the accum_readings map, so if the task is not being counted, the accum_key can
> be set to a special value.
> 
> Since accum_key is of u32 type, there are two special values to choose from: 0
> or max_entries+1. I think the latter, max_entries+1, may be more suitable because
> it avoids wasting memory in the accum_readings map and does not require too
> many changes to bpf_counter.
> 

Sorry, I was wrong. As Namhyung said, 'accum_readings[accum_key]' saves the
last count of the task when it exits. If accum_key is set to a special value
at this time, the count will be lost.
 
So exited==1 is necessary, we can not use a special value of accum_key to
replace it.
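
To make the reasoning above concrete, here is a minimal user-space sketch, not
the actual BPF code from the patch. Plain arrays stand in for the filter and
accum_readings maps. The names sim_fork, sim_exit, sim_switch_out, run_scenario
and KEY_INVALID are hypothetical, and the ordering (the exit is recorded first,
and the last delta is accumulated afterwards) is an assumption based on the
discussion in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ENTRIES 4                  /* size of the simulated accum_readings map */
#define MAX_TASKS   8                  /* hypothetical bound on simulated pids */
#define KEY_INVALID (MAX_ENTRIES + 1)  /* the "special value" that was considered */

struct bperf_filter_value {
	uint32_t accum_key;
	uint8_t  exited;
};

/* Arrays standing in for the BPF maps (assumption: pid is used as index). */
struct sim {
	struct bperf_filter_value filter[MAX_TASKS];
	uint8_t  present[MAX_TASKS];
	uint64_t accum_readings[MAX_ENTRIES];
};

/* On fork: the child reuses the parent's accum_key so both count together. */
static void sim_fork(struct sim *s, uint32_t parent, uint32_t child)
{
	assert(parent < MAX_TASKS && child < MAX_TASKS);
	if (!s->present[parent])
		return;
	s->filter[child].accum_key = s->filter[parent].accum_key;
	s->filter[child].exited = 0;
	s->present[child] = 1;
}

/* On exit: either set the flag, or overwrite accum_key with the sentinel. */
static void sim_exit(struct sim *s, uint32_t pid, int use_sentinel)
{
	assert(pid < MAX_TASKS);
	if (!s->present[pid])
		return;
	if (use_sentinel)
		s->filter[pid].accum_key = KEY_INVALID;
	else
		s->filter[pid].exited = 1;
}

/* On switch-out: add the delta to the task's slot, then drop exited tasks. */
static void sim_switch_out(struct sim *s, uint32_t pid, uint64_t delta)
{
	struct bperf_filter_value *v;

	assert(pid < MAX_TASKS);
	if (!s->present[pid])
		return;
	v = &s->filter[pid];
	if (v->accum_key < MAX_ENTRIES)
		s->accum_readings[v->accum_key] += delta;
	/* With the sentinel the slot is unknown here, so the delta is dropped. */
	if (v->exited || v->accum_key >= MAX_ENTRIES)
		s->present[pid] = 0;
}

/* Parent counts 100, child counts 20, child exits, child's last delta is 30. */
uint64_t run_scenario(int use_sentinel)
{
	struct sim s;

	memset(&s, 0, sizeof(s));
	s.present[1] = 1;              /* pid 1 is monitored, accum_key 0 */
	sim_switch_out(&s, 1, 100);
	sim_fork(&s, 1, 2);
	sim_switch_out(&s, 2, 20);
	sim_exit(&s, 2, use_sentinel);
	sim_switch_out(&s, 2, 30);     /* final reading after the exit */
	return s.accum_readings[0];
}
```

In this sketch, run_scenario(0) keeps accum_key intact and only sets the exited
flag, so the final delta still lands in the shared slot. run_scenario(1)
overwrites accum_key at exit, so the final delta has nowhere to go, which is
the lost count described above.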

Thanks,
Tengda

> 
>>
>> Thanks,
>> Namhyung
>>
>>>
>>>> +
>>>>  #endif /* __BPERF_STAT_U_H */
>>>> --
>>>> 2.34.1
>>>>
>