Message-ID: <a1905aeb-b49f-d4e8-91ee-a28a92869da1@fb.com>
Date: Fri, 1 Sep 2017 13:29:17 -0700
From: Alexei Starovoitov <ast@...com>
To: Yonghong Song <yhs@...com>, <peterz@...radead.org>,
<rostedt@...dmis.org>, <daniel@...earbox.net>,
<netdev@...r.kernel.org>
CC: <kernel-team@...com>
Subject: Re: [PATCH net-next 1/4] bpf: add helper bpf_perf_read_counter_time
for perf event array map
On 9/1/17 9:53 AM, Yonghong Song wrote:
> Hardware pmu counters are limited resources. When more pmu based
> perf events are opened than there are available counters, the kernel
> will multiplex these events so that each event gets a certain
> percentage (but not 100%) of the pmu time. When multiplexing
> happens, the number of samples or the counter value will not reflect
> what it would have been without multiplexing. This makes comparisons
> between different runs difficult.
>
> Typically, the number of samples or the counter value should be
> normalized before being compared across experiments. The typical
> normalization is done like:
> normalized_num_samples = num_samples * time_enabled / time_running
> normalized_counter_value = counter_value * time_enabled / time_running
> where time_enabled is the time the event has been enabled and
> time_running is the time the event has been running since the last
> normalization.
>
> This patch adds the helper bpf_perf_read_counter_time for the kprobe
> based perf event array map, to read the perf counter along with the
> enabled/running time. The enabled/running time is accumulated since
> the perf event open. To compute the scaling factor between two bpf
> invocations, users can use the cpu_id as the key (which is typical
> for the perf array usage model) to remember the previous values and
> do the calculation inside the bpf program.
>
> Signed-off-by: Yonghong Song <yhs@...com>
...
> +BPF_CALL_4(bpf_perf_read_counter_time, struct bpf_map *, map, u64, flags,
> + struct bpf_perf_counter_time *, buf, u32, size)
> +{
> + struct perf_event *pe;
> + u64 now;
> + int err;
> +
> + if (unlikely(size != sizeof(struct bpf_perf_counter_time)))
> + return -EINVAL;
> + err = get_map_perf_counter(map, flags, &buf->counter, &pe);
> + if (err)
> + return err;
> +
> + calc_timer_values(pe, &now, &buf->time.enabled, &buf->time.running);
> + return 0;
> +}
Peter,
I believe we're doing it correctly above.
It's a copy-paste of the same logic used for total_time_enabled/running.
We cannot expose total_time_enabled/running to bpf directly, since
those are different counters. The two times read above are specific
to the bpf usage. See the commit log.
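
For reference, here is a minimal sketch of that usage model (the map,
program and kprobe names are hypothetical, and the helper/struct
declarations assume the uapi additions from this series plus the
samples/bpf helpers):

#include <linux/ptrace.h>
#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"

/* declaration of the new helper, assuming the BPF_FUNC_* id this
 * series would add
 */
static int (*bpf_perf_read_counter_time)(void *map, unsigned long long flags,
		struct bpf_perf_counter_time *buf, unsigned int size) =
	(void *) BPF_FUNC_perf_read_counter_time;

struct bpf_map_def SEC("maps") counters = {
	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
	.key_size = sizeof(int),
	.value_size = sizeof(__u32),
	.max_entries = 64,
};

/* previous readings, keyed by cpu id as the commit log suggests */
struct bpf_map_def SEC("maps") prev_readings = {
	.type = BPF_MAP_TYPE_ARRAY,
	.key_size = sizeof(int),
	.value_size = sizeof(struct bpf_perf_counter_time),
	.max_entries = 64,
};

SEC("kprobe/sys_write")
int normalize_counter(struct pt_regs *ctx)
{
	struct bpf_perf_counter_time cur, *prev;
	int cpu = bpf_get_smp_processor_id();

	/* read counter plus enabled/running time for the event on
	 * this cpu
	 */
	if (bpf_perf_read_counter_time(&counters, BPF_F_CURRENT_CPU,
				       &cur, sizeof(cur)))
		return 0;

	prev = bpf_map_lookup_elem(&prev_readings, &cpu);
	if (prev && cur.time.running > prev->time.running) {
		/* scale the counter delta by enabled/running time to
		 * undo the effect of multiplexing
		 */
		__u64 counter = cur.counter - prev->counter;
		__u64 enabled = cur.time.enabled - prev->time.enabled;
		__u64 running = cur.time.running - prev->time.running;
		__u64 normalized = counter * enabled / running;

		/* ... report the normalized value ... */
	}
	bpf_map_update_elem(&prev_readings, &cpu, &cur, BPF_ANY);
	return 0;
}

char _license[] SEC("license") = "GPL";

Since the enabled/running times accumulate from perf event open,
taking deltas against the per-cpu previous reading yields the scaling
factor for exactly the interval between two invocations.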
for the whole set:
Acked-by: Alexei Starovoitov <ast@...nel.org>