Date: Mon, 11 Jan 2021 19:24:23 -0800
From: Yonghong Song <yhs@...com>
To: Song Liu <songliubraving@...com>
CC: bpf <bpf@...r.kernel.org>, Networking <netdev@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"mingo@...hat.com" <mingo@...hat.com>,
	"peterz@...radead.org" <peterz@...radead.org>,
	"ast@...nel.org" <ast@...nel.org>,
	"daniel@...earbox.net" <daniel@...earbox.net>,
	"andrii@...nel.org" <andrii@...nel.org>,
	"john.fastabend@...il.com" <john.fastabend@...il.com>,
	"kpsingh@...omium.org" <kpsingh@...omium.org>,
	Kernel Team <Kernel-team@...com>,
	"haoluo@...gle.com" <haoluo@...gle.com>
Subject: Re: [PATCH bpf-next 4/4] bpf: runqslower: use task local storage

On 1/11/21 2:54 PM, Song Liu wrote:
>
>> On Jan 11, 2021, at 9:49 AM, Yonghong Song <yhs@...com> wrote:
>>
>> On 1/8/21 3:19 PM, Song Liu wrote:
>>> Replace hashtab with task local storage in runqslower. This improves the
>>> performance of these BPF programs. The following table summarizes the
>>> average runtime of these programs, in nanoseconds:
>>>
>>>                             task-local  hash-prealloc  hash-no-prealloc
>>> handle__sched_wakeup               125            340              3124
>>> handle__sched_wakeup_new          2812           1510              2998
>>> handle__sched_switch               151            208               991
>>>
>>> Note that task local storage gives better performance than hashtab for
>>> handle__sched_wakeup and handle__sched_switch. On the other hand, for
>>> handle__sched_wakeup_new, task local storage is slower than hashtab with
>>> prealloc. This is because handle__sched_wakeup_new accesses the data for
>>> the first time, so it has to allocate the data for task local storage.
>>> Once the initial allocation is done, subsequent accesses, such as those
>>> in handle__sched_wakeup, are much faster with task local storage. If we
>>> disable hashtab prealloc, task local storage is much faster for all 3
>>> functions.
>>>
>>> Signed-off-by: Song Liu <songliubraving@...com>
>>> ---
>>>  tools/bpf/runqslower/runqslower.bpf.c | 26 +++++++++++++++-----------
>>>  1 file changed, 15 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/tools/bpf/runqslower/runqslower.bpf.c b/tools/bpf/runqslower/runqslower.bpf.c
>>> index 1f18a409f0443..c4de4179a0a17 100644
>>> --- a/tools/bpf/runqslower/runqslower.bpf.c
>>> +++ b/tools/bpf/runqslower/runqslower.bpf.c
>>> @@ -11,9 +11,9 @@ const volatile __u64 min_us = 0;
>>>  const volatile pid_t targ_pid = 0;
>>>
>>>  struct {
>>> -	__uint(type, BPF_MAP_TYPE_HASH);
>>> -	__uint(max_entries, 10240);
>>> -	__type(key, u32);
>>> +	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
>>> +	__uint(map_flags, BPF_F_NO_PREALLOC);
>>> +	__type(key, int);
>>>  	__type(value, u64);
>>>  } start SEC(".maps");
>>>
>>> @@ -25,15 +25,19 @@ struct {
>>>
>>>  /* record enqueue timestamp */
>>>  __always_inline
>>> -static int trace_enqueue(u32 tgid, u32 pid)
>>> +static int trace_enqueue(struct task_struct *t)
>>>  {
>>> -	u64 ts;
>>> +	u32 pid = t->pid;
>>> +	u64 ts, *ptr;
>>>
>>>  	if (!pid || (targ_pid && targ_pid != pid))
>>>  		return 0;
>>>
>>>  	ts = bpf_ktime_get_ns();
>>> -	bpf_map_update_elem(&start, &pid, &ts, 0);
>>> +	ptr = bpf_task_storage_get(&start, t, 0,
>>> +				   BPF_LOCAL_STORAGE_GET_F_CREATE);
>>> +	if (ptr)
>>> +		*ptr = ts;
>>>  	return 0;
>>>  }
>>>
>>> @@ -43,7 +47,7 @@ int handle__sched_wakeup(u64 *ctx)
>>>  	/* TP_PROTO(struct task_struct *p) */
>>>  	struct task_struct *p = (void *)ctx[0];
>>>
>>> -	return trace_enqueue(p->tgid, p->pid);
>>> +	return trace_enqueue(p);
>>>  }
>>>
>>>  SEC("tp_btf/sched_wakeup_new")
>>> @@ -52,7 +56,7 @@ int handle__sched_wakeup_new(u64 *ctx)
>>>  	/* TP_PROTO(struct task_struct *p) */
>>>  	struct task_struct *p = (void *)ctx[0];
>>>
>>> -	return trace_enqueue(p->tgid, p->pid);
>>> +	return trace_enqueue(p);
>>>  }
>>>
>>>  SEC("tp_btf/sched_switch")
>>> @@ -70,12 +74,12 @@ int handle__sched_switch(u64 *ctx)
>>>
>>>  	/* ivcsw: treat like an enqueue event and store timestamp */
>>>  	if (prev->state == TASK_RUNNING)
>>> -		trace_enqueue(prev->tgid, prev->pid);
>>> +		trace_enqueue(prev);
>>>
>>>  	pid = next->pid;
>>>
>>>  	/* fetch timestamp and calculate delta */
>>> -	tsp = bpf_map_lookup_elem(&start, &pid);
>>> +	tsp = bpf_task_storage_get(&start, next, 0, 0);
>>>  	if (!tsp)
>>>  		return 0;   /* missed enqueue */
>>
>> Previously, the hash table could overflow, so we could have missed
>> enqueues. Here with task local storage, is it possible to add additional
>> pid filtering at the beginning of handle__sched_switch, such that a
>> missed enqueue here can be treated as an error?
>
> IIUC, hashtab overflow is not the only reason for a missed enqueue. If the
> wakeup (which calls trace_enqueue) happens before runqslower starts, we
> may still get a missed enqueue in sched_switch, no?

The wakeup won't happen before runqslower starts, since runqslower has to
attach the programs first before trace_enqueue() can run.

For the current implementation, trace_enqueue() records any non-zero pid
when targ_pid is not set, and only the matching non-zero pid when targ_pid
is set, so this should be okay if we do the same filtering in
handle__sched_switch. Maybe you can do an experiment to prove whether my
point is correct or not.

>
> Thanks,
> Song
>
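A minimal sketch of the filtering Yonghong suggests, applied to the
task-storage version of handle__sched_switch from the patch above. The
bpf_printk error report is illustrative only (the posted patch silently
returns on a missed enqueue), and targ_pid and the start map are assumed
to be defined as in the patch:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

const volatile pid_t targ_pid = 0;

struct {
	__uint(type, BPF_MAP_TYPE_TASK_STORAGE);
	__uint(map_flags, BPF_F_NO_PREALLOC);
	__type(key, int);
	__type(value, u64);
} start SEC(".maps");

SEC("tp_btf/sched_switch")
int handle__sched_switch(u64 *ctx)
{
	/* TP_PROTO(bool preempt, struct task_struct *prev,
	 *	    struct task_struct *next)
	 */
	struct task_struct *next = (struct task_struct *)ctx[2];
	u64 *tsp;
	u32 pid = next->pid;

	/* ivcsw handling via trace_enqueue(prev) is unchanged and
	 * omitted here.
	 */

	/* Filter before the lookup: once tasks outside the target set
	 * are skipped here, every task that reaches the lookup below
	 * has passed the same check in trace_enqueue(), so a missing
	 * entry can be reported instead of silently ignored.
	 */
	if (!pid || (targ_pid && targ_pid != pid))
		return 0;

	tsp = bpf_task_storage_get(&start, next, 0, 0);
	if (!tsp) {
		bpf_printk("unexpected missed enqueue for pid %d", pid);
		return 0;
	}

	/* ... compute the delta and emit the event as in the patch ... */
	return 0;
}

char LICENSE[] SEC("license") = "GPL";

As a side effect, doing the filter before bpf_task_storage_get() also
saves the lookup entirely for tasks that are not being traced.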