[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <333ffa5f-5875-3870-4933-1ee23b68fb87@fb.com>
Date: Mon, 12 Jul 2021 08:49:19 -0700
From: Yonghong Song <yhs@...com>
To: Andrii Nakryiko <andrii@...nel.org>, <bpf@...r.kernel.org>,
<netdev@...r.kernel.org>, <ast@...com>, <daniel@...earbox.net>,
<linux-kernel@...r.kernel.org>
CC: <kernel-team@...com>
Subject: Re: [PATCH bpf-next] bpf: add ambient BPF runtime context stored in
current
On 7/9/21 6:11 PM, Andrii Nakryiko wrote:
> b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage()
> helper") fixed the problem with cgroup-local storage use in BPF by
> pre-allocating per-CPU array of 8 cgroup storage pointers to accommodate
> possible BPF program preemptions and nested executions.
>
> While this seems to work good in practice, it introduces new and unnecessary
> failure mode in which not all BPF programs might be executed if we fail to
> find an unused slot for cgroup storage, however unlikely it is. It might also
> not be so unlikely when/if we allow sleepable cgroup BPF programs in the
> future.
>
> Further, the way that cgroup storage is implemented as ambiently-available
> property during entire BPF program execution is a convenient way to pass extra
> information to BPF program and helpers without requiring user code to pass
> around extra arguments explicitly. So it would be good to have a generic
> solution that can allow implementing this without arbitrary restrictions.
> Ideally, such solution would work for both preemptable and sleepable BPF
> programs in exactly the same way.
>
> This patch introduces such solution, bpf_run_ctx. It adds one pointer field
> (bpf_ctx) to task_struct. This field is maintained by BPF_PROG_RUN family of
> macros in such a way that it always stays valid throughout BPF program
> execution. BPF program preemption is handled by remembering previous
> current->bpf_ctx value locally while executing nested BPF program and
> restoring old value after nested BPF program finishes. This is handled by two
> helper functions, bpf_set_run_ctx() and bpf_reset_run_ctx(), which are
> supposed to be used before and after BPF program runs, respectively.
>
> Restoring old value of the pointer handles preemption, while bpf_run_ctx
> pointer being a property of current task_struct naturally solves this problem
> for sleepable BPF programs by "following" BPF program execution as it is
> scheduled in and out of CPU. It would even allow CPU migration of BPF
> programs, even though it's not currently allowed by BPF infra.
>
> This patch cleans up cgroup local storage handling as a first application. The
> design itself is generic, though, with bpf_run_ctx being an empty struct that
> is supposed to be embedded into a specific struct for a given BPF program type
> (bpf_cg_run_ctx in this case). Follow up patches are planned that will expand
> this mechanism for other uses within tracing BPF programs.
>
> To verify that this change doesn't revert the fix to the original cgroup
> storage issue, I ran the same repro as in the original report ([0]) and didn't
> get any problems. Replacing bpf_reset_run_ctx(old_run_ctx) with
> bpf_reset_run_ctx(NULL) triggers the issue pretty quickly (so repro does work).
>
> [0] https://lore.kernel.org/bpf/YEEvBUiJl2pJkxTd@krava/
>
> Cc: Yonghong Song <yhs@...com>
> Fixes: b910eaaaa4b8 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper")
> Signed-off-by: Andrii Nakryiko <andrii@...nel.org>
Except a config related issue reported by kernel test robot, the
patch looks good to me.
Acked-by: Yonghong Song <yhs@...com>
Powered by blists - more mailing lists