linux-kernel - Re: [RFC PATCH bpf-next 4/6] bpf: Add bpf runtime hooks for tracking runtime acquire/release

Message-ID: <CAADnVQ+cokog6j5RjO7qNwBWswXTbu-x2j4EoQEt405-2i5jXw@mail.gmail.com>
Date: Thu, 27 Feb 2025 19:34:14 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Juntong Deng <juntong.deng@...look.com>
Cc: Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, 
	John Fastabend <john.fastabend@...il.com>, Andrii Nakryiko <andrii@...nel.org>, 
	Martin KaFai Lau <martin.lau@...ux.dev>, Eddy Z <eddyz87@...il.com>, Song Liu <song@...nel.org>, 
	Yonghong Song <yonghong.song@...ux.dev>, KP Singh <kpsingh@...nel.org>, 
	Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>, 
	Kumar Kartikeya Dwivedi <memxor@...il.com>, snorcht@...il.com, bpf <bpf@...r.kernel.org>, 
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH bpf-next 4/6] bpf: Add bpf runtime hooks for tracking
 runtime acquire/release

On Thu, Feb 27, 2025 at 1:55 PM Juntong Deng <juntong.deng@...look.com> wrote:
>
> I have an idea, though not sure if it is helpful.
>
> (This idea is for the previous problem of holding references too long)
>
> My idea is to add a new KF_FLAG, like KF_ACQUIRE_EPHEMERAL, as a
> special reference that can only be held for a short time.
>
> When a bpf program holds such a reference, the bpf program will not be
> allowed to enter any new logic with uncertain runtime, such as bpf_loop
> and the bpf open coded iterator.
>
> (If the bpf program is already in a loop, then no problem, as long as
> the bpf program doesn't enter a new nested loop, since the bpf verifier
> guarantees that references must be released in the loop body)
>
> In addition, such references can only be acquired and released between a
> limited number of instructions, e.g., 300 instructions.

Not much can be done with so few instructions.
Number of insns is a coarse indicator of time. If there are calls
they can take a non-trivial amount of time.
People didn't like CRIB as a concept. Holding a _regular_ file refcnt for
the duration of the program is not a problem.
Holding special files might be, since they're not supposed to be held.
Like, is it safe to get_file() a userfaultfd? It needs in-depth
analysis and your patch didn't provide any confidence that
such analysis was done.

Speaking of more in-depth analysis of the problem:
in the cover letter you mentioned bpf_throw and exceptions as
one of the ways to terminate the program, but there was another
proposal:
https://lpc.events/event/17/contributions/1610/

aka accelerated execution or fast-execute.
After the talk at LPC there were more discussions and follow ups.

Roughly, the idea is the following:
during verification, determine all kfuncs and helpers that
can be "sped up" and replace them with faster alternatives.
Like bpf_map_lookup_elem() can return NULL in the fast-execute version.
All KF_ACQUIRE | KF_RET_NULL kfuncs can return NULL too.
bpf_loop() can end sooner.
bpf_*_iter_next() can return NULL,
etc.
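
As a rough userspace illustration of that substitution (a toy model, not
kernel code; every toy_* name below is made up for this sketch and the
real helpers have different signatures): the program body stays the same
and only the helper variants it calls are swapped.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these toy_* names exist in the kernel. */
struct toy_map {
	int key;
	int value;
};

typedef int (*toy_loop_cb)(int idx, void *ctx);

/* Helper table the toy "program" is linked against. */
struct toy_helpers {
	int *(*lookup)(struct toy_map *map, int key);
	int (*loop)(int nr, toy_loop_cb cb, void *ctx);
};

/* Normal variants: do the real work. */
static int *toy_lookup(struct toy_map *map, int key)
{
	return map->key == key ? &map->value : NULL;
}

static int toy_loop(int nr, toy_loop_cb cb, void *ctx)
{
	int i;

	for (i = 0; i < nr; i++)
		if (cb(i, ctx))
			return i + 1;	/* callback asked to stop early */
	return i;
}

/* Fast variants: same signatures, but give up immediately. */
static int *toy_lookup_fast(struct toy_map *map, int key)
{
	(void)map;
	(void)key;
	return NULL;	/* a result the prog must already handle */
}

static int toy_loop_fast(int nr, toy_loop_cb cb, void *ctx)
{
	(void)nr;
	(void)cb;
	(void)ctx;
	return 0;	/* zero iterations performed */
}

static const struct toy_helpers toy_normal_helpers = { toy_lookup, toy_loop };
static const struct toy_helpers toy_fast_helpers = { toy_lookup_fast, toy_loop_fast };

static int toy_count_cb(int idx, void *ctx)
{
	(void)idx;
	++*(int *)ctx;
	return 0;	/* keep looping */
}

/* One program body; only the helper table differs between versions. */
static int toy_prog(const struct toy_helpers *h, struct toy_map *map)
{
	int iterations = 0;
	int *val;

	assert(h && map);
	val = h->lookup(map, 1);
	if (!val)
		return -1;	/* NULL check the prog had to have anyway */
	h->loop(*val, toy_count_cb, &iterations);
	return iterations;
}
```

With a map holding key 1 and value 5, toy_prog() runs five iterations
against toy_normal_helpers and takes its existing NULL branch against
toy_fast_helpers. No new code path is needed in the program for the fast
case.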

Then at verification time create such a fast-execute
version of the program with 1-1 mapping of IPs / instructions.
When a prog needs to be cancelled, replace the return IP
with the corresponding IP in the fast-execute version.
Since all regs are the same, continuing in the fast-execute
version will release all currently held resources,
and there is no need for either run-time (like this patch set)
or exception-style (resource descriptor collection)
bookkeeping to release them.
The program itself is going to release whatever it acquired.
bpf_throw does a manual stack unwind right now.
No need for that either. Fast-execute will return
all the way to the kernel hook via the normal execution path.
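
To make the "same IP, same regs" point concrete, here is a toy userspace
interpreter (an illustration only, not how the verifier or JIT would do
it; the opcodes and all toy_* names are invented for this sketch).
The fast version is derived instruction by instruction from the normal
one, so cancellation just switches which array is being executed while
pc and registers stay untouched. The program's own RELEASE instruction
drops the reference.

```c
#include <assert.h>
#include <stddef.h>

/* Toy opcodes; the *_FAST ones are the "give up quickly" variants. */
enum toy_op {
	OP_ACQUIRE, OP_ACQUIRE_FAST, OP_ITER_NEXT, OP_ITER_NEXT_FAST,
	OP_JZ, OP_JMP, OP_WORK, OP_RELEASE, OP_EXIT
};

struct toy_insn {
	enum toy_op op;
	int reg;
	size_t target;
};

struct toy_state {
	size_t pc;
	long regs[2];
	int refs_held;		/* references currently held by the prog */
	long items_left;	/* what the toy iterator still has to offer */
	long work_done;
};

#define TOY_PROG_LEN 8
#define TOY_NEVER ((size_t)-1)

static const struct toy_insn toy_normal[TOY_PROG_LEN] = {
	/* 0 */ { OP_ACQUIRE,   0, 0 },
	/* 1 */ { OP_JZ,        0, 7 },	/* acquire failed: nothing to release */
	/* 2 */ { OP_ITER_NEXT, 1, 0 },
	/* 3 */ { OP_JZ,        1, 6 },	/* iterator exhausted: leave the loop */
	/* 4 */ { OP_WORK,      0, 0 },
	/* 5 */ { OP_JMP,       0, 2 },
	/* 6 */ { OP_RELEASE,   0, 0 },
	/* 7 */ { OP_EXIT,      0, 0 },
};

/* Build the fast version with a 1-1 mapping of instruction indices. */
static void toy_make_fast(const struct toy_insn *in, struct toy_insn *out, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		out[i] = in[i];
		if (in[i].op == OP_ACQUIRE)
			out[i].op = OP_ACQUIRE_FAST;
		else if (in[i].op == OP_ITER_NEXT)
			out[i].op = OP_ITER_NEXT_FAST;
	}
}

/*
 * Run the prog and return the number of instructions executed. Once
 * cancel_after instructions have run, execution continues in the fast
 * version at the same pc with the same registers.
 */
static size_t toy_run(struct toy_state *s, size_t cancel_after)
{
	struct toy_insn fast[TOY_PROG_LEN];
	const struct toy_insn *insns = toy_normal;
	size_t steps = 0;

	toy_make_fast(toy_normal, fast, TOY_PROG_LEN);
	for (;;) {
		const struct toy_insn *insn;

		if (steps == cancel_after)
			insns = fast;	/* cancellation: only the text changes */
		assert(s->pc < TOY_PROG_LEN);
		insn = &insns[s->pc++];
		steps++;
		switch (insn->op) {
		case OP_ACQUIRE:	s->refs_held++; s->regs[insn->reg] = 1; break;
		case OP_ACQUIRE_FAST:	s->regs[insn->reg] = 0; break;
		case OP_ITER_NEXT:
			s->regs[insn->reg] = s->items_left > 0 ? s->items_left-- : 0;
			break;
		case OP_ITER_NEXT_FAST:	s->regs[insn->reg] = 0; break;
		case OP_JZ:
			if (s->regs[insn->reg] == 0)
				s->pc = insn->target;
			break;
		case OP_JMP:		s->pc = insn->target; break;
		case OP_WORK:		s->work_done++; break;
		case OP_RELEASE:	s->refs_held--; break;
		case OP_EXIT:		return steps;
		}
	}
}
```

In this toy, a state with items_left = 1000000 that is cancelled after 5
instructions finishes after 10 instructions in total with work_done == 1
and refs_held == 0. There is no unwinder and no table of acquired
resources anywhere in the sketch.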

Instead of patching the return IP on the stack,
we can text_poke_bp() the code of the original bpf prog to
jump to the fast-execute version at the corresponding IP/insn.

The key insight is that cancellation doesn't mean
that the prog stops completely. It continues, but with
an intent to finish as quickly as possible.
In practice it might be faster to do that
than walk your acquired hash table and call destructors.

Another important bit is that control flow is unchanged.
Introducing a new edge in the control flow graph is tricky and error-prone.

All details need to be figured out, but so far it looks
to be the cleanest and least intrusive solution to program
cancellation.
Would you be interested in helping us design/implement it?