[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID:
<AM6PR03MB5080392CC36DB66E8EA202DE99F42@AM6PR03MB5080.eurprd03.prod.outlook.com>
Date: Tue, 4 Feb 2025 23:35:08 +0000
From: Juntong Deng <juntong.deng@...look.com>
To: ast@...nel.org, daniel@...earbox.net, john.fastabend@...il.com,
andrii@...nel.org, martin.lau@...ux.dev, eddyz87@...il.com, song@...nel.org,
yonghong.song@...ux.dev, kpsingh@...nel.org, sdf@...ichev.me,
haoluo@...gle.com, jolsa@...nel.org, memxor@...il.com, snorcht@...il.com,
brauner@...nel.org
Cc: bpf@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: [RFC] bpf: Rethinking BPF safety, BPF open-coded iterators, and
possible improvements (runtime protection)
This discussion comes from the patch series open-coded BPF file
iterator, which was Nack-ed and thus ended [0].
Thanks for the feedback from Christian, Linus, and Al, all very helpful.
The problems encountered in this patch series may also be encountered in
other BPF open-coded iterators to be added in the future, or in other
BPF usage scenarios.
So maybe this is a good opportunity for us to discuss all of this and
rethink BPF safety, BPF open coded iterators, and possible improvements.
[0]:
https://lore.kernel.org/bpf/AM6PR03MB50801990BD93BFA2297A123599EC2@AM6PR03MB5080.eurprd03.prod.outlook.com/T/#t
What do we expect from BPF safety?
----------------------------------
Christian points out the important fact that BPF programs can hold
references for a long time and cause weird issues.
This is an inherent flaw in BPF. Since the addition of bpf_loop and
BPF open-code iterators, the myth that BPF is "absolutely" safe has
been broken.
The BPF verifier is a static verifier and has no way of knowing how
long a BPF program will actually run.
For example, the following BPF program can freeze your computer, but
can pass the BPF verifier smoothly.
SEC("raw_tp/sched_switch")
int BPF_PROG(on_switch)
{
struct bpf_iter_num it;
int *v;
bpf_iter_num_new(&it, 0, 100000);
while ((v = bpf_iter_num_next(&it))) {
struct bpf_iter_num it2;
bpf_iter_num_new(&it2, 0, 100000);
while ((v = bpf_iter_num_next(&it2))) {
bpf_printk("BPF Bomb\n");
}
bpf_iter_num_destroy(&it2);
}
bpf_iter_num_destroy(&it);
return 0;
}
This BPF program runs a huge loop at each schedule.
bpf_iter_num_new is a common iterator that we can use in almost any
context, including LSM, sched-ext, tracing, etc.
We can run large, long loops on any critical code path and freeze the
system, since the BPF verifier has no way of knowing how long the
iteration will run.
The BPF verifier can guarantee that the BPF program will end, but this
end time may be in the very distant future, which may cause the system
to freeze, hold references (KF_ACQUIRE) for a long time, and hold locks
(bpf_spin_lock) for a long time.
If we still believe that BPF can be used safely in scenarios such as
LSM, sched-ext, tracing, etc., users are sensible and will not write
long-running dangerous BPF programs to get themselves into trouble,
and BPF programs can exit in a limited short time.
Then holding references or holding locks in BPF programs doesn't seem
to be a problem?
But this is obviously an idealistic assumption and we have to face the
worst-case scenario.
This brings us back to the question at the beginning, what do we expect
from BPF safety?
If we need stronger safety, then we may need something beyond static
(possible improvements section).
What do we expect from BPF and BPF open coded iterators?
--------------------------------------------------------
Would we expect BPF programs to have flexible access to more information
in the kernel?
Would we expect to have more BPF open-coded iterators allowing BPF
programs to iterate through various data structures in the kernel?
What are the boundaries of what we expect BPF to be able to do?
If we expect BPF programs to access more objects in the kernel, then
temporarily holding object references is almost inevitable, whether
during iteration or not.
Of course, there may be risks, but maybe those risks can be solved by
improving BPF?
I have always believed that BPF has the potential to obtain information
in the kernel in a flexible, scalable, safe, and portable way.
Possible improvements (runtime protection)
------------------------------------------
The inability of the BPF verifier to detect long running loops perhaps
indicates the limitations of the purely static checking approach, and
determining the running time of dynamically sized iterations by static
checking seems impossible.
We may need some additional dynamic (runtime) protection mechanisms.
Maybe a possible improvement would be to implement a simple runtime held
resources record. We pair acquire and release, lock and unlock kfuncs,
and insert some code in these functions (perhaps via macros) to record
runtime state information (such as the type of reference/lock, and the
memory address of the corresponding object/lock).
After the BPF program runs out of time, the program is forced to be
killed and resources held are automatically released based on the
reference/lock type and address to avoid impact on the overall system,
and return the error to the BPF userspace program.
The timeout can be different in different contexts or holding different
references/locks (For example, in a tracing context the timeout may be
very short).
I am not sure that adding runtime protection is worth it, since the
efforts of BPF so far have been to do all the safety checking before the
program runs, but static checking cannot solve all the problems.
If the BPF community is interested in this, maybe we can try
something new.
At the end
----------
Any discussion is welcome.
I hope this discussion helps us clarify what we can expect from BPF in
the future, and what flaws BPF currently has.
This may help us more easily decide which features should be added to
BPF and which should not, or what improvements should be made to BPF
to help us apply BPF to more scenarios in the future.
Many thanks.
Powered by blists - more mailing lists