lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID:
 <AM6PR03MB58482854EA66B98107FBEA2B99B12@AM6PR03MB5848.eurprd03.prod.outlook.com>
Date: Wed, 31 Jul 2024 14:53:59 +0100
From: Juntong Deng <juntong.deng@...look.com>
To: Kumar Kartikeya Dwivedi <memxor@...il.com>,
 Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: ast@...nel.org, daniel@...earbox.net, john.fastabend@...il.com,
 martin.lau@...ux.dev, eddyz87@...il.com, song@...nel.org,
 yonghong.song@...ux.dev, kpsingh@...nel.org, sdf@...ichev.me,
 haoluo@...gle.com, jolsa@...nel.org, andrii@...nel.org, avagin@...il.com,
 snorcht@...il.com, bpf@...r.kernel.org, netdev@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH bpf-next RESEND 00/16] bpf: Checkpoint/Restore In eBPF
 (CRIB)

On 7/23/24 01:49, Kumar Kartikeya Dwivedi wrote:
> On Tue, 23 Jul 2024 at 01:47, Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
>>
>> On Thu, Jul 11, 2024 at 12:10:17PM +0100, Juntong Deng wrote:
>>>
>>> In restore_udp_socket I had to add a struct bpf_crib_skb_info for
>>> restoring packets, this is because there is currently no BPF_CORE_WRITE.
>>>
>>> I am not sure what the current attitude of the kernel community
>>> towards BPF_CORE_WRITE is, personally I think it is well worth adding,
>>> as we need a portable way to change the value in the kernel.
>>>
>>> This not only allows more complexity in the CRIB restoring part to
>>> be transferred from CRIB kfuncs to CRIB ebpf programs, but also allows
>>> ebpf to unlock more possible application scenarios.
>>
>> There are lots of interesting ideas in this patch set, but it seems they are
>> doing the 'C-checkpoint' part of CRIx and something like BPF_CORE_WRITE
>> is necessary for 'R-restore'.
>> I'm afraid BPF_CORE_WRITE cannot be introduced without breaking all safety nets.
>> It will make bpf just as unsafe as any kernel module if bpf progs can start
>> writing into arbitrary kernel data structures. So it's a show stopper.
>> If you think there is a value in adding all these iterators for 'checkpoint'
>> part alone we can discuss and generalize individual patches.
> 
> I think it would be better to focus on the particular problem Juntong
> wants to solve, and go from there.
> That might help in cutting down the size of the patch set.
> It seems the main problem was restoring UDP sockets, but it got lost
> among all the other stuff.
> It's better to begin the discussion from there, which can still be
> rooted in what you believe CRIB in general is useful for.
> 
> Also, information is missing on what the previous attempts at solving
> this UDP problem were, and why they were insufficient such that BPF
> was necessary.
> What motivates the examples included as part of this set?
> I think this particular GSoC project is not new, so what were the
> limitations in previous attempts at restoring UDP sockets?

Yes, this idea originated from the GSoC task of dumping CORK-ed
UDP socket.

While I was solving this task I realized that ebpf has a much greater
potential to completely change the way we checkpoint/restore processes,
and can achieve better performance and more extensibility,
and that is CRIB.

For now, restoring CORK-ed UDP sockets is just one of the problems that
CRIB can solve, and it is not the main problem (that is, CRIB is not
designed around solving UDP problem).

(The difficulty with restoring CORK-ed UDP is that we do not have a
simple and elegant way to read back UDP packets in the write queue
before, but this is a simple task in CRIB.)

This is why I did not mention the GSoC task and the previous attempts
to solve the UDP problem in the patch set, because it is not the
same problem, and the previous solution to the UDP problem has nothing
to do with ebpf (CRIB).

But if adding this information would be useful, I can add it in the next
version of the patch set.

> Adding kfuncs makes it easier to checkpoint and restore state, but it
> also carries a maintenance cost.
> 
> Using BPF to speed up task state dump is going to be beneficial, but
> is an orthogonal problem (and doesn't have to be CRIU specific, the
> primitives that CRIU requires can be generic and used by others as
> well).
> 
> You're also skirting all kinds of compatibility concerns if you encode
> state to restore into structs, not getting into specifics, but if this
> pattern is followed, what happens on a kernel where say a particular
> field isn't available? It is a possibility that kfuncs may change
> their behavior due to kernel changes (not CRIB changes particularly),
> so how does user space respond to that? Sometimes, patches are
> backported, how does feature detection work?
> 
> What happens when the struct used to restore is grown to accomodate
> more state to restore? Kfuncs will have to detect the size of the
> structure and work with multiple versions (like what nf_conntrack_bpf
> kfuncs try to do with opts__sz).
> 

You are right, so CRIB needs BPF_CORE_READ and BPF_CORE_WRITE because we
need a portable way to read/write kernel structure values, and achieving
portability only through kfuncs would be a complex tough problem.

But since BPF_CORE_WRITE cannot be introduced, we put the restoration
part on hold and focus on dumping part first, which we can achieve
portability with BPF_CORE_READ.

> I tried to add io_uring and epoll iterators for capturing state
> (https://lore.kernel.org/bpf/20211201042333.2035153-1-memxor@gmail.com)
> a couple of years back, although I didn't have time to pursue it
> further after GSoC. But I tried to minimize the restoration interfaces
> exposed precisely because the above is hard to ensure. The more kfuncs
> you expose to restore state, the deeper the hole becomes, since it's
> meant to be a relatively user-friendly interface for CRIU to use, and
> work across different kernel versions.
> 

This was a great attempt!

I have always believed that checkpoint/restore via ebpf has great
potential (that is why I created CRIB).

If CRIB is successfully merged to the mainline, maybe we can retry
dumping io_uring and epoll via ebpf.

> Can the values passed through the struct to restore state be trusted?
> I'm not very well versed with the net/, but I think
> bpf_restore_skb_rcv_queue isn't doing much sanitization and taking in
> whatever was passed by the program. It would be helpful to explain why
> that is or is not ok.
> 

Of course it cannot be trusted, but since this is an RFC patch set,
I did not put too much effort into security checking (sanitization),
I mainly wanted to show the idea (proof of concept) and get feedback.

I will put more effort into security in subsequent versions of
the patch set.

> It's easier to review if we just focus on a particular problem. I
> think let's start with the UDP case, and then look at everything else
> later.
> 

Yes, it is always good to start with a particular problem.

I will focus next on solving the socket dump problem via CRIB and try to
integrate it into the CRIU project (in a personal branch).

If the above patch set is not too large, maybe I can also try to solve
one or two problems via CRIB that cannot be well dumped via procfs
in CRIU (poor performance or incomplete information).

Anyway, I will keep the next version of the patch set small and easy
to review.

>>
>> High level feedback:
>>
>> - no need for BPF_PROG_TYPE_CRIB program type. Existing syscall type should fit.
>>
> 
> +1
> 
>> - proposed file/socket iterators are somewhat unnecessary in this open coded form.
>>    there is already file/socket iterator. From the selftests it looks like it
>>    can be used to do 'checkpoint' part already.
> 
> +1
> 
>>
>> - KF_ITER_GETTER is a good addition, but we should be able to do it without these flags.
>>    kfunc-s should be able to accept iterator as an argument. Some __suffix annotation
>>    may be necessary to help verifier if BTF type alone of the argument won't be enough.
>>
>> - KF_OBTAIN looks like a broken hammer to bypass safety. Like:
>>
>>    > Currently we cannot pass the pointer returned by the iterator next
>>    > method as argument to the KF_TRUSTED_ARGS kfuncs, because the pointer
>>    > returned by the iterator next method is not "valid".
> 
> I've replied to this particular patch to explain what exact unsafety
> it might introduce.
> I also think the 2nd use case might be fixed by a recent patch.
> 
> [...]


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ