[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200416231829.o4yngurm5nzrakoj@ast-mbp.dhcp.thefacebook.com>
Date: Thu, 16 Apr 2020 16:18:29 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc: Yonghong Song <yhs@...com>, Andrii Nakryiko <andriin@...com>,
bpf <bpf@...r.kernel.org>, Martin KaFai Lau <kafai@...com>,
Networking <netdev@...r.kernel.org>,
Alexei Starovoitov <ast@...com>,
Daniel Borkmann <daniel@...earbox.net>,
Kernel Team <kernel-team@...com>
Subject: Re: [RFC PATCH bpf-next 05/16] bpf: create file or anonymous dumpers
On Thu, Apr 16, 2020 at 12:35:07PM -0700, Andrii Nakryiko wrote:
> >
> > I slept on it and still fundamentally disagree that seq_file + bpf_prog
> > is a derivative of link. Or in OoO terms it's not a child class of bpf_link.
> > seq_file is its own class that should contain bpf_link as one of its
> > members, but it shouldn't be derived from 'class bpf_link'.
>
> Referring to inheritance here doesn't seem necessary or helpful, I'd
> rather not confuse and complicate all this further.
>
> bpfdump provider/target + bpf_prog = bpf_link. bpf_link is "a factory"
> of seq_files. That's it, no inheritance.
named seq_file in bpfdumpfs does indeed look like "factory" pattern.
And yes, there is no inheritance between named seq_file and given seq_file after open().
> > In that sense Yonghong proposed api (raw_tp_open to create anon seq_file+prog
> > and obj_pin to create a template of named seq_file+prog) are the best fit.
> > Implementation wise his 'struct extra_priv_data' needs to include
> > 'struct bpf_link' instead of 'struct bpf_prog *prog;' directly.
> >
> > So evertime 'cat' opens named seq_file there is bpf_link registered in IDR.
> > Anon seq_file should have another bpf_link as well.
>
> So that's where I disagree and don't see the point of having all those
> short-lived bpf_links. cat opening seq_file doesn't create a bpf_link,
> it creates a seq_file. If we want to associate some ID with it, it's
> fine, but it's not a bpf_link ID (in my opinion, of course).
I thought we're on the same page with the definition of bpf_link ;)
Let's recap. To make it easier I'll keep using object oriented analogy
since I think it's the most appropriate to internalize all the concepts.
- first what is file descriptor? It's nothing but std::shared_ptr<> to some kernel object.
- then there is a key class == struct bpf_link
- for raw tracepoints raw_tp_open() returns an FD to child class of bpf_link
which is 'struct bpf_raw_tp_link'.
In other words it returns std::shared_ptr<struct bpf_raw_tp_link>.
- for fentry/fexit/freplace/lsm raw_tp_open() returns an FD to a different child
class of bpf_link which is "struct bpf_tracing_link".
This is std::share_ptr<struct bpf_trace_link>.
- for cgroup-bpf progs bpf_link_create() returns an FD to child class of bpf_link
which is 'struct bpf_cgroup_link'.
This is std::share_ptr<struct bpf_cgroup_link>.
In all those cases three different shared pointers are seen as file descriptors
from the process pov but they point to different children of bpf_link base class.
link_update() is a method of base class bpf_link and it has to work for
all children classes.
Similarly your future get_obj_info_by_fd() from any of these three shared pointers
will return information specific to that child class.
In all those cases one link attaches one program to one kernel object.
Now back to bpfdumpfs.
In the latest Yonghong's patches raw_tp_open() returns an FD that is a pointer
to seq_file. This is existing kernel base class. It has its own seq_operations
virtual methods that are defined for bpfdumpfs_seq_file which is a child class
of seq_file that keeps start/stop/next methods as-is and overrides show()
method to be able to call bpf prog for every iteratable kernel object.
What you're proposing is to make bpfdump_seq_file class to be a child of two
base classes (seq_file and bpf_link) whereas I'm saying that it should be
a child of seq_file only, since bpf_link methods do not apply to it.
Like there is no sensible behavior for link_update() on such dual parent object.
In my proposal bpfdump_seq_file class keeps cat-ability and all methods of seq_file
and no extra methods from bpf_link that don't belong in seq_file.
But I'm arguing that bpfdump_seq_file class should have a member bpf_link
instead of simply holding bpf_prog via refcnt.
Let's call this child class of bpf_link the bpf_seq_file_link class. Having
bpf_seq_file_link as member would mean that such link is discoverable via IDR,
the user process can get an FD to it and can do get_obj_info_by_fd().
The information returned for such link will be a pair (bpfdump_prog, bpfdump_seq_file).
Meaning that at any given time 'bpftool link show' will show where every bpf
prog in the system is attached to.
Say named bpfdump_seq_file exists in /sys/kernel/bpfdump/tasks/foo.
No one is doing a 'cat' on it yet.
"bpftool link show" will show one link which is a pair (bpfdump_prog, "tasks/foo").
Now two humans are doing 'cat' of that file.
The bpfdump_prog refcnt is now 3 and there are two additional seq_files created
by the kernel when user said open("/sys/kernel/bpfdump/tasks/foo").
If these two humans are slow somebody could have done "rm /sys/kernel/bpfdump/tasks/foo"
and that bpfdump_seq_file and it's member bpf_seq_file_link would be gone,
but two other bpdump_seq_file-s are still active and they are different.
"bpftool link show" should be showing two pairs (bpfdump_prog, seq_file_A) and
(bpfdump_prog, seq_file_B).
The users could have been in different pid namespaces. What seq_file_A is
iterating could be completely different from seq_file_B, but I think it's
useful for admin to know where all bpf progs in the system are attached and
what kind of things are triggering them.
Powered by blists - more mailing lists