Message-ID: <20190618163117.yuw44b24lo6prsrz@ast-mbp.dhcp.thefacebook.com>
Date: Tue, 18 Jun 2019 09:31:19 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Stanislav Fomichev <sdf@...gle.com>
Cc: netdev@...r.kernel.org, bpf@...r.kernel.org, davem@...emloft.net,
ast@...nel.org, daniel@...earbox.net, Martin Lau <kafai@...com>
Subject: Re: [PATCH bpf-next v6 1/9] bpf: implement getsockopt and setsockopt hooks

On Mon, Jun 17, 2019 at 11:01:01AM -0700, Stanislav Fomichev wrote:
> Implement new BPF_PROG_TYPE_CGROUP_SOCKOPT program type and
> BPF_CGROUP_{G,S}ETSOCKOPT cgroup hooks.
>
> BPF_CGROUP_SETSOCKOPT gets a read-only view of the setsockopt arguments.
> BPF_CGROUP_GETSOCKOPT can modify the supplied buffer.
> Both of them reuse existing PTR_TO_PACKET{,_END} infrastructure.
>
> The buffer memory is pre-allocated (because I don't think there is
> a precedent for working with __user memory from bpf). This might be
> slow to do for each {s,g}etsockopt call, that's why I've added
> __cgroup_bpf_prog_array_is_empty that exits early if there is nothing
> attached to a cgroup. Note, however, that there is a race between
> __cgroup_bpf_prog_array_is_empty and BPF_PROG_RUN_ARRAY where cgroup
> program layout might have changed; this should not be a problem
> because in general there is a race between multiple calls to
> {s,g}etsockopt and user adding/removing bpf progs from a cgroup.
>
> The return code of the BPF program is handled as follows:
> * 0: EPERM
> * 1: success, execute kernel {s,g}etsockopt path after BPF prog exits
> * 2: success, do _not_ execute kernel {s,g}etsockopt path after BPF
> prog exits
>
> Note that if 0 or 2 is returned from BPF program, no further BPF program
> in the cgroup hierarchy is executed. This is in contrast with any existing
> per-cgroup BPF attach_type.

This is drastically different from all other cgroup-bpf progs.
I think all programs should be executed regardless of return code.
It seems to me that the 1 vs 2 difference can be expressed via bpf program
logic instead of the return code.

How about we do what all other cgroup-bpf progs do:
"any no is no. all yes is yes"
Meaning any ret=0 - EPERM back to user.
If all are ret=1 - kernel handles get/set.
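
To make the "any no is no. all yes is yes" rule concrete, here is a minimal
userspace sketch. It is not kernel code: sockopt_prog_fn, run_all_progs,
prog_yes and prog_no are hypothetical names invented for illustration, and
each "program" is just a C function returning 1 (yes) or 0 (no). The point
is only that every program runs even after one of them has said no.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one attached BPF program: returns 1 (yes) or 0 (no). */
typedef int (*sockopt_prog_fn)(void *ctx);

/*
 * Run every program in the array, never stopping early.
 * Returns 0 if all programs said yes, -EPERM if any program said no.
 */
static int run_all_progs(const sockopt_prog_fn *progs, size_t n, void *ctx)
{
    int allow = 1;

    assert(progs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (progs[i](ctx) == 0)
            allow = 0;  /* remember the "no", but keep running the rest */
    }
    return allow ? 0 : -EPERM;
}

/* Example programs for illustration; ctx points to an int call counter. */
static int prog_yes(void *ctx) { (*(int *)ctx)++; return 1; }
static int prog_no(void *ctx)  { (*(int *)ctx)++; return 0; }
```

With the array { prog_yes, prog_no, prog_yes } the result is -EPERM and the
counter still reaches 3, i.e. the program after the "no" was executed too.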

I think the desire to differentiate 1 vs 2 came from an ordering issue
on getsockopt.
How about for setsockopt all progs run first and then the kernel.
For getsockopt the kernel runs first and then all progs.
Then progs will have the ability to overwrite anything the kernel returns.
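
The proposed ordering can be sketched in userspace C as follows. This is an
illustration under stated assumptions, not the kernel implementation: struct
sockopt_ctx is simplified to a single int, and all names (run_progs_all,
do_setsockopt, do_getsockopt, kernel_get, kernel_set, prog_clamp, prog_deny)
are hypothetical. Skipping the programs when the kernel getsockopt path
fails is also an assumption; the proposal above does not cover that case.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified context: one int instead of an optval byte buffer (assumption). */
struct sockopt_ctx {
    int optval;
};

typedef int (*sockopt_ctx_prog_fn)(struct sockopt_ctx *ctx);   /* 1 = yes, 0 = no */
typedef int (*kernel_handler_fn)(struct sockopt_ctx *ctx);     /* 0 or -errno */

/* Run all programs; any 0 yields -EPERM, but the remaining programs still run. */
static int run_progs_all(const sockopt_ctx_prog_fn *progs, size_t n,
                         struct sockopt_ctx *ctx)
{
    int allow = 1;

    assert(progs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (progs[i](ctx) == 0)
            allow = 0;
    return allow ? 0 : -EPERM;
}

/* setsockopt: all programs first, then the kernel handler if none said no. */
static int do_setsockopt(const sockopt_ctx_prog_fn *progs, size_t n,
                         kernel_handler_fn kernel, struct sockopt_ctx *ctx)
{
    int err = run_progs_all(progs, n, ctx);

    return err ? err : kernel(ctx);
}

/* getsockopt: kernel handler first, then all programs may overwrite the result. */
static int do_getsockopt(const sockopt_ctx_prog_fn *progs, size_t n,
                         kernel_handler_fn kernel, struct sockopt_ctx *ctx)
{
    int err = kernel(ctx);

    return err ? err : run_progs_all(progs, n, ctx);
}

/* Illustrative handlers and programs. */
static int stored_value;   /* what the fake "kernel" has recorded */
static int kernel_set(struct sockopt_ctx *ctx) { stored_value = ctx->optval; return 0; }
static int kernel_get(struct sockopt_ctx *ctx) { ctx->optval = 1500; return 0; }
static int prog_clamp(struct sockopt_ctx *ctx)
{
    if (ctx->optval > 1400)
        ctx->optval = 1400;   /* overwrite what the kernel returned */
    return 1;
}
static int prog_deny(struct sockopt_ctx *ctx) { (void)ctx; return 0; }
```

In this sketch a getsockopt with prog_clamp attached sees the kernel's 1500
and hands 1400 back to the caller, while a setsockopt with prog_deny
attached returns -EPERM before kernel_set is ever called.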