Message-ID: <9CC0547D-6582-4F6E-9670-979B08128897@fb.com>
Date: Wed, 17 Oct 2018 19:07:25 +0000
From: Song Liu <songliubraving@...com>
To: Alexei Starovoitov <ast@...com>
CC: Alexei Starovoitov <alexei.starovoitov@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"ast@...nel.org" <ast@...nel.org>,
"daniel@...earbox.net" <daniel@...earbox.net>,
Kernel Team <Kernel-team@...com>
Subject: Re: [PATCH bpf-next 1/2] bpf: add cg_skb_is_valid_access for
BPF_PROG_TYPE_CGROUP_SKB
> On Oct 17, 2018, at 12:02 PM, Alexei Starovoitov <ast@...com> wrote:
>
> On 10/17/18 10:26 AM, Alexei Starovoitov wrote:
>> On Tue, Oct 16, 2018 at 10:56:05PM -0700, Song Liu wrote:
>>> BPF programs of BPF_PROG_TYPE_CGROUP_SKB need to access headers in the
>>> skb. This patch enables direct access of skb for these programs.
>>
>> The lack of direct packet access in CGROUP_SKB progs was
>> an unpleasant surprise to me, so thank you for fixing it,
>> but there are a few issues with the patch. See below.
>>
>>> In __cgroup_bpf_run_filter_skb(), bpf_compute_data_pointers() is called
>>> to compute proper data_end for the BPF program.
>>>
>>> Signed-off-by: Song Liu <songliubraving@...com>
>>> ---
>>> kernel/bpf/cgroup.c | 4 ++++
>>> net/core/filter.c | 26 +++++++++++++++++++++++++-
>>> 2 files changed, 29 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
>>> index 00f6ed2e4f9a..340d496f35bd 100644
>>> --- a/kernel/bpf/cgroup.c
>>> +++ b/kernel/bpf/cgroup.c
>>> @@ -566,6 +566,10 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
>>>  	save_sk = skb->sk;
>>>  	skb->sk = sk;
>>>  	__skb_push(skb, offset);
>>> +
>>> +	/* compute pointers for the bpf prog */
>>> +	bpf_compute_data_pointers(skb);
>>> +
>>>  	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], skb,
>>>  				 bpf_prog_run_save_cb);
>>>  	__skb_pull(skb, offset);
>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>> index 1a3ac6c46873..8b5a502e241f 100644
>>> --- a/net/core/filter.c
>>> +++ b/net/core/filter.c
>>> @@ -5346,6 +5346,30 @@ static bool sk_filter_is_valid_access(int off, int size,
>>>  	return bpf_skb_is_valid_access(off, size, type, prog, info);
>>>  }
>>>
>>> +static bool cg_skb_is_valid_access(int off, int size,
>>> +				   enum bpf_access_type type,
>>> +				   const struct bpf_prog *prog,
>>> +				   struct bpf_insn_access_aux *info)
>>> +{
>>> +	if (type == BPF_WRITE)
>>> +		return false;
>>
>> this disables writes into cb[0..4] that were allowed for cgroup_inet_* before.
>> One can argue that this may break existing progs,
>> but looking at the place where BPF_CGROUP_RUN_PROG_INET_INGRESS is called
>> it seems it's actually not correct in all cases to access cb there.
>> Just a few lines down we call bpf_prog_run_save_cb(), which saves and restores
>> these 24 bytes.
>> So we have two options: either add save/restore for INET_INGRESS only
>> or disable read and write access to cb[0..4] for CGROUP_SKB progs.
>> I prefer the former.
>>
>>> +
>>> +	switch (off) {
>>> +	case bpf_ctx_range(struct __sk_buff, len):
>>> +		break;
>>> +	case bpf_ctx_range(struct __sk_buff, data):
>>> +		info->reg_type = PTR_TO_PACKET;
>>> +		break;
>>> +	case bpf_ctx_range(struct __sk_buff, data_end):
>>> +		info->reg_type = PTR_TO_PACKET_END;
>>> +		break;
>>> +	default:
>>> +		return false;
>>> +	}
>>
>> this also enables access to a range of fields family..local_port.
>> It's ok to do for egress, but not for ingress unless we
>> add code similar to the bottom of sk_filter_trim_cap() that
>> inits skb->sk.
>>
>> above change also allows access to data_meta and flow_keys
>> which is not correct.
>>
>> Considering all that I'm proposing to fix INET_INGRESS call site
>> similar to code below it in sk_filter_trim_cap().
>> In particular to do:
>> struct sock *save_sk = skb->sk;
>> skb->sk = sk;
>> save and clear cb
>> BPF_CGROUP_RUN_PROG_INET_INGRESS
>> restore cb
>> skb->sk = save_sk;
>>
>> all of the above can probably be inside the BPF_CGROUP_RUN_PROG_INET_INGRESS macro.
>> Then in this cg_skb_is_valid_access() allow access to data/data_end
>> and family..local_port range as well.
>> while disallowing access to flow_keys and data_meta.
>>
>> In patch 2 we gotta have tests for all these fields.
>>
>> Thoughts?
>
> chatted with Song offline.
> I completely misread 'return false' in the above as 'break'.
> The patch actually disables access to pkt_type, mark, queue_mapping
> and so on. Which is not correct either.
> Since tests were not failing we really need to improve this aspect
> of test coverage in test_verifier.c
>
> Also I missed that __cgroup_bpf_run_filter_skb() already
> does save_sk = skb->sk; skb->sk = sk;
> and bpf_prog_run_save_cb()
> So no issue in the existing code. That was a false alarm.
> Revising the proposal...
> I think cg_skb_is_valid_access() can be made similar to
> lwt_is_valid_access().
> Allowing writes into mark, priority, cb[0..4]
> and read of data/data_end.
> In addition it's also ok to allow family..local_port range
> (unlike lwt where sk may not be present).
> and no access to data_meta and flow_keys.
Thanks Alexei! I will send v2 shortly.
Song
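
For readers following the thread, the revised access policy quoted above (writes to mark, priority and cb[0..4]; reads of data/data_end and the family..local_port range; no access to data_meta or flow_keys) can be modelled in userspace roughly as below. This is only a sketch under stated assumptions, not the v2 patch: struct mock_sk_buff, mock_cg_skb_is_valid_access() and the helper names are hypothetical, the struct is a trimmed stand-in whose offsets do not match the real struct __sk_buff, and verifier details such as alignment checks, narrow loads and setting info->reg_type are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed stand-in for struct __sk_buff. The field set and
 * offsets are illustrative only and do not match the kernel UAPI layout. */
struct mock_sk_buff {
	uint32_t len;
	uint32_t pkt_type;
	uint32_t mark;
	uint32_t queue_mapping;
	uint32_t priority;
	uint32_t cb[5];
	uint32_t data;
	uint32_t data_end;
	uint32_t family;
	uint32_t remote_ip4;
	uint32_t local_ip4;
	uint32_t remote_port;
	uint32_t local_port;
	uint32_t data_meta;
	uint32_t flow_keys;
};

enum mock_access_type { MOCK_READ, MOCK_WRITE };

#define FIELD_START(f) offsetof(struct mock_sk_buff, f)
#define FIELD_END(f) \
	(FIELD_START(f) + sizeof(((struct mock_sk_buff *)0)->f))

/* True if [off, off + size) lies entirely inside [start, end). */
static bool within(size_t off, size_t size, size_t start, size_t end)
{
	return off >= start && off + size <= end;
}

/* True if [off, off + size) shares at least one byte with [start, end). */
static bool overlaps(size_t off, size_t size, size_t start, size_t end)
{
	return off < end && off + size > start;
}

/* Userspace model of the policy proposed in the thread (an assumption
 * about how it might be expressed, not the kernel implementation). */
bool mock_cg_skb_is_valid_access(size_t off, size_t size,
				 enum mock_access_type type)
{
	assert(type == MOCK_READ || type == MOCK_WRITE);

	/* Reject empty accesses and anything outside the context struct. */
	if (size == 0 || off > sizeof(struct mock_sk_buff) ||
	    size > sizeof(struct mock_sk_buff) - off)
		return false;

	/* data_meta and flow_keys: neither readable nor writable. */
	if (overlaps(off, size, FIELD_START(data_meta), FIELD_END(data_meta)) ||
	    overlaps(off, size, FIELD_START(flow_keys), FIELD_END(flow_keys)))
		return false;

	/* Writes are limited to mark, priority and cb[0..4]. */
	if (type == MOCK_WRITE)
		return within(off, size, FIELD_START(mark), FIELD_END(mark)) ||
		       within(off, size, FIELD_START(priority),
			      FIELD_END(priority)) ||
		       within(off, size, FIELD_START(cb), FIELD_END(cb));

	/* Every remaining field in this model is readable, including
	 * data/data_end and the family..local_port range. */
	return true;
}
```

In this model, a 4-byte read at offsetof(struct mock_sk_buff, data) is accepted while the same access as a write is rejected, and any access touching data_meta or flow_keys is rejected regardless of direction.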