netdev - Re: [PATCH bpf-next 1/2] bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP

Message-ID: <9CC0547D-6582-4F6E-9670-979B08128897@fb.com>
Date:   Wed, 17 Oct 2018 19:07:25 +0000
From:   Song Liu <songliubraving@...com>
To:     Alexei Starovoitov <ast@...com>
CC:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "ast@...nel.org" <ast@...nel.org>,
        "daniel@...earbox.net" <daniel@...earbox.net>,
        Kernel Team <Kernel-team@...com>
Subject: Re: [PATCH bpf-next 1/2] bpf: add cg_skb_is_valid_access for
 BPF_PROG_TYPE_CGROUP_SKB



> On Oct 17, 2018, at 12:02 PM, Alexei Starovoitov <ast@...com> wrote:
> 
> On 10/17/18 10:26 AM, Alexei Starovoitov wrote:
>> On Tue, Oct 16, 2018 at 10:56:05PM -0700, Song Liu wrote:
>>> BPF programs of BPF_PROG_TYPE_CGROUP_SKB need to access headers in the
>>> skb. This patch enables direct access of skb for these programs.
>> 
>> The lack of direct packet access in CGROUP_SKB progs was
>> an unpleasant surprise to me, so thank you for fixing it,
>> but there are a few issues with the patch. See below.
>> 
>>> In __cgroup_bpf_run_filter_skb(), bpf_compute_data_pointers() is called
>>> to compute proper data_end for the BPF program.
>>> 
>>> Signed-off-by: Song Liu <songliubraving@...com>
>>> ---
>>> kernel/bpf/cgroup.c |  4 ++++
>>> net/core/filter.c   | 26 +++++++++++++++++++++++++-
>>> 2 files changed, 29 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
>>> index 00f6ed2e4f9a..340d496f35bd 100644
>>> --- a/kernel/bpf/cgroup.c
>>> +++ b/kernel/bpf/cgroup.c
>>> @@ -566,6 +566,10 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
>>> 	save_sk = skb->sk;
>>> 	skb->sk = sk;
>>> 	__skb_push(skb, offset);
>>> +
>>> +	/* compute pointers for the bpf prog */
>>> +	bpf_compute_data_pointers(skb);
>>> +
>>> 	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], skb,
>>> 				 bpf_prog_run_save_cb);
>>> 	__skb_pull(skb, offset);
>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>> index 1a3ac6c46873..8b5a502e241f 100644
>>> --- a/net/core/filter.c
>>> +++ b/net/core/filter.c
>>> @@ -5346,6 +5346,30 @@ static bool sk_filter_is_valid_access(int off, int size,
>>> 	return bpf_skb_is_valid_access(off, size, type, prog, info);
>>> }
>>> 
>>> +static bool cg_skb_is_valid_access(int off, int size,
>>> +				   enum bpf_access_type type,
>>> +				   const struct bpf_prog *prog,
>>> +				   struct bpf_insn_access_aux *info)
>>> +{
>>> +	if (type == BPF_WRITE)
>>> +		return false;
>> 
>> this disables writes into cb[0..4] that were allowed for cgroup_inet_* before.
>> One can argue that this may break existing progs,
>> but looking at the place where BPF_CGROUP_RUN_PROG_INET_INGRESS is called
>> it seems it's actually not correct in all cases to access cb there.
>> Just a few lines down we call bpf_prog_run_save_cb() which saves/restores
>> these 24 bytes.
>> So we have two options: either add save/restore for INET_INGRESS only
>> or disable read and write access to cb[0..4] for CGROUP_SKB progs.
>> I prefer the former.
>> 
>>> +
>>> +	switch (off) {
>>> +	case bpf_ctx_range(struct __sk_buff, len):
>>> +		break;
>>> +	case bpf_ctx_range(struct __sk_buff, data):
>>> +		info->reg_type = PTR_TO_PACKET;
>>> +		break;
>>> +	case bpf_ctx_range(struct __sk_buff, data_end):
>>> +		info->reg_type = PTR_TO_PACKET_END;
>>> +		break;
>>> +	default:
>>> +		return false;
>>> +	}
>> 
>> this also enables access to a range of fields family..local_port.
>> It's ok to do for egress, but not for ingress unless we
>> add code similar to the bottom of sk_filter_trim_cap() that
>> inits skb->sk.
>> 
>> above change also allows access to data_meta and flow_keys
>> which is not correct.
>> 
>> Considering all that I'm proposing to fix INET_INGRESS call site
>> similar to code below it in sk_filter_trim_cap().
>> In particular to do:
>> struct sock *save_sk = skb->sk;
>> skb->sk = sk;
>> save and clear cb
>> BPF_CGROUP_RUN_PROG_INET_INGRESS
>> restore cb
>> skb->sk = save_sk;
>> 
>> all of the above can probably be inside the BPF_CGROUP_RUN_PROG_INET_INGRESS macro.
>> Then in this cg_skb_is_valid_access() allow access to data/data_end
>> and family..local_port range as well,
>> while disallowing access to flow_keys and data_meta.
>> 
>> In patch 2 we gotta have tests for all these fields.
>> 
>> Thoughts?
> 
> chatted with Song offline.
> I completely misread 'return false' in the above as 'break'.
> The patch actually disables access to pkt_type, mark, queue_mapping
> and so on. Which is not correct either.
> Since tests were not failing we really need to improve this aspect
> of test coverage in test_verifier.c
> 
> Also I missed that __cgroup_bpf_run_filter_skb() already
> does save_sk = skb->sk; skb->sk = sk;
> and bpf_prog_run_save_cb()
> So no issue in the existing code. That was a false alarm.
> Revising the proposal...
> I think cg_skb_is_valid_access() can be made similar to
> lwt_is_valid_access().
> Allowing writes into mark, priority, cb[0..4]
> and read of data/data_end.
> In addition it's also ok to allow family..local_port range
> (unlike lwt where sk may not be present).
> and no access to data_meta and flow_keys.

Thanks Alexei! I will send v2 shortly. 

Song
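
For readers following the thread, the revised access rules quoted above (writes only to mark, priority and cb[0..4]; reads of data/data_end and the family..local_port range; no access to data_meta or flow_keys) can be sketched in userspace C. This is only an illustration under stated assumptions, not the v2 patch: struct mock_sk_buff is a simplified stand-in rather than the kernel's real struct __sk_buff layout, and mock_cg_skb_access_ok(), mock_cg_skb_selfcheck() and the ACCESS_* names are hypothetical. The real check would also set info->reg_type and defer to bpf_skb_is_valid_access().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum access_type { ACCESS_READ, ACCESS_WRITE };

/* Simplified stand-in for struct __sk_buff; NOT the kernel's real layout. */
struct mock_sk_buff {
	uint32_t len;
	uint32_t pkt_type;
	uint32_t mark;
	uint32_t queue_mapping;
	uint32_t priority;
	uint32_t cb[5];
	uint32_t data;
	uint32_t data_end;
	uint32_t family;
	uint32_t remote_port;
	uint32_t local_port;
	uint32_t data_meta;
	uint64_t flow_keys;
};

#define F_START(f) offsetof(struct mock_sk_buff, f)
#define F_END(f)   (F_START(f) + sizeof(((struct mock_sk_buff *)0)->f))
/* Access lies entirely inside field f. */
#define IN_FIELD(off, size, f) \
	((off) >= F_START(f) && (off) + (size) <= F_END(f))
/* Access touches at least one byte of field f. */
#define OVERLAPS(off, size, f) \
	((off) < F_END(f) && (off) + (size) > F_START(f))

/* Hypothetical model of the proposed rules; returns true if allowed. */
bool mock_cg_skb_access_ok(size_t off, size_t size, enum access_type type)
{
	/* Reject empty accesses and anything past the end of the context. */
	if (size == 0 || off + size > sizeof(struct mock_sk_buff))
		return false;

	/* data_meta and flow_keys are never accessible. */
	if (OVERLAPS(off, size, data_meta) || OVERLAPS(off, size, flow_keys))
		return false;

	/* Writes are limited to mark, priority and cb[0..4]. */
	if (type == ACCESS_WRITE)
		return IN_FIELD(off, size, mark) ||
		       IN_FIELD(off, size, priority) ||
		       IN_FIELD(off, size, cb);

	/* Every remaining field, including data/data_end, is readable. */
	return true;
}

/* Small usage example exercising the rules above. */
void mock_cg_skb_selfcheck(void)
{
	assert(mock_cg_skb_access_ok(F_START(priority), 4, ACCESS_WRITE));
	assert(mock_cg_skb_access_ok(F_START(local_port), 4, ACCESS_READ));
	assert(!mock_cg_skb_access_ok(F_START(data_meta), 4, ACCESS_READ));
	assert(!mock_cg_skb_access_ok(F_START(len), 4, ACCESS_WRITE));
}
```

Reads of pkt_type, mark and queue_mapping stay allowed in this model, since the thread notes that disabling them was not correct either.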