netdev - Re: [PATCH bpf-next 1/2] bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <7c6474c0-27a2-bd6c-cd0f-c835856224cb@fb.com>
Date:   Wed, 17 Oct 2018 19:02:28 +0000
From:   Alexei Starovoitov <ast@...com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Song Liu <songliubraving@...com>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "ast@...nel.org" <ast@...nel.org>,
        "daniel@...earbox.net" <daniel@...earbox.net>,
        Kernel Team <Kernel-team@...com>
Subject: Re: [PATCH bpf-next 1/2] bpf: add cg_skb_is_valid_access for
 BPF_PROG_TYPE_CGROUP_SKB

On 10/17/18 10:26 AM, Alexei Starovoitov wrote:
> On Tue, Oct 16, 2018 at 10:56:05PM -0700, Song Liu wrote:
>> BPF programs of BPF_PROG_TYPE_CGROUP_SKB need to access headers in the
>> skb. This patch enables direct access of skb for these programs.
>
> The lack of direct packet access in CGROUP_SKB progs was
> an unpleasant surprise to me, so thank you for fixing it,
> but there are few issues with the patch. See below.
>
>> In __cgroup_bpf_run_filter_skb(), bpf_compute_data_pointers() is called
>> to compute proper data_end for the BPF program.
>>
>> Signed-off-by: Song Liu <songliubraving@...com>
>> ---
>>  kernel/bpf/cgroup.c |  4 ++++
>>  net/core/filter.c   | 26 +++++++++++++++++++++++++-
>>  2 files changed, 29 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
>> index 00f6ed2e4f9a..340d496f35bd 100644
>> --- a/kernel/bpf/cgroup.c
>> +++ b/kernel/bpf/cgroup.c
>> @@ -566,6 +566,10 @@ int __cgroup_bpf_run_filter_skb(struct sock *sk,
>>  	save_sk = skb->sk;
>>  	skb->sk = sk;
>>  	__skb_push(skb, offset);
>> +
>> +	/* compute pointers for the bpf prog */
>> +	bpf_compute_data_pointers(skb);
>> +
>>  	ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[type], skb,
>>  				 bpf_prog_run_save_cb);
>>  	__skb_pull(skb, offset);
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 1a3ac6c46873..8b5a502e241f 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -5346,6 +5346,30 @@ static bool sk_filter_is_valid_access(int off, int size,
>>  	return bpf_skb_is_valid_access(off, size, type, prog, info);
>>  }
>>
>> +static bool cg_skb_is_valid_access(int off, int size,
>> +				   enum bpf_access_type type,
>> +				   const struct bpf_prog *prog,
>> +				   struct bpf_insn_access_aux *info)
>> +{
>> +	if (type == BPF_WRITE)
>> +		return false;
>
> this disables writes into cb[0..4] that were allowed for cgroup_inet_* before.
> One can argue that this may break existing progs,
> but looking at the place where BPF_CGROUP_RUN_PROG_INET_INGRESS is called
> it seems it's actually not correct in all cases to access cb there.
> Just few lines down we call bpf_prog_run_save_cb() which save/restores
> these 24 bytes.
> So we have two option either add save/restore for INET_INGRESS only
> or disable read and write access to cb[0..4] for CGROUP_SKB progs.
> I prefer the former.
>
>> +
>> +	switch (off) {
>> +	case bpf_ctx_range(struct __sk_buff, len):
>> +		break;
>> +	case bpf_ctx_range(struct __sk_buff, data):
>> +		info->reg_type = PTR_TO_PACKET;
>> +		break;
>> +	case bpf_ctx_range(struct __sk_buff, data_end):
>> +		info->reg_type = PTR_TO_PACKET_END;
>> +		break;
>> +	default:
>> +		return false;
>> +	}
>
> this also enables access to a range of fields family..local_port.
> It's ok to do for egress, but not for ingress unless we
> add code similar to the bottom of sk_filter_trim_cap() that
> inits skb->sk.
>
> above change also allows access to data_meta and flow_keys
> which is not correct.
>
> Considering all that I'm proposing to fix INET_INGRESS call site
> similar to code below it in sk_filter_trim_cap().
> In particular to do:
> struct sock *save_sk = skb->sk;
> skb->sk = sk;
> save and clear cb
> BPF_CGROUP_RUN_PROG_INET_INGRESS
> restore cb
> skb->sk = save_sk;
>
> all of above can probaby be inside BPF_CGROUP_RUN_PROG_INET_INGRESS macro.
> Then in this cg_skb_is_valid_access() allow access to data/data_end
> and family..local_port range as well.
> while disallowing access to flow_keys and data_meta.
>
> In patch 2 we gotta have tests for all these fields.
>
> Thoughts?

chatted with Song offline.
I completely misread 'return false' in the above as 'break'.
The patch actually disables access to pkt_type, mark, queue_mapping
and so on. Which is not correct either.
Since tests were not failing we really need to improve this aspect
of test coverage in test_verifier.c

Also I missed that __cgroup_bpf_run_filter_skb() already
does save_sk = skb->sk; skb->sk = sk;
and bpf_prog_run_save_cb()
So no issue in the existing code. That was false alarm.
Revising the proposal...
I think cg_skb_is_valid_access() can be made similar to
lwt_is_valid_access().
Allowing writes into mark, priority, cb[0..4]
and read of data/data_end.
In addition it's also ok to allow family..local_port range
(unlike lwt where sk may not be present).
and no access to data_meta and flow_keys.