[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1ebf33e8-09ea-a291-da59-08106525a00c@6wind.com>
Date: Wed, 21 Nov 2018 15:43:38 +0100
From: Nicolas Dichtel <nicolas.dichtel@...nd.com>
To: Daniel Borkmann <daniel@...earbox.net>
Cc: kafai@...com, ast@...nel.org, davem@...emloft.net,
netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next v2] filter: add BPF_ADJ_ROOM_DATA mode to
bpf_skb_adjust_room()
Le 20/11/2018 à 02:06, Daniel Borkmann a écrit :
> On 11/13/2018 05:35 PM, Nicolas Dichtel wrote:
>> This new mode enables to add or remove an l2 header in a programmatic way
>> with cls_bpf.
>> For example, it enables to play with mpls headers.
>>
>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@...nd.com>
>> Acked-by: Martin KaFai Lau <kafai@...com>
>
> (Sorry for late reply, swamped due to Plumbers.)
I thought so, no problem ;-)
>
>> ---
>>
>> v2: use skb_set_network_header()
>>
>> include/uapi/linux/bpf.h | 3 ++
>> net/core/filter.c | 53 ++++++++++++++++++++++++++++++++++
>> tools/include/uapi/linux/bpf.h | 3 ++
>> 3 files changed, 59 insertions(+)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 47d606d744cc..9762ef3a536c 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -1467,6 +1467,8 @@ union bpf_attr {
>> *
>> * * **BPF_ADJ_ROOM_NET**: Adjust room at the network layer
>> * (room space is added or removed below the layer 3 header).
>> + * * **BPF_ADJ_ROOM_DATA**: Adjust room at the beginning of the
>> + * packet (room space is added or removed below skb->data).
>
> Nit: in this case it would probably make more sense to name it BPF_ADJ_ROOM_MAC.
> BPF_ADJ_ROOM_DATA reads like we're adjusting payload data.
Yep, I first had in mind skb->data, but you're right.
>
>> * All values for *flags* are reserved for future usage, and must
>> * be left at zero.
>> @@ -2412,6 +2414,7 @@ enum bpf_func_id {
>> /* Mode for BPF_FUNC_skb_adjust_room helper. */
>> enum bpf_adj_room_mode {
>> BPF_ADJ_ROOM_NET,
>> + BPF_ADJ_ROOM_DATA,
>> };
>>
>> /* Mode for BPF_FUNC_skb_load_bytes_relative helper. */
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 53d50fb75ea1..e56da3077dca 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -2884,6 +2884,57 @@ static int bpf_skb_adjust_net(struct sk_buff *skb, s32 len_diff)
>> return ret;
>> }
>>
>> +static int bpf_skb_data_shrink(struct sk_buff *skb, u32 len)
>> +{
>> + unsigned short hhlen = skb->dev->header_ops ?
>> + skb->dev->hard_header_len : 0;
>> + int ret;
>> +
>> + ret = skb_unclone(skb, GFP_ATOMIC);
>> + if (unlikely(ret < 0))
>> + return ret;
>> +
>> + __skb_pull(skb, len);
>> + skb_reset_mac_header(skb);
>> + skb_set_network_header(skb, hhlen);
>> + skb_reset_transport_header(skb);
>> + return 0;
>> +}
>> +
>> +static int bpf_skb_data_grow(struct sk_buff *skb, u32 len)
>> +{
>> + unsigned short hhlen = skb->dev->header_ops ?
>> + skb->dev->hard_header_len : 0;
>> + int ret;
>> +
>> + ret = skb_cow(skb, len);
>> + if (unlikely(ret < 0))
>> + return ret;
>> +
>> + skb_push(skb, len);
>> + skb_reset_mac_header(skb);
>
> Looks like this could leak uninitialized mem into the BPF prog as it's
> not zeroed out. Can't we just reuse bpf_skb_generic_push() and the
> bpf_skb_generic_pop()?
Hmm, at some point, it was not possible, but it was an intermediate version and
I don't see why now.
>
> bpf_skb_vlan_push() and bpf_skb_vlan_pop() do a bpf_push_mac_rcsum() /
> bpf_pull_mac_rcsum() in order to not screw up things like csum complete.
> Wouldn't this be needed here as well?
>
> How does this interact with MPLS GSO?
I overlooked both points.
>
> If the packet gets pushed back into the stack, would AF_MPLS handle it?
> Probably good to add test cases to BPF kselftest suite with it.
Ok, I will have a look to this.
>
> Don't we need to reject the packet in case of skb->encapsulation?
>
> Looking at push_mpls() and pop_mpls() from OVS implementation, do we
> also need to deal with skb inner network header / proto?
>
> Given all this would it rather make sense to add MPLS specific helpers
> in similar fashion to the vlan ones we have?
>
> Is there any other use case aside from MPLS?
Yes, I was targeting a generic encap/decap at l2 level. Maybe more complex than
I thought.
>
>> + return 0;
>> +}
>> +
>> +static int bpf_skb_adjust_data(struct sk_buff *skb, s32 len_diff)
>> +{
>> + u32 len_diff_abs = abs(len_diff);
>> + bool shrink = len_diff < 0;
>> + int ret;
>> +
>> + if (unlikely(len_diff_abs > 0xfffU))
>> + return -EFAULT;
>> +
>> + if (shrink && len_diff_abs >= skb_headlen(skb))
>> + return -EFAULT;
>> +
>> + ret = shrink ? bpf_skb_data_shrink(skb, len_diff_abs) :
>> + bpf_skb_data_grow(skb, len_diff_abs);
>
> Hmm, I think this has a bug in that the above two do not adjust
> skb->mac_len. Then things like cls_bpf_classify() will __skb_pull()
> based on the old mac_len, getting you to a wrong offset in the
> packet when this is for example pushed back into ingress, etc.
Nice catch!
Thank you,
Nicolas
Powered by blists - more mailing lists