netdev - Re: [PATCH net-next 2/2] flow_dissector: Add limits for encapsulation and EH

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87tw0mjs11.fsf@stressinduktion.org>
Date:   Fri, 01 Sep 2017 18:35:38 +0200
From:   Hannes Frederic Sowa <hannes@...essinduktion.org>
To:     Tom Herbert <tom@...ntonium.net>
Cc:     "David S . Miller" <davem@...emloft.net>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        alex.popov@...ux.com
Subject: Re: [PATCH net-next 2/2] flow_dissector: Add limits for encapsulation and EH

Hello Tom,

Tom Herbert <tom@...ntonium.net> writes:

> On Fri, Sep 1, 2017 at 6:32 AM, Hannes Frederic Sowa
> <hannes@...essinduktion.org> wrote:
>> Tom Herbert <tom@...ntonium.net> writes:
>>
>>> In flow dissector there are no limits to the number of nested
>>> encapsulations that might be dissected which makes for a nice DOS
>>> attack. This patch limits for dissecting nested encapsulations
>>> as well as for dissecting over extension headers.
>>
>> I was actually more referring to your patch, because the flow dissector
>> right now is not stack recursive. Your changes would make it doing
>> recursion on the stack.
>
> I don't believe those patches had any recursion.

I was wrong with stack recursion, you handle that using the
FLOW_DISSECT_RET_PROTO_AGAIN return value thus leaving the stack frame
again, sorry.

But otherwise the walk would be unlimited (based on the packet size) in
your first patchset, correct? See this malicious example:

| IP1 | UDP1 | VXLAN1 | Ethernet | IP2 | UDP2 | VXLAN2 | ...

where IP1 == IP2, UDP1 == UDP2 and VXLAN1 != VXLAN2?

Notice that because IP1 == IP2 and UDP1 == UDP2 it seems to me it would
hit the same socket again. We would be prone to overwrite vxlan id 1
with vxlan id 2 in the key thus the key would be malicious and traffic
could be injected into other tenant networks, if the encapsulated
packets within VXLAN1 could be generated by a malicious user?

I was actually not concerned about the "recursion" but merely about
updating the values to the innermost values.

>>> Reported-by: Hannes Frederic Sowa <hannes@...essinduktion.org>
>>> Signed-off-by: Tom Herbert <tom@...ntonium.net>
>>> ---
>>>  net/core/flow_dissector.c | 48 ++++++++++++++++++++++++++++++++++++++++++++---
>>>  1 file changed, 45 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>>> index 5110180a3e96..1bca748de27d 100644
>>> --- a/net/core/flow_dissector.c
>>> +++ b/net/core/flow_dissector.c
>>> @@ -396,6 +396,35 @@ __skb_flow_dissect_ipv6(const struct sk_buff *skb,
>>>       key_ip->ttl = iph->hop_limit;
>>>  }
>>>
>>> +/* Maximum number of nested encapsulations that can be processed in
>>> + * __skb_flow_dissect
>>> + */
>>> +#define MAX_FLOW_DISSECT_ENCAPS      5
>>
>> I think you can exactly parse one encapsulation layer safely. This is
>> because you can only keep state on one encapsulation layer protocol
>> right now.
>>
>> For example this scenario:
>>
>> I would like to circumvent tc flower rules that filter base
>> simply construct a packet looking like:
>>
>> Ethernet|Vlan|IP|GRE|Ethernet|Vlan|
>>
>> because we don't recurse in the flow keys either, the second vlan header
>> would overwrite the information of the first one.
>>
> Flow dissector returns the inner most information it sees. If only
> information from the outermost headers is needed then the caller
> should set FLOW_DISSECTOR_F_STOP_AT_ENCAP in the flags.

Right now, flower does not do so and seems to be prone to be fooled like
that.

Additionally, flow_dissector does not update all keys to its inner most
values also, e.g. I don't see the inner packets updating
FLOW_DISSECTOR_KEY_ETH_ADDRS (e.g. with ETH_P_TEB).

To me it would make more sense to be able to specificy
FLOW_DISSECTOR_F_AFTER_FIRST_ENCAP, so that we get the first
encapsulation information with the key, do policy decision, pull headers
and recurse here.

>>> +
>>> +static bool skb_flow_dissect_encap_allowed(int *num_encaps, unsigned int *flags)
>>> +{
>>> +     ++*num_encaps;
>>> +
>>> +     if (*num_encaps >= MAX_FLOW_DISSECT_ENCAPS) {
>>> +             if (*num_encaps == MAX_FLOW_DISSECT_ENCAPS) {
>>> +                     /* Allow one more pass but ignore disregard
>>> +                      * further encapsulations
>>> +                      */
>>> +                     *flags |= FLOW_DISSECTOR_F_STOP_AT_ENCAP;
>>> +             } else {
>>> +                     /* Max encaps reached */
>>> +                     return  false;
>>> +             }
>>> +     }
>>> +
>>> +     return true;
>>> +}
>>> +
>>> +/* Maximum number of extension headers can be processed in __skb_flow_dissect
>>> + * per IPv6 packet
>>> + */
>>> +#define MAX_FLOW_DISSECT_EH  5
>>
>> I would at least allow each extension header once, DEST_OPS twice, thus
>> let's say 12? It is safe in this version because it does not consume
>> stack space anyway and doesn't update internal state.
>
> This per each IPv6 header. This patch would allow up to five
> encapsulations of IPv6/IPv6 so maximum number of EHs we would consider
> is 6*5=30.

I would still increase the limit, but also no hard feelings. The reason
I came up with 12 is that anyway each header (besides DST_OPS) is only
allowed to show up once per header (11 + 1).

I am fine that this is a per-IPv6 header limit.

>>
>>> +
>>>  /**
>>>   * __skb_flow_dissect - extract the flow_keys struct and return it
>>>   * @skb: sk_buff to extract the flow from, can be NULL if the rest are specified
>>> @@ -426,6 +455,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
>>>       struct flow_dissector_key_tags *key_tags;
>>>       struct flow_dissector_key_vlan *key_vlan;
>>>       enum flow_dissect_ret fdret;
>>> +     int num_eh, num_encaps = 0;
>>>       bool skip_vlan = false;
>>>       u8 ip_proto = 0;
>>>       bool ret;
>>> @@ -714,7 +744,9 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
>>>       case FLOW_DISSECT_RET_OUT_GOOD:
>>>               goto out_good;
>>>       case FLOW_DISSECT_RET_PROTO_AGAIN:
>>> -             goto proto_again;
>>> +             if (skb_flow_dissect_encap_allowed(&num_encaps, &flags))
>>> +                     goto proto_again;
>>
>> I think you should get the check to the proto_again label. In case you
>> loop to often you can `goto out_good'.
>>
> That would add an extra conditional in the common case of no
> encapsulation. Having it here means we only care about the
> encapsulation limit when there is encapsulation.

In my opinion it would make the limits easier to grasp, but no hard
feelings about that.

Thanks,
Hannes