netdev - Re: [PATCH net 5/5] nfp: remove false positive offloads in flower vxlan

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 18 Apr 2018 23:31:01 +0100
From:   John Hurley <john.hurley@...ronome.com>
To:     Or Gerlitz <gerlitz.or@...il.com>
Cc:     Jakub Kicinski <jakub.kicinski@...ronome.com>,
        Linux Netdev List <netdev@...r.kernel.org>,
        oss-drivers@...ronome.com,
        Simon Horman <simon.horman@...ronome.com>
Subject: Re: [PATCH net 5/5] nfp: remove false positive offloads in flower vxlan

On Wed, Apr 18, 2018 at 7:18 PM, Or Gerlitz <gerlitz.or@...il.com> wrote:
> On Wed, Apr 18, 2018 at 3:31 PM, John Hurley <john.hurley@...ronome.com> wrote:
>> On Wed, Apr 18, 2018 at 8:43 AM, Or Gerlitz <gerlitz.or@...il.com> wrote:
>>> On Fri, Nov 17, 2017 at 4:06 AM, Jakub Kicinski
>>> <jakub.kicinski@...ronome.com> wrote:
>>>> From: John Hurley <john.hurley@...ronome.com>
>>>>
>>>> Pass information to the match offload on whether or not the repr is the
>>>> ingress or egress dev. Only accept tunnel matches if repr is the egress dev.
>>>>
>>>> This means rules such as the following are successfully offloaded:
>>>> tc .. add dev vxlan0 .. enc_dst_port 4789 .. action redirect dev nfp_p0
>>>>
>>>> While rules such as the following are rejected:
>>>> tc .. add dev nfp_p0 .. enc_dst_port 4789 .. action redirect dev vxlan0
>>>
>>> cool
>>>
>>>
>>>> Also reject non tunnel flows that are offloaded to an egress dev.
>>>> Non tunnel matches assume that the offload dev is the ingress port and
>>>> offload a match accordingly.
>>>
>>> not following on the "Also" here, see below
>>>
>>>
>>>> diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
>>>> index a0193e0c24a0..f5d73b83dcc2 100644
>>>> --- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
>>>> +++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
>>>> @@ -131,7 +131,8 @@ static bool nfp_flower_check_higher_than_mac(struct tc_cls_flower_offload *f)
>>>>
>>>>  static int
>>>>  nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
>>>> -                               struct tc_cls_flower_offload *flow)
>>>> +                               struct tc_cls_flower_offload *flow,
>>>> +                               bool egress)
>>>>  {
>>>>         struct flow_dissector_key_basic *mask_basic = NULL;
>>>>         struct flow_dissector_key_basic *key_basic = NULL;
>>>> @@ -167,6 +168,9 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
>>>>                         skb_flow_dissector_target(flow->dissector,
>>>>                                                   FLOW_DISSECTOR_KEY_ENC_CONTROL,
>>>>                                                   flow->key);
>>>> +               if (!egress)
>>>> +                       return -EOPNOTSUPP;
>>>> +
>>>>                 if (mask_enc_ctl->addr_type != 0xffff ||
>>>>                     enc_ctl->addr_type != FLOW_DISSECTOR_KEY_IPV4_ADDRS)
>>>>                         return -EOPNOTSUPP;
>>>> @@ -194,6 +198,9 @@ nfp_flower_calculate_key_layers(struct nfp_fl_key_ls *ret_key_ls,
>>>>
>>>>                 key_layer |= NFP_FLOWER_LAYER_VXLAN;
>>>>                 key_size += sizeof(struct nfp_flower_vxlan);
>>>> +       } else if (egress) {
>>>> +               /* Reject non tunnel matches offloaded to egress repr. */
>>>> +               return -EOPNOTSUPP;
>>>>         }
>>>
>>> with these two hunks we get: egress <- IFF -> encap match, right?
>>>
>>> (1) we can't offload the egress way if there isn't matching on encap headers
>>> (2) we can't go the matching on encap headers way if we are not egress
>>>
>>
>> yes, this is correct.
>> With the block code and egdev offload, we do not have access to the
>> ingress netdev when doing an offload.
>> We need to use the encap headers (especially the enc_port) to
>> distinguish the type of tunnel used and, therefore, require that the
>> encap matches be present before offloading.
>>
>>> what other cases are rejected by this logic?
>>>
>>
>> Yes, some other cases may be rejected (like veth mentioned below).
>
> my claim is that the veth case I mentioned below will not be rejected
> if it has the matching on encap headers, and a wrong rule will be set
> into hw, agree?
>

yes, unfortunately this is correct.
Without having access to the ingress netdev we have to put as many
restrictions as possible to ensure it is 'almost certainly' a given
ingress netdev but extreme cases can bypass this.

>> However, this is better than allowing rules to be incorrectly
>> offloaded (as could have happened before these changes).
>
>> Currently, we are looking at offloading flows on other ingress devices
>> such as bonds so this will require a change to the driver code here.
>
> for the ingress side, Jiri suggested that the slave devices (uplink reps),
> will be just getting all the rules set on the bond, so I am not sure what
> problem you see here... for decap it will be still vxlan --> vf rep and your
> egress logic will allow it.
>

Yes, Jiri suggested on another thread that the bonds simply relay
rules to their slaves.
This will work fine if uplink reprs are enslaved by a bond before
rules are added to it.
It would also assume that uplink reprs are not removed from/added to
the bond at later stages.
Doing this would require flushing the bond rules or writing all
existing rules to one of the slaves but not others.
Do you have any opinions on handling such situations?

>> IMO, the cleanest solution will also require tc core changes to either
>> avoid egdev offload or to have access to the ingress netdev of a rule.
>
>>> e.g If we add a rule with SW device (veth. tap) being the ingress, and
>>> HW device (vf rep)
>>> being the egress while not using skip_sw (just no flags == both) we
>>> get the TC stack
>>> go along the egdev callback from the vf rep hw device and add an
>>> (uplink --> vf rep) rule
>>> which will not be rejected if there is matching on tunnel headers, it
>>> will also not be rejected
>>> by some driver logic as the one we discussed to identify and ignore
>>> rules that are attempted to being added twice.
>>>
>>> Or.