[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <DBED96B8-64E3-480B-93B2-1175682FF3B4@ovn.org>
Date: Tue, 7 Feb 2017 21:31:16 -0800
From: Jarno Rajahalme <jarno@....org>
To: Joe Stringer <joe@....org>
Cc: netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next 5/7] openvswitch: Add original direction conntrack tuple to sw_flow_key.
> On Feb 6, 2017, at 11:15 PM, Joe Stringer <joe@....org> wrote:
>
> On 2 February 2017 at 17:10, Jarno Rajahalme <jarno@....org> wrote:
>> Add the fields of the conntrack original direction 5-tuple to struct
>> sw_flow_key. The new fields are initially zeroed, and are populated
>> whenever a conntrack action is executed and either finds or generates
>> a conntrack entry. This means that these fields exist for all packets
>> were not rejected by conntrack as untrackable.
>>
>> The original tuple fields in the sw_flow_key are filled from the
>> original direction tuple of the conntrack entry relating to the
>> current packet, or from the original direction tuple of the master
>> conntrack entry, if the current conntrack entry has a master.
>> Generally, expected connections of connections having an assigned
>> helper (e.g., FTP), have a master conntrack entry.
>>
>> The main purpose of the new conntrack original tuple fields is to
>> allow matching on them for policy decision purposes, with the premise
>> that the admissibility of tracked connections reply packets (as well
>> as original direction packets), and both direction packets of any
>> related connections may be based on ACL rules applying to the master
>> connection's original direction 5-tuple. This also makes it easier to
>> make policy decisions when the actual packet headers might have been
>> transformed by NAT, as the original direction 5-tuple represents the
>> packet headers before any such transformation.
>>
>> When using the original direction 5-tuple the admissibility of return
>> and/or related packets need not be based on the mere existence of a
>> conntrack entry, allowing separation of admission policy from the
>> established conntrack state. While existence of a conntrack entry is
>> required for admission of the return or related packets, policy
>> changes can render connections that were initially admitted to be
>> rejected or dropped afterwards. If the admission of the return and
>> related packets was based on mere conntrack state (e.g., connection
>> being in an established state), a policy change that would make the
>> connection rejected or dropped would need to find and delete all
>> conntrack entries affected by such a change. When using the original
>> direction 5-tuple matching the affected conntrack entries can be
>> allowed to time out instead, as the established state of the
>> connection would not need to be the basis for packet admission any
>> more.
>>
>> It should be noted that the directionality of related connections may
>> be the same or different than that of the master connection, and
>> neither the original direction 5-tuple nor the conntrack state bits
>> carry this information. If needed, the directionality of the master
>> connection can be stored in master's conntrack mark or labels, which
>> are automatically inherited by the expected related connections.
>>
>> The fact that neither ARP not ND packets are trackable by conntrack
>> allows mutual exclusion between ARP/ND and the new conntrack original
>> tuple fields. Hence, the IP addresses are overlaid in union with ARP
>> and ND fields. This allows the sw_flow_key to not grow much due to
>> this patch, but it also means that we must be careful to never use the
>> new key fields with ARP or ND packets. ARP is easy to distinguish and
>> keep mutually exclusive based on the ethernet type, but ND being an
>> ICMPv6 protocol requires a bit more attention.
>>
>> Signed-off-by: Jarno Rajahalme <jarno@....org>
>> ---
>
> OK, maybe we need to do something a bit more to handle the NATed
> related connections to address the problem in patch 1.
>
> <snip>
>
>> diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
>> index 738a4fa..1afe153 100644
>> --- a/net/openvswitch/conntrack.c
>> +++ b/net/openvswitch/conntrack.c
>> @@ -155,6 +155,59 @@ static void __ovs_ct_update_key(struct sw_flow_key *key, u8 state,
>> key->ct.zone = zone->id;
>> key->ct.mark = ovs_ct_get_mark(ct);
>> ovs_ct_get_labels(ct, &key->ct.labels);
>> +
>> + /* Use the master if we have one. */
>> + if (ct && ct->master)
>> + ct = ct->master;
>
> Perhaps:
>
> if (!ct || sw_flow_key_is_nd(key) || !is_ip_any(key->eth.type)) {
> /* zero everything */
> return;
> }
>
> One of the things this helps us to avoid is having a comment in the
> middle of an if statement.
>
> Then afterwards,
> if (ct->master)
> ct = ct->master;
>
>> +
>> + key->ct.orig_proto = 0;
>> + key->ct.orig_tp.src = 0;
>> + key->ct.orig_tp.dst = 0;
>> + if (key->eth.type == htons(ETH_P_IP)) {
>> + /* IP version must match. */
>> + if (ct && nf_ct_l3num(ct) == NFPROTO_IPV4) {
>
> I don't quite understand how we could end up with a connection NFPROTO
> that is mismatched with an IP version that we should handle here, but
> if there are some legitimite cases perhaps we can pick them up and
> handle them in the early exit condition above?
>
One would be if an IPv4 FTP control connection opened an IPv6 data connection. Our flow key layout assumes the IP versions between master and related connection match. I added a comment to that effect.
> We can probably share a few more lines between IPv4 and IPv6 here.
>
I refactored all the repeated code away for v2.
>> @@ -208,24 +261,54 @@ void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key)
>> ovs_ct_update_key(skb, NULL, key, false, false);
>> }
>>
>> -int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb)
>> +#define IN6_ADDR_INITIALIZER(ADDR) \
>> + { (ADDR).s6_addr32[0], (ADDR).s6_addr32[1], \
>> + (ADDR).s6_addr32[2], (ADDR).s6_addr32[3] }
>
> scripts/checkpatch.pl asks:
>
> CHECK: Macro argument reuse 'ADDR' - possible side-effects?
> #704: FILE: net/openvswitch/conntrack.c:264:
> +#define IN6_ADDR_INITIALIZER(ADDR) \
> + { (ADDR).s6_addr32[0], (ADDR).s6_addr32[1], \
> + (ADDR).s6_addr32[2], (ADDR).s6_addr32[3] }
>
> I'm guessing it just doesn't like your use of the name 'ADDR’.
Rather it is recommending caution about ‘ADDR’ being evaluated multiple times, which is no problem here.
>
>> + if (swkey->eth.type == htons(ETH_P_IP) && swkey->ct.orig_proto) {
>> + struct ovs_key_ct_tuple_ipv4 orig = {
>> + output->ipv4.ct_orig.src,
>> + output->ipv4.ct_orig.dst,
>> + output->ct.orig_tp.src,
>> + output->ct.orig_tp.dst,
>> + output->ct.orig_proto,
>> + };
>> + if (nla_put(skb, OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4, sizeof(orig),
>> + &orig))
>> + return -EMSGSIZE;
>> + } else if (swkey->eth.type == htons(ETH_P_IPV6) &&
>> + swkey->ct.orig_proto) {
>> + struct ovs_key_ct_tuple_ipv6 orig = {
>> + IN6_ADDR_INITIALIZER(output->ipv6.ct_orig.src),
>> + IN6_ADDR_INITIALIZER(output->ipv6.ct_orig.dst),
>> + output->ct.orig_tp.src,
>> + output->ct.orig_tp.dst,
>> + output->ct.orig_proto,
>> + };
>> + if (nla_put(skb, OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6, sizeof(orig),
>> + &orig))
>> + return -EMSGSIZE;
>> + }
>
> swkey->ct.orig_proto check is common across both conditions, maybe
> check that first then nest the other logic within?
Done.
>
>> +
>> return 0;
>> }
>>
>> diff --git a/net/openvswitch/conntrack.h b/net/openvswitch/conntrack.h
>> index 8f6230b..8573ab3 100644
>> --- a/net/openvswitch/conntrack.h
>> +++ b/net/openvswitch/conntrack.h
>> @@ -32,7 +32,8 @@ int ovs_ct_execute(struct net *, struct sk_buff *, struct sw_flow_key *,
>> const struct ovs_conntrack_info *);
>>
>> void ovs_ct_fill_key(const struct sk_buff *skb, struct sw_flow_key *key);
>> -int ovs_ct_put_key(const struct sw_flow_key *key, struct sk_buff *skb);
>> +int ovs_ct_put_key(const struct sw_flow_key *swkey,
>> + const struct sw_flow_key *output, struct sk_buff *skb);
>> void ovs_ct_free_action(const struct nlattr *a);
>>
>> #define CT_SUPPORTED_MASK (OVS_CS_F_NEW | OVS_CS_F_ESTABLISHED | \
>> @@ -79,9 +80,17 @@ static inline void ovs_ct_fill_key(const struct sk_buff *skb,
>> key->ct.zone = 0;
>> key->ct.mark = 0;
>> memset(&key->ct.labels, 0, sizeof(key->ct.labels));
>> + key->ct.orig_proto = 0;
>> + key->ct.orig_tp.src = 0;
>> + key->ct.orig_tp.dst = 0;
>> + if (key->eth.type == htons(ETH_P_IP))
>> + memset(&key->ipv4.ct_orig, 0, sizeof(key->ipv4.ct_orig));
>> + else if (key->eth.type == htons(ETH_P_IPV6) && !sw_flow_key_is_nd(key))
>> + memset(&key->ipv6.ct_orig, 0, sizeof(key->ipv6.ct_orig));
>
> If we only ever use these fields based on key->ct.orig_proto being
> nonzero, then we m
>
All these fields need to be initialized for megaflow matching purposes. We use orig_proto non-zero check to avoid encoding all-zero netlink attribute.
> <snip>
>
>> +static inline bool sw_flow_key_is_nd(const struct sw_flow_key *key)
>> +{
>> + return key->eth.type == htons(ETH_P_IPV6) &&
>> + key->ip.proto == NEXTHDR_ICMP &&
>> + key->tp.dst == 0 &&
>> + (key->tp.src == ntohs(NDISC_NEIGHBOUR_SOLICITATION) ||
>> + key->tp.src == ntohs(NDISC_NEIGHBOUR_ADVERTISEMENT));
>
> Sparse complains:
>
> net/openvswitch/flow.h:163:33: warning: cast to restricted __be16
> net/openvswitch/flow.h:163:25: warning: restricted __be16 degrades to integer
> net/openvswitch/flow.h:164:33: warning: cast to restricted __be16
> net/openvswitch/flow.h:164:25: warning: restricted __be16 degrades to integer
>
Oops, should have used htons() instead.
>> @@ -161,8 +163,11 @@ static bool match_validate(const struct sw_flow_match *match,
>>
>> if (match->key->eth.type == htons(ETH_P_IP)) {
>> key_expected |= 1 << OVS_KEY_ATTR_IPV4;
>> - if (match->mask && (match->mask->key.eth.type == htons(0xffff)))
>> + if (match->mask &&
>> + (match->mask->key.eth.type == htons(0xffff))) {
>
> Minor nit, can probably drop the parentheses and keep this to one line?
>
OK.
>> @@ -1430,6 +1476,10 @@ int ovs_nla_get_match(struct net *net, struct sw_flow_match *match,
>> goto free_newmask;
>> }
>>
>> + /* This also checks that conntrack original direction metadata exists
>> + * only for packets for which it makes sense. Otherwise the key may be
>> + * corrupted due to overlapping key fields.
>> + */
>> if (!match_validate(match, key_attrs, mask_attrs, log))
>> err = -EINVAL;
>>
>
> I believe that this is already explained in the relevant portion of
> match_validate(), not sure it's worth highlighting that particular
> piece of validation above a call to such a general function.
Removed.
Powered by blists - more mailing lists