[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20161206122634.GH1984@nanopsycho>
Date: Tue, 6 Dec 2016 13:26:34 +0100
From: Jiri Pirko <jiri@...nulli.us>
To: Simon Horman <simon.horman@...ronome.com>
Cc: Tom Herbert <tom@...bertland.com>,
David Miller <davem@...emloft.net>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
Jay Vosburgh <j.vosburgh@...il.com>,
Veaceslav Falico <vfalico@...il.com>,
Andy Gospodarek <andy@...yhouse.net>,
Jamal Hadi Salim <jhs@...atatu.com>,
Jiri Pirko <jiri@...lanox.com>
Subject: Re: [PATCH v2 net-next 1/2] flow dissector: ICMP support
Tue, Dec 06, 2016 at 11:51:46AM CET, simon.horman@...ronome.com wrote:
>On Mon, Dec 05, 2016 at 09:29:45AM -0800, Tom Herbert wrote:
>> On Mon, Dec 5, 2016 at 6:23 AM, Simon Horman <simon.horman@...ronome.com> wrote:
>> > On Sat, Dec 03, 2016 at 10:52:32AM -0800, Tom Herbert wrote:
>> >> On Sat, Dec 3, 2016 at 2:49 AM, Jiri Pirko <jiri@...nulli.us> wrote:
>> >> > Fri, Dec 02, 2016 at 09:31:41PM CET, simon.horman@...ronome.com wrote:
>> >> >>Allow dissection of ICMP(V6) type and code. This re-uses transport layer
>> >> >>port dissection code as although ICMP is not a transport protocol and their
>> >> >>type and code are not ports this allows sharing of both code and storage.
>> >> >>
>> >> >>Signed-off-by: Simon Horman <simon.horman@...ronome.com>
>> >
>> > ...
>> >
>> >> > Digging into this a bit more. I think it would be much nice not to mix
>> >> > up l4 ports and icmp stuff.
>> >> >
>> >> > How about to have FLOW_DISSECTOR_KEY_ICMP
>> >> > and
>> >> > struct flow_dissector_key_icmp {
>> >> > u8 type;
>> >> > u8 code;
>> >> > };
>> >> >
>> >> > The you can make this structure and struct flow_dissector_key_ports into
>> >> > an union in struct flow_keys.
>> >> >
>> >> > Looks much cleaner to me.
>> >
>> > Hi Jiri,
>> >
>> > I wondered about that too. The main reason that I didn't do this the first
>> > time around is that I wasn't sure what to do to differentiate between ports
>> > and icmp in fl_init_dissector() given that using a union would could to
>> > mask bits being set in both if they are set in either.
>> >
>> > I've taken a stab at addressing that below along with implementing your
>> > suggestion. If the treatment in fl_init_dissector() is acceptable
>> > perhaps something similar should be done for FLOW_DISSECTOR_KEY_IPV[46]_ADDRS
>> > too?
>> >
>> >> I agree, this patch adds to many conditionals into the fast path for
>> >> ICMP handling. Neither is there much point in using type and code as
>> >> input to the packet hash.
>> >
>> > Hi Tom,
>> >
>> > I'm not sure that I follow this. A packet may be ICMP or not regardless of
>> > if FLOW_DISSECTOR_KEY_PORT (and now FLOW_DISSECTOR_KEY_ICMP) port is set or
>> > not. I'd appreciate some guidance here.
>> >
>> > First-pass at implementing Jiri's suggestion follows:
>> >
>
>...
>
>> > diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>> > index 0584b4bb4390..33e5fa2b3e87 100644
>> > --- a/net/core/flow_dissector.c
>> > +++ b/net/core/flow_dissector.c
>
>...
>
>> > @@ -975,6 +983,10 @@ static const struct flow_dissector_key flow_keys_dissector_keys[] = {
>> > .offset = offsetof(struct flow_keys, ports),
>> > },
>> > {
>> > + .key_id = FLOW_DISSECTOR_KEY_ICMP,
>> > + .offset = offsetof(struct flow_keys, icmp),
>> > + },
>>
>> This is the problem I was referring to. This adds ICMP to the keys for
>> computing the skb hash and the ICMP type and code are in a union with
>> port numbers so they are in the range over which the hash is computed.
>> This means that we are treating type and code as some sort of flow and
>> and so different type/code between the same src/dst could follow
>> different paths in ECMP. For the purposes of computing a packet hash I
>> don't think we want ICMP information included. This might be a matter
>> of not putting FLOW_DISSECTOR_KEY_ICMP in flow_keys_dissector_keys,
>> where we need this information we could define another structure that
>> has all the same fields as in flow_keys_dissector_keys and include
>> FLOW_DISSECTOR_KEY_ICMP. Alternatively, the flow_dissector_key_icmp
>> could also be outside of the area that is used in the hash: that is no
>> in flow_keys_hash_start(keys) to flow_keys_hash_length(keys), keyval);
>
>...
>
>> > diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>> > index c96b2a349779..f4ba69bd362f 100644
>> > --- a/net/sched/cls_flower.c
>> > +++ b/net/sched/cls_flower.c
>> > @@ -38,7 +38,10 @@ struct fl_flow_key {
>> > struct flow_dissector_key_ipv4_addrs ipv4;
>> > struct flow_dissector_key_ipv6_addrs ipv6;
>> > };
>> > - struct flow_dissector_key_ports tp;
>> > + union {
>> > + struct flow_dissector_key_ports tp;
>> > + struct flow_dissector_key_icmp icmp;
>> > + };
>>
>> When an ICMP packet is encapsulated within UDP then icmp overwrites
>> valid port information with is a stronger signal of the flow (unless
>> FLOW_DISSECTOR_F_STOP_AT_ENCAP is set). This is another reason not to
>> use a union with ports.
>
>...
>
>Hi Tom,
>
>thanks for your explanation. I think I have a clearer picture now.
>
>I have reworked things to try to address your concerns.
>In particular:
>
>* FLOW_DISSECTOR_KEY_ICMP is no longer added to flow_keys_dissector_keys.
> I don't think it is needed at all for the use-case I currently have in
> mind, which is classifying packets using tc_flower. And adding it there
> was an error on my part.
I was just about to ask why you are adding it there. Good.
>
>* I stopped using a union for ports and icmp. At the very least this seems
> to make it easier to reason about things as we no longer need to consider
> that a port value may in fact be an ICMP type or code.
This should be within csl_flower code. You can easily have it as a union
in struct fl_flow_key.
>
> This seems to allow us avoid adding a number of is_icmp checks (as I believe
> you pointed out earlier). And should also address the problem you state
> immediately above.
I don't understand the check you are talking about. The union just allow
to share the mem. I don't see any checks needed.
>
>* I have also placed icmp outside of the range flow_keys_hash_start(keys)
> to flow_keys_hash_length(keys), keyval). This is because I now see no
> value of having it inside that range and it both avoids any chance of
> contaminating the hash with ICMP values and hashing over unwanted
> (hopefully zero) values.
Okay, now I'm confused again. You don't want to add this to
flow_keys_dissector_keys. Why would you?
>
> This required an update to flow_keys_hash_length() as the bound
> of the end of fields the hash is no longer the end of struct flow_keys.
> It seems clean but I wonder if there are similar assumptions lurking
> elsewhere.
>
>I have lightly tested this for the tc_flower case (only).
>
>Incremental patch on top of this series:
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index a6f75cfb2bf7..8029dd4912b6 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -3181,8 +3181,7 @@ static bool bond_flow_dissect(struct bonding *bond, struct sk_buff *skb,
> } else {
> return false;
> }
>- if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34 &&
>- proto >= 0 && !skb_flow_is_icmp_any(skb, proto))
>+ if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER34 && proto >= 0)
> fk->ports.ports = skb_flow_get_ports(skb, noff, proto);
>
> return true;
>@@ -3210,8 +3209,7 @@ u32 bond_xmit_hash(struct bonding *bond, struct sk_buff *skb)
> return bond_eth_hash(skb);
>
> if (bond->params.xmit_policy == BOND_XMIT_POLICY_LAYER23 ||
>- bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23 ||
>- flow_keys_are_icmp_any(&flow))
>+ bond->params.xmit_policy == BOND_XMIT_POLICY_ENCAP23)
> hash = bond_eth_hash(skb);
> else
> hash = (__force u32)flow.ports.ports;
>diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>index 44a8f69a9198..9c535fbccf2c 100644
>--- a/include/linux/skbuff.h
>+++ b/include/linux/skbuff.h
>@@ -1094,11 +1094,6 @@ u32 __skb_get_poff(const struct sk_buff *skb, void *data,
> __be32 __skb_flow_get_ports(const struct sk_buff *skb, int thoff, u8 ip_proto,
> void *data, int hlen_proto);
>
>-static inline bool skb_flow_is_icmp_any(const struct sk_buff *skb, u8 ip_proto)
>-{
>- return flow_protos_are_icmp_any(skb->protocol, ip_proto);
>-}
>-
> static inline __be32 skb_flow_get_ports(const struct sk_buff *skb,
> int thoff, u8 ip_proto)
> {
>diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
>index 5540dfa18872..058c9df8a4d8 100644
>--- a/include/net/flow_dissector.h
>+++ b/include/net/flow_dissector.h
>@@ -91,14 +91,10 @@ struct flow_dissector_key_addrs {
>
> /**
> * flow_dissector_key_ports:
>- * @ports: port numbers of Transport header or
>- * type and code of ICMP header
>+ * @ports: port numbers of Transport header
> * ports: source (high) and destination (low) port numbers
> * src: source port number
> * dst: destination port number
>- * icmp: ICMP type (high) and code (low)
>- * type: ICMP type
>- * type: ICMP code
> */
> struct flow_dissector_key_ports {
> union {
>@@ -107,6 +103,18 @@ struct flow_dissector_key_ports {
> __be16 src;
> __be16 dst;
> };
>+ };
>+};
>+
>+/**
>+ * flow_dissector_key_icmp:
>+ * @ports: type and code of ICMP header
>+ * icmp: ICMP type (high) and code (low)
>+ * type: ICMP type
>+ * code: ICMP code
>+ */
>+struct flow_dissector_key_icmp {
>+ union {
> __be16 icmp;
> struct {
> u8 type;
>@@ -115,7 +123,6 @@ struct flow_dissector_key_ports {
> };
> };
>
>-
> /**
> * struct flow_dissector_key_eth_addrs:
> * @src: source Ethernet address
>@@ -133,6 +140,7 @@ enum flow_dissector_key_id {
> FLOW_DISSECTOR_KEY_IPV4_ADDRS, /* struct flow_dissector_key_ipv4_addrs */
> FLOW_DISSECTOR_KEY_IPV6_ADDRS, /* struct flow_dissector_key_ipv6_addrs */
> FLOW_DISSECTOR_KEY_PORTS, /* struct flow_dissector_key_ports */
>+ FLOW_DISSECTOR_KEY_ICMP, /* struct flow_dissector_key_icmp */
> FLOW_DISSECTOR_KEY_ETH_ADDRS, /* struct flow_dissector_key_eth_addrs */
> FLOW_DISSECTOR_KEY_TIPC_ADDRS, /* struct flow_dissector_key_tipc_addrs */
> FLOW_DISSECTOR_KEY_VLAN, /* struct flow_dissector_key_flow_vlan */
>@@ -173,11 +181,16 @@ struct flow_keys {
> struct flow_dissector_key_keyid keyid;
> struct flow_dissector_key_ports ports;
> struct flow_dissector_key_addrs addrs;
>+#define FLOW_KEYS_HASH_END_FIELD addrs
>+ struct flow_dissector_key_icmp icmp;
> };
>
> #define FLOW_KEYS_HASH_OFFSET \
> offsetof(struct flow_keys, FLOW_KEYS_HASH_START_FIELD)
>
>+#define FLOW_KEYS_HASH_END \
>+ offsetofend(struct flow_keys, FLOW_KEYS_HASH_END_FIELD)
>+
> __be32 flow_get_u32_src(const struct flow_keys *flow);
> __be32 flow_get_u32_dst(const struct flow_keys *flow);
>
>@@ -233,8 +246,7 @@ static inline bool flow_keys_are_icmp_any(const struct flow_keys *keys)
>
> static inline bool flow_keys_have_l4(const struct flow_keys *keys)
> {
>- return (!flow_keys_are_icmp_any(keys) && keys->ports.ports) ||
>- keys->tags.flow_label;
>+ return (keys->ports.ports || keys->tags.flow_label);
> }
>
> u32 flow_hash_from_keys(struct flow_keys *keys);
>diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>index 0584b4bb4390..ed6d46475343 100644
>--- a/net/core/flow_dissector.c
>+++ b/net/core/flow_dissector.c
>@@ -139,6 +139,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> struct flow_dissector_key_basic *key_basic;
> struct flow_dissector_key_addrs *key_addrs;
> struct flow_dissector_key_ports *key_ports;
>+ struct flow_dissector_key_icmp *key_icmp;
> struct flow_dissector_key_tags *key_tags;
> struct flow_dissector_key_vlan *key_vlan;
> struct flow_dissector_key_keyid *key_keyid;
>@@ -559,18 +560,25 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> break;
> }
>
>- if (dissector_uses_key(flow_dissector,
>- FLOW_DISSECTOR_KEY_PORTS)) {
>- key_ports = skb_flow_dissector_target(flow_dissector,
>- FLOW_DISSECTOR_KEY_PORTS,
>- target_container);
>- if (flow_protos_are_icmp_any(proto, ip_proto))
>- key_ports->icmp = skb_flow_get_be16(skb, nhoff, data,
>- hlen);
>- else
>+ if (flow_protos_are_icmp_any(proto, ip_proto)) {
>+ if (dissector_uses_key(flow_dissector,
>+ FLOW_DISSECTOR_KEY_ICMP)) {
>+ key_icmp = skb_flow_dissector_target(flow_dissector,
>+ FLOW_DISSECTOR_KEY_ICMP,
>+ target_container);
>+ key_icmp->icmp = skb_flow_get_be16(skb, nhoff, data,
>+ hlen);
>+ }
>+ } else {
>+ if (dissector_uses_key(flow_dissector,
>+ FLOW_DISSECTOR_KEY_PORTS)) {
>+ key_ports = skb_flow_dissector_target(flow_dissector,
>+ FLOW_DISSECTOR_KEY_PORTS,
>+ target_container);
> key_ports->ports = __skb_flow_get_ports(skb, nhoff,
> ip_proto, data,
> hlen);
>+ }
> }
>
> out_good:
>@@ -613,9 +621,10 @@ static inline const u32 *flow_keys_hash_start(const struct flow_keys *flow)
> static inline size_t flow_keys_hash_length(const struct flow_keys *flow)
> {
> size_t diff = FLOW_KEYS_HASH_OFFSET + sizeof(flow->addrs);
>- BUILD_BUG_ON((sizeof(*flow) - FLOW_KEYS_HASH_OFFSET) % sizeof(u32));
>+ BUILD_BUG_ON((FLOW_KEYS_HASH_END - FLOW_KEYS_HASH_OFFSET) %
>+ sizeof(u32));
> BUILD_BUG_ON(offsetof(typeof(*flow), addrs) !=
>- sizeof(*flow) - sizeof(flow->addrs));
>+ FLOW_KEYS_HASH_END - sizeof(flow->addrs));
>
> switch (flow->control.addr_type) {
> case FLOW_DISSECTOR_KEY_IPV4_ADDRS:
>@@ -628,7 +637,7 @@ static inline size_t flow_keys_hash_length(const struct flow_keys *flow)
> diff -= sizeof(flow->addrs.tipcaddrs);
> break;
> }
>- return (sizeof(*flow) - diff) / sizeof(u32);
>+ return (FLOW_KEYS_HASH_END - diff) / sizeof(u32);
> }
>
> __be32 flow_get_u32_src(const struct flow_keys *flow)
>@@ -745,8 +754,7 @@ void make_flow_keys_digest(struct flow_keys_digest *digest,
>
> data->n_proto = flow->basic.n_proto;
> data->ip_proto = flow->basic.ip_proto;
>- if (flow_keys_have_l4(flow))
>- data->ports = flow->ports.ports;
>+ data->ports = flow->ports.ports;
> data->src = flow->addrs.v4addrs.src;
> data->dst = flow->addrs.v4addrs.dst;
> }
>diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c
>index 408313e33bbe..6575aba87630 100644
>--- a/net/sched/cls_flow.c
>+++ b/net/sched/cls_flow.c
>@@ -96,7 +96,7 @@ static u32 flow_get_proto(const struct sk_buff *skb,
> static u32 flow_get_proto_src(const struct sk_buff *skb,
> const struct flow_keys *flow)
> {
>- if (!flow_keys_are_icmp_any(flow) && flow->ports.ports)
>+ if (flow->ports.ports)
> return ntohs(flow->ports.src);
>
> return addr_fold(skb->sk);
>@@ -105,7 +105,7 @@ static u32 flow_get_proto_src(const struct sk_buff *skb,
> static u32 flow_get_proto_dst(const struct sk_buff *skb,
> const struct flow_keys *flow)
> {
>- if (!flow_keys_are_icmp_any(flow) && flow->ports.ports)
>+ if (flow->ports.ports)
> return ntohs(flow->ports.dst);
>
> return addr_fold(skb_dst(skb)) ^ (__force u16) tc_skb_protocol(skb);
>diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>index c96b2a349779..56df0368125a 100644
>--- a/net/sched/cls_flower.c
>+++ b/net/sched/cls_flower.c
>@@ -39,6 +39,7 @@ struct fl_flow_key {
> struct flow_dissector_key_ipv6_addrs ipv6;
> };
> struct flow_dissector_key_ports tp;
>+ struct flow_dissector_key_icmp icmp;
> struct flow_dissector_key_keyid enc_key_id;
> union {
> struct flow_dissector_key_ipv4_addrs enc_ipv4;
>@@ -511,19 +512,23 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
> &mask->tp.dst, TCA_FLOWER_KEY_SCTP_DST_MASK,
> sizeof(key->tp.dst));
> } else if (flow_basic_key_is_icmpv4(&key->basic)) {
>- fl_set_key_val(tb, &key->tp.type, TCA_FLOWER_KEY_ICMPV4_TYPE,
>- &mask->tp.type, TCA_FLOWER_KEY_ICMPV4_TYPE_MASK,
>- sizeof(key->tp.type));
>- fl_set_key_val(tb, &key->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE,
>- &mask->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
>- sizeof(key->tp.code));
>+ fl_set_key_val(tb, &key->icmp.type, TCA_FLOWER_KEY_ICMPV4_TYPE,
>+ &mask->icmp.type,
>+ TCA_FLOWER_KEY_ICMPV4_TYPE_MASK,
>+ sizeof(key->icmp.type));
>+ fl_set_key_val(tb, &key->icmp.code, TCA_FLOWER_KEY_ICMPV4_CODE,
>+ &mask->icmp.code,
>+ TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
>+ sizeof(key->icmp.code));
> } else if (flow_basic_key_is_icmpv6(&key->basic)) {
>- fl_set_key_val(tb, &key->tp.type, TCA_FLOWER_KEY_ICMPV6_TYPE,
>- &mask->tp.type, TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,
>- sizeof(key->tp.type));
>- fl_set_key_val(tb, &key->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE,
>- &mask->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
>- sizeof(key->tp.code));
>+ fl_set_key_val(tb, &key->icmp.type, TCA_FLOWER_KEY_ICMPV6_TYPE,
>+ &mask->icmp.type,
>+ TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,
>+ sizeof(key->icmp.type));
>+ fl_set_key_val(tb, &key->icmp.code, TCA_FLOWER_KEY_ICMPV4_CODE,
>+ &mask->icmp.code,
>+ TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
>+ sizeof(key->icmp.code));
> }
>
> if (tb[TCA_FLOWER_KEY_ENC_IPV4_SRC] ||
>@@ -634,6 +639,8 @@ static void fl_init_dissector(struct cls_fl_head *head,
> FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
> FLOW_DISSECTOR_KEY_PORTS, tp);
> FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
>+ FLOW_DISSECTOR_KEY_ICMP, icmp);
>+ FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
> FLOW_DISSECTOR_KEY_VLAN, vlan);
> FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
> FLOW_DISSECTOR_KEY_ENC_KEYID, enc_key_id);
>@@ -1000,24 +1007,24 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
> sizeof(key->tp.dst))))
> goto nla_put_failure;
> else if (flow_basic_key_is_icmpv4(&key->basic) &&
>- (fl_dump_key_val(skb, &key->tp.type,
>- TCA_FLOWER_KEY_ICMPV4_TYPE, &mask->tp.type,
>+ (fl_dump_key_val(skb, &key->icmp.type,
>+ TCA_FLOWER_KEY_ICMPV4_TYPE, &mask->icmp.type,
> TCA_FLOWER_KEY_ICMPV4_TYPE_MASK,
>- sizeof(key->tp.type)) ||
>- fl_dump_key_val(skb, &key->tp.code,
>- TCA_FLOWER_KEY_ICMPV4_CODE, &mask->tp.code,
>+ sizeof(key->icmp.type)) ||
>+ fl_dump_key_val(skb, &key->icmp.code,
>+ TCA_FLOWER_KEY_ICMPV4_CODE, &mask->icmp.code,
> TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
>- sizeof(key->tp.code))))
>+ sizeof(key->icmp.code))))
> goto nla_put_failure;
> else if (flow_basic_key_is_icmpv6(&key->basic) &&
>- (fl_dump_key_val(skb, &key->tp.type,
>- TCA_FLOWER_KEY_ICMPV6_TYPE, &mask->tp.type,
>+ (fl_dump_key_val(skb, &key->icmp.type,
>+ TCA_FLOWER_KEY_ICMPV6_TYPE, &mask->icmp.type,
> TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,
>- sizeof(key->tp.type)) ||
>- fl_dump_key_val(skb, &key->tp.code,
>- TCA_FLOWER_KEY_ICMPV6_CODE, &mask->tp.code,
>+ sizeof(key->icmp.type)) ||
>+ fl_dump_key_val(skb, &key->icmp.code,
>+ TCA_FLOWER_KEY_ICMPV6_CODE, &mask->icmp.code,
> TCA_FLOWER_KEY_ICMPV6_CODE_MASK,
>- sizeof(key->tp.code))))
>+ sizeof(key->icmp.code))))
> goto nla_put_failure;
>
> if (key->enc_control.addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS &&
Powered by blists - more mailing lists