netdev - Re: [net-next PATCH v1 02/11] net: flow

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 5 Jan 2015 22:19:57 -0800
From:	Scott Feldman <sfeldma@...il.com>
To:	John Fastabend <john.fastabend@...il.com>
Cc:	Thomas Graf <tgraf@...g.ch>,
	Jiří Pírko <jiri@...nulli.us>,
	Jamal Hadi Salim <jhs@...atatu.com>,
	"simon.horman@...ronome.com" <simon.horman@...ronome.com>,
	Netdev <netdev@...r.kernel.org>,
	"David S. Miller" <davem@...emloft.net>,
	Andy Gospodarek <andy@...yhouse.net>
Subject: Re: [net-next PATCH v1 02/11] net: flow_table: add flow, delete flow

On Wed, Dec 31, 2014 at 11:46 AM, John Fastabend
<john.fastabend@...il.com> wrote:
> Now that the device capabilities are exposed we can add support to
> add and delete flows from the tables.
>
> The two operations are
>
> table_set_flows :
>
>   The set flow operations is used to program a set of flows into a
>   hardware device table. The message is consumed via netlink encoded
>   message which is then decoded into a null terminated  array of
>   flow entry structures. A flow entry structure is defined as
>
>      struct net_flow_flow {
>                           int table_id;
>                           int uid;
>                           int priority;
>                           struct net_flow_field_ref *matches;
>                           struct net_flow_action *actions;
>      }
>
>   The table id is the _uid_ returned from 'get_tables' operatoins.
>   Matches is a set of match criteria for packets with a logical AND
>   operation done on the set so packets match the entire criteria.
>   Actions provide a set of actions to perform when the flow rule is
>   hit. Both matches and actions are null terminated arrays.
>
>   The flows are configured in hardware using an ndo op. We do not
>   provide a commit operation at the moment and expect hardware
>   commits the flows one at a time. Future work may require a commit
>   operation to tell the hardware we are done loading flow rules. On
>   some hardware this will help bulk updates.
>
>   Its possible for hardware to return an error from a flow set
>   operation. This can occur for many reasons both transient and
>   resource constraints. We have different error handling strategies
>   built in and listed here,
>
>     *_ERROR_ABORT      abort on first error with errmsg
>
>     *_ERROR_CONTINUE   continue programming flows no errmsg
>
>     *_ERROR_ABORT_LOG  abort on first error and return flow that
>                        failed to user space in reply msg
>
>     *_ERROR_CONT_LOG   continue programming flows and return a list
>                        of flows that failed to user space in a reply
>                        msg.
>
>   notably missing is a rollback error strategy. I don't have a
>   use for this in software yet but the strategy can be added with
>   *_ERROR_ROLLBACK for example.
>
> table_del_flows
>
>   The delete flow operation uses the same structures and error
>   handling strategies as the table_set_flows operations. Although on
>   delete messges ommit the matches/actions arrays because they are
>   not needed to lookup the flow.
>
> Also thanks to Simon Horman for fixes and other help.
>
> Signed-off-by: John Fastabend <john.r.fastabend@...el.com>
> ---
>  include/linux/if_flow.h      |   21 ++
>  include/linux/netdevice.h    |    8 +
>  include/uapi/linux/if_flow.h |   49 ++++
>  net/core/flow_table.c        |  501 ++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 579 insertions(+)
>
> diff --git a/include/linux/if_flow.h b/include/linux/if_flow.h
> index 1b6c1ea..20fa752 100644
> --- a/include/linux/if_flow.h
> +++ b/include/linux/if_flow.h
> @@ -90,4 +90,25 @@ struct net_flow_tbl_node {
>         __u32 flags;
>         struct net_flow_jump_table *jump;
>  };
> +
> +/**
> + * @struct net_flow_flow
> + * @brief describes the match/action entry
> + *
> + * @uid unique identifier for flow
> + * @priority priority to execute flow match/action in table

What is the convention on priority?  0 is lowest priority or highest?

> + * @match null terminated set of match uids match criteria
> + * @actoin null terminated set of action uids to apply to match
> + *
> + * Flows must match all entries in match set.
> + */
> +struct net_flow_flow {
> +       int table_id;
> +       int uid;
> +       int priority;
> +       struct net_flow_field_ref *matches;
> +       struct net_flow_action *actions;
> +};
> +
> +int net_flow_put_flow(struct sk_buff *skb, struct net_flow_flow *flow);
>  #endif
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 3c3c856..be8d4e4 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1197,6 +1197,14 @@ struct net_device_ops {
>         struct net_flow_header  **(*ndo_flow_get_headers)(struct net_device *dev);
>         struct net_flow_hdr_node **(*ndo_flow_get_hdr_graph)(struct net_device *dev);
>         struct net_flow_tbl_node **(*ndo_flow_get_tbl_graph)(struct net_device *dev);
> +       int                     (*ndo_flow_get_flows)(struct sk_buff *skb,
> +                                                     struct net_device *dev,
> +                                                     int table,
> +                                                     int min, int max);
> +       int                     (*ndo_flow_set_flows)(struct net_device *dev,
> +                                                     struct net_flow_flow *f);
> +       int                     (*ndo_flow_del_flows)(struct net_device *dev,
> +                                                     struct net_flow_flow *f);

Need doc for these in BIG comment block above this struct.  Same for
ndo_flow_xxx added in previous patch.

>  #endif
>  };
>
> diff --git a/include/uapi/linux/if_flow.h b/include/uapi/linux/if_flow.h
> index 2acdb38..125cdc6 100644
> --- a/include/uapi/linux/if_flow.h
> +++ b/include/uapi/linux/if_flow.h
> @@ -329,6 +329,48 @@ enum {
>  #define NET_FLOW_TABLE_GRAPH_MAX (__NET_FLOW_TABLE_GRAPH_MAX - 1)
>
>  enum {
> +       NET_FLOW_NET_FLOW_UNSPEC,
> +       NET_FLOW_FLOW,
> +       __NET_FLOW_NET_FLOW_MAX,
> +};
> +#define NET_FLOW_NET_FLOW_MAX (__NET_FLOW_NET_FLOW_MAX - 1)
> +
> +enum {
> +       NET_FLOW_TABLE_FLOWS_UNSPEC,
> +       NET_FLOW_TABLE_FLOWS_TABLE,
> +       NET_FLOW_TABLE_FLOWS_MINPRIO,
> +       NET_FLOW_TABLE_FLOWS_MAXPRIO,
> +       NET_FLOW_TABLE_FLOWS_FLOWS,
> +       __NET_FLOW_TABLE_FLOWS_MAX,
> +};
> +#define NET_FLOW_TABLE_FLOWS_MAX (__NET_FLOW_TABLE_FLOWS_MAX - 1)
> +
> +enum {

NET_FLOW_FLOWS_ERROR_UNSPEC?

> +       /* Abort with normal errmsg */
> +       NET_FLOW_FLOWS_ERROR_ABORT,
> +       /* Ignore errors and continue without logging */
> +       NET_FLOW_FLOWS_ERROR_CONTINUE,
> +       /* Abort and reply with invalid flow fields */
> +       NET_FLOW_FLOWS_ERROR_ABORT_LOG,
> +       /* Continue and reply with list of invalid flows */
> +       NET_FLOW_FLOWS_ERROR_CONT_LOG,
> +       __NET_FLOWS_FLOWS_ERROR_MAX,
> +};
> +#define NET_FLOWS_FLOWS_ERROR_MAX (__NET_FLOWS_FLOWS_ERROR_MAX - 1)
> +
> +enum {
> +       NET_FLOW_ATTR_UNSPEC,
> +       NET_FLOW_ATTR_ERROR,
> +       NET_FLOW_ATTR_TABLE,
> +       NET_FLOW_ATTR_UID,
> +       NET_FLOW_ATTR_PRIORITY,
> +       NET_FLOW_ATTR_MATCHES,
> +       NET_FLOW_ATTR_ACTIONS,
> +       __NET_FLOW_ATTR_MAX,
> +};
> +#define NET_FLOW_ATTR_MAX (__NET_FLOW_ATTR_MAX - 1)
> +
> +enum {
>         NET_FLOW_IDENTIFIER_IFINDEX, /* net_device ifindex */
>  };
>
> @@ -343,6 +385,9 @@ enum {
>         NET_FLOW_HEADER_GRAPH,
>         NET_FLOW_TABLE_GRAPH,
>
> +       NET_FLOW_FLOWS,
> +       NET_FLOW_FLOWS_ERROR,
> +
>         __NET_FLOW_MAX,
>         NET_FLOW_MAX = (__NET_FLOW_MAX - 1),
>  };
> @@ -354,6 +399,10 @@ enum {
>         NET_FLOW_TABLE_CMD_GET_HDR_GRAPH,
>         NET_FLOW_TABLE_CMD_GET_TABLE_GRAPH,
>
> +       NET_FLOW_TABLE_CMD_GET_FLOWS,
> +       NET_FLOW_TABLE_CMD_SET_FLOWS,
> +       NET_FLOW_TABLE_CMD_DEL_FLOWS,
> +
>         __NET_FLOW_CMD_MAX,
>         NET_FLOW_CMD_MAX = (__NET_FLOW_CMD_MAX - 1),
>  };
> diff --git a/net/core/flow_table.c b/net/core/flow_table.c
> index ec3f06d..f4cf293 100644
> --- a/net/core/flow_table.c
> +++ b/net/core/flow_table.c
> @@ -774,6 +774,489 @@ static int net_flow_cmd_get_table_graph(struct sk_buff *skb,
>         return genlmsg_reply(msg, info);
>  }
>
> +static struct sk_buff *net_flow_build_flows_msg(struct net_device *dev,
> +                                               u32 portid, int seq, u8 cmd,
> +                                               int min, int max, int table)
> +{
> +       struct genlmsghdr *hdr;
> +       struct nlattr *flows;
> +       struct sk_buff *skb;
> +       int err = -ENOBUFS;
> +
> +       skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);

Does netlink msg size limit the number of flows we can return?  If I
have 10K unique prefix routes, resulting in 10K flows in my L3 table,
all at same priority, can it be dumped?

I'm wondering (out loud) if get_flows is something that should be
supported at the driver level, or should it be managed above the
driver.  In other words, whoever calls set_flows and del_flows knows
what's what and could return get_flows rather than calling down to
driver/device to get_flows.  Unless driver/device could create flows
independent of set_flows and del_flows.

> +       if (!skb)
> +               return ERR_PTR(-ENOBUFS);
> +
> +       hdr = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
> +       if (!hdr)
> +               goto out;
> +
> +       if (nla_put_u32(skb,
> +                       NET_FLOW_IDENTIFIER_TYPE,
> +                       NET_FLOW_IDENTIFIER_IFINDEX) ||
> +           nla_put_u32(skb, NET_FLOW_IDENTIFIER, dev->ifindex)) {
> +               err = -ENOBUFS;
> +               goto out;
> +       }
> +
> +       flows = nla_nest_start(skb, NET_FLOW_FLOWS);
> +       if (!flows) {
> +               err = -EMSGSIZE;
> +               goto out;
> +       }
> +
> +       err = dev->netdev_ops->ndo_flow_get_flows(skb, dev, table, min, max);
> +       if (err < 0)
> +               goto out_cancel;
> +
> +       nla_nest_end(skb, flows);
> +
> +       err = genlmsg_end(skb, hdr);
> +       if (err < 0)
> +               goto out;
> +
> +       return skb;
> +out_cancel:
> +       nla_nest_cancel(skb, flows);
> +out:
> +       nlmsg_free(skb);
> +       return ERR_PTR(err);
> +}
> +
> +static const
> +struct nla_policy net_flow_table_flows_policy[NET_FLOW_TABLE_FLOWS_MAX + 1] = {
> +       [NET_FLOW_TABLE_FLOWS_TABLE]   = { .type = NLA_U32,},
> +       [NET_FLOW_TABLE_FLOWS_MINPRIO] = { .type = NLA_U32,},
> +       [NET_FLOW_TABLE_FLOWS_MAXPRIO] = { .type = NLA_U32,},
> +       [NET_FLOW_TABLE_FLOWS_FLOWS]   = { .type = NLA_NESTED,},
> +};
> +
> +static int net_flow_table_cmd_get_flows(struct sk_buff *skb,
> +                                       struct genl_info *info)
> +{
> +       struct nlattr *tb[NET_FLOW_TABLE_FLOWS_MAX+1];
> +       int table, min = -1, max = -1;
> +       struct net_device *dev;
> +       struct sk_buff *msg;
> +       int err = -EINVAL;
> +
> +       dev = net_flow_get_dev(info);
> +       if (!dev)
> +               return -EINVAL;
> +
> +       if (!dev->netdev_ops->ndo_flow_get_flows) {
> +               dev_put(dev);
> +               return -EOPNOTSUPP;
> +       }
> +
> +       if (!info->attrs[NET_FLOW_IDENTIFIER_TYPE] ||
> +           !info->attrs[NET_FLOW_IDENTIFIER] ||
> +           !info->attrs[NET_FLOW_FLOWS])
> +               goto out;
> +
> +       err = nla_parse_nested(tb, NET_FLOW_TABLE_FLOWS_MAX,
> +                              info->attrs[NET_FLOW_FLOWS],
> +                              net_flow_table_flows_policy);
> +       if (err)
> +               goto out;
> +
> +       if (!tb[NET_FLOW_TABLE_FLOWS_TABLE])
> +               goto out;
> +
> +       table = nla_get_u32(tb[NET_FLOW_TABLE_FLOWS_TABLE]);
> +
> +       if (tb[NET_FLOW_TABLE_FLOWS_MINPRIO])
> +               min = nla_get_u32(tb[NET_FLOW_TABLE_FLOWS_MINPRIO]);
> +       if (tb[NET_FLOW_TABLE_FLOWS_MAXPRIO])
> +               max = nla_get_u32(tb[NET_FLOW_TABLE_FLOWS_MAXPRIO]);

Just curious, what is the intended use of min/max prio?  Was it to
reduce number of flows returned in one get_flows call?


> +       msg = net_flow_build_flows_msg(dev,
> +                                      info->snd_portid,
> +                                      info->snd_seq,
> +                                      NET_FLOW_TABLE_CMD_GET_FLOWS,
> +                                      min, max, table);
> +       dev_put(dev);
> +
> +       if (IS_ERR(msg))
> +               return PTR_ERR(msg);
> +
> +       return genlmsg_reply(msg, info);
> +out:
> +       dev_put(dev);
> +       return err;
> +}
> +
> +static struct sk_buff *net_flow_start_errmsg(struct net_device *dev,
> +                                            struct genlmsghdr **hdr,
> +                                            u32 portid, int seq, u8 cmd)
> +{
> +       struct genlmsghdr *h;
> +       struct sk_buff *skb;
> +
> +       skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
> +       if (!skb)
> +               return ERR_PTR(-EMSGSIZE);
> +
> +       h = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd);
> +       if (!h)
> +               return ERR_PTR(-EMSGSIZE);
> +
> +       if (nla_put_u32(skb,
> +                       NET_FLOW_IDENTIFIER_TYPE,
> +                       NET_FLOW_IDENTIFIER_IFINDEX) ||
> +           nla_put_u32(skb, NET_FLOW_IDENTIFIER, dev->ifindex))
> +               return ERR_PTR(-EMSGSIZE);
> +
> +       *hdr = h;
> +       return skb;
> +}
> +
> +static struct sk_buff *net_flow_end_flow_errmsg(struct sk_buff *skb,
> +                                               struct genlmsghdr *hdr)
> +{
> +       int err;
> +
> +       err = genlmsg_end(skb, hdr);
> +       if (err < 0) {
> +               nlmsg_free(skb);
> +               return ERR_PTR(err);
> +       }
> +
> +       return skb;
> +}
> +
> +static int net_flow_put_flow_action(struct sk_buff *skb,
> +                                   struct net_flow_action *a)
> +{
> +       struct nlattr *action, *sigs;
> +       int i, err = 0;
> +
> +       action = nla_nest_start(skb, NET_FLOW_ACTION);
> +       if (!action)
> +               return -EMSGSIZE;
> +
> +       if (nla_put_u32(skb, NET_FLOW_ACTION_ATTR_UID, a->uid))
> +               return -EMSGSIZE;
> +
> +       if (!a->args)
> +               goto done;
> +
> +       for (i = 0; a->args[i].type; i++) {
> +               sigs = nla_nest_start(skb, NET_FLOW_ACTION_ATTR_SIGNATURE);
> +               if (!sigs) {
> +                       nla_nest_cancel(skb, action);
> +                       return -EMSGSIZE;
> +               }
> +
> +               err = net_flow_put_act_types(skb, a[i].args);
> +               if (err) {
> +                       nla_nest_cancel(skb, action);
> +                       nla_nest_cancel(skb, sigs);

order seems backwards.  i think you can just cancel outer.

> +                       return err;
> +               }
> +               nla_nest_end(skb, sigs);
> +       }
> +
> +done:
> +       nla_nest_end(skb, action);
> +       return 0;
> +}
> +
> +int net_flow_put_flow(struct sk_buff *skb, struct net_flow_flow *flow)
> +{
> +       struct nlattr *flows, *matches;
> +       struct nlattr *actions = NULL; /* must be null to unwind */
> +       int err, j, i = 0;
> +
> +       flows = nla_nest_start(skb, NET_FLOW_FLOW);
> +       if (!flows)
> +               goto put_failure;
> +
> +       if (nla_put_u32(skb, NET_FLOW_ATTR_TABLE, flow->table_id) ||
> +           nla_put_u32(skb, NET_FLOW_ATTR_UID, flow->uid) ||
> +           nla_put_u32(skb, NET_FLOW_ATTR_PRIORITY, flow->priority))
> +               goto flows_put_failure;
> +
> +       if (flow->matches) {
> +               matches = nla_nest_start(skb, NET_FLOW_ATTR_MATCHES);
> +               if (!matches)
> +                       goto flows_put_failure;
> +
> +               for (j = 0; flow->matches && flow->matches[j].header; j++) {

for(match = flow->matches; match->header; match++)

> +                       struct net_flow_field_ref *f = &flow->matches[j];
> +
> +                       if (!f->header)
> +                               continue;

Already checked in for loop?

> +
> +                       nla_put(skb, NET_FLOW_FIELD_REF, sizeof(*f), f);

no err check

> +               }
> +               nla_nest_end(skb, matches);
> +       }
> +
> +       if (flow->actions) {
> +               actions = nla_nest_start(skb, NET_FLOW_ATTR_ACTIONS);
> +               if (!actions)
> +                       goto flows_put_failure;
> +
> +               for (i = 0; flow->actions && flow->actions[i].uid; i++) {
> +                       err = net_flow_put_flow_action(skb, &flow->actions[i]);
> +                       if (err) {
> +                               nla_nest_cancel(skb, actions);

just cancel outer (I think)

> +                               goto flows_put_failure;
> +                       }
> +               }
> +               nla_nest_end(skb, actions);
> +       }
> +
> +       nla_nest_end(skb, flows);
> +       return 0;
> +
> +flows_put_failure:
> +       nla_nest_cancel(skb, flows);
> +put_failure:
> +       return -EMSGSIZE;
> +}
> +EXPORT_SYMBOL(net_flow_put_flow);
> +
> +static int net_flow_get_field(struct net_flow_field_ref *field,
> +                             struct nlattr *nla)
> +{
> +       if (nla_type(nla) != NET_FLOW_FIELD_REF)
> +               return -EINVAL;
> +
> +       if (nla_len(nla) < sizeof(*field))
> +               return -EINVAL;
> +
> +       *field = *(struct net_flow_field_ref *)nla_data(nla);
> +       return 0;
> +}

maybe return struct net_flow_field_ref * to simplify return logic.

> +
> +static int net_flow_get_action(struct net_flow_action *a, struct nlattr *attr)
> +{
> +       struct nlattr *act[NET_FLOW_ACTION_ATTR_MAX+1];
> +       struct nlattr *args;
> +       int rem;
> +       int err, count = 0;
> +
> +       if (nla_type(attr) != NET_FLOW_ACTION) {
> +               pr_warn("%s: expected NET_FLOW_ACTION\n", __func__);
> +               return 0;
> +       }
> +
> +       err = nla_parse_nested(act, NET_FLOW_ACTION_ATTR_MAX,
> +                              attr, net_flow_action_policy);
> +       if (err < 0)
> +               return err;
> +
> +       if (!act[NET_FLOW_ACTION_ATTR_UID] ||
> +           !act[NET_FLOW_ACTION_ATTR_SIGNATURE])
> +               return -EINVAL;
> +
> +       a->uid = nla_get_u32(act[NET_FLOW_ACTION_ATTR_UID]);
> +
> +       nla_for_each_nested(args, act[NET_FLOW_ACTION_ATTR_SIGNATURE], rem)
> +               count++; /* unoptimized max possible */
> +
> +       a->args = kcalloc(count + 1,
> +                         sizeof(struct net_flow_action_arg),
> +                         GFP_KERNEL);

kcalloc failure?

> +       count = 0;
> +
> +       nla_for_each_nested(args, act[NET_FLOW_ACTION_ATTR_SIGNATURE], rem) {
> +               if (nla_type(args) != NET_FLOW_ACTION_ARG)
> +                       continue;
> +
> +               if (nla_len(args) < sizeof(struct net_flow_action_arg)) {
> +                       kfree(a->args);
> +                       return -EINVAL;
> +               }
> +
> +               a->args[count] = *(struct net_flow_action_arg *)nla_data(args);

count++?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html