lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Mon, 5 Jan 2015 22:19:57 -0800 From: Scott Feldman <sfeldma@...il.com> To: John Fastabend <john.fastabend@...il.com> Cc: Thomas Graf <tgraf@...g.ch>, Jiří Pírko <jiri@...nulli.us>, Jamal Hadi Salim <jhs@...atatu.com>, "simon.horman@...ronome.com" <simon.horman@...ronome.com>, Netdev <netdev@...r.kernel.org>, "David S. Miller" <davem@...emloft.net>, Andy Gospodarek <andy@...yhouse.net> Subject: Re: [net-next PATCH v1 02/11] net: flow_table: add flow, delete flow On Wed, Dec 31, 2014 at 11:46 AM, John Fastabend <john.fastabend@...il.com> wrote: > Now that the device capabilities are exposed we can add support to > add and delete flows from the tables. > > The two operations are > > table_set_flows : > > The set flow operations is used to program a set of flows into a > hardware device table. The message is consumed via netlink encoded > message which is then decoded into a null terminated array of > flow entry structures. A flow entry structure is defined as > > struct net_flow_flow { > int table_id; > int uid; > int priority; > struct net_flow_field_ref *matches; > struct net_flow_action *actions; > } > > The table id is the _uid_ returned from 'get_tables' operatoins. > Matches is a set of match criteria for packets with a logical AND > operation done on the set so packets match the entire criteria. > Actions provide a set of actions to perform when the flow rule is > hit. Both matches and actions are null terminated arrays. > > The flows are configured in hardware using an ndo op. We do not > provide a commit operation at the moment and expect hardware > commits the flows one at a time. Future work may require a commit > operation to tell the hardware we are done loading flow rules. On > some hardware this will help bulk updates. > > Its possible for hardware to return an error from a flow set > operation. This can occur for many reasons both transient and > resource constraints. We have different error handling strategies > built in and listed here, > > *_ERROR_ABORT abort on first error with errmsg > > *_ERROR_CONTINUE continue programming flows no errmsg > > *_ERROR_ABORT_LOG abort on first error and return flow that > failed to user space in reply msg > > *_ERROR_CONT_LOG continue programming flows and return a list > of flows that failed to user space in a reply > msg. > > notably missing is a rollback error strategy. I don't have a > use for this in software yet but the strategy can be added with > *_ERROR_ROLLBACK for example. > > table_del_flows > > The delete flow operation uses the same structures and error > handling strategies as the table_set_flows operations. Although on > delete messges ommit the matches/actions arrays because they are > not needed to lookup the flow. > > Also thanks to Simon Horman for fixes and other help. > > Signed-off-by: John Fastabend <john.r.fastabend@...el.com> > --- > include/linux/if_flow.h | 21 ++ > include/linux/netdevice.h | 8 + > include/uapi/linux/if_flow.h | 49 ++++ > net/core/flow_table.c | 501 ++++++++++++++++++++++++++++++++++++++++++ > 4 files changed, 579 insertions(+) > > diff --git a/include/linux/if_flow.h b/include/linux/if_flow.h > index 1b6c1ea..20fa752 100644 > --- a/include/linux/if_flow.h > +++ b/include/linux/if_flow.h > @@ -90,4 +90,25 @@ struct net_flow_tbl_node { > __u32 flags; > struct net_flow_jump_table *jump; > }; > + > +/** > + * @struct net_flow_flow > + * @brief describes the match/action entry > + * > + * @uid unique identifier for flow > + * @priority priority to execute flow match/action in table What is the convention on priority? 0 is lowest priority or highest? > + * @match null terminated set of match uids match criteria > + * @actoin null terminated set of action uids to apply to match > + * > + * Flows must match all entries in match set. > + */ > +struct net_flow_flow { > + int table_id; > + int uid; > + int priority; > + struct net_flow_field_ref *matches; > + struct net_flow_action *actions; > +}; > + > +int net_flow_put_flow(struct sk_buff *skb, struct net_flow_flow *flow); > #endif > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index 3c3c856..be8d4e4 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -1197,6 +1197,14 @@ struct net_device_ops { > struct net_flow_header **(*ndo_flow_get_headers)(struct net_device *dev); > struct net_flow_hdr_node **(*ndo_flow_get_hdr_graph)(struct net_device *dev); > struct net_flow_tbl_node **(*ndo_flow_get_tbl_graph)(struct net_device *dev); > + int (*ndo_flow_get_flows)(struct sk_buff *skb, > + struct net_device *dev, > + int table, > + int min, int max); > + int (*ndo_flow_set_flows)(struct net_device *dev, > + struct net_flow_flow *f); > + int (*ndo_flow_del_flows)(struct net_device *dev, > + struct net_flow_flow *f); Need doc for these in BIG comment block above this struct. Same for ndo_flow_xxx added in previous patch. > #endif > }; > > diff --git a/include/uapi/linux/if_flow.h b/include/uapi/linux/if_flow.h > index 2acdb38..125cdc6 100644 > --- a/include/uapi/linux/if_flow.h > +++ b/include/uapi/linux/if_flow.h > @@ -329,6 +329,48 @@ enum { > #define NET_FLOW_TABLE_GRAPH_MAX (__NET_FLOW_TABLE_GRAPH_MAX - 1) > > enum { > + NET_FLOW_NET_FLOW_UNSPEC, > + NET_FLOW_FLOW, > + __NET_FLOW_NET_FLOW_MAX, > +}; > +#define NET_FLOW_NET_FLOW_MAX (__NET_FLOW_NET_FLOW_MAX - 1) > + > +enum { > + NET_FLOW_TABLE_FLOWS_UNSPEC, > + NET_FLOW_TABLE_FLOWS_TABLE, > + NET_FLOW_TABLE_FLOWS_MINPRIO, > + NET_FLOW_TABLE_FLOWS_MAXPRIO, > + NET_FLOW_TABLE_FLOWS_FLOWS, > + __NET_FLOW_TABLE_FLOWS_MAX, > +}; > +#define NET_FLOW_TABLE_FLOWS_MAX (__NET_FLOW_TABLE_FLOWS_MAX - 1) > + > +enum { NET_FLOW_FLOWS_ERROR_UNSPEC? > + /* Abort with normal errmsg */ > + NET_FLOW_FLOWS_ERROR_ABORT, > + /* Ignore errors and continue without logging */ > + NET_FLOW_FLOWS_ERROR_CONTINUE, > + /* Abort and reply with invalid flow fields */ > + NET_FLOW_FLOWS_ERROR_ABORT_LOG, > + /* Continue and reply with list of invalid flows */ > + NET_FLOW_FLOWS_ERROR_CONT_LOG, > + __NET_FLOWS_FLOWS_ERROR_MAX, > +}; > +#define NET_FLOWS_FLOWS_ERROR_MAX (__NET_FLOWS_FLOWS_ERROR_MAX - 1) > + > +enum { > + NET_FLOW_ATTR_UNSPEC, > + NET_FLOW_ATTR_ERROR, > + NET_FLOW_ATTR_TABLE, > + NET_FLOW_ATTR_UID, > + NET_FLOW_ATTR_PRIORITY, > + NET_FLOW_ATTR_MATCHES, > + NET_FLOW_ATTR_ACTIONS, > + __NET_FLOW_ATTR_MAX, > +}; > +#define NET_FLOW_ATTR_MAX (__NET_FLOW_ATTR_MAX - 1) > + > +enum { > NET_FLOW_IDENTIFIER_IFINDEX, /* net_device ifindex */ > }; > > @@ -343,6 +385,9 @@ enum { > NET_FLOW_HEADER_GRAPH, > NET_FLOW_TABLE_GRAPH, > > + NET_FLOW_FLOWS, > + NET_FLOW_FLOWS_ERROR, > + > __NET_FLOW_MAX, > NET_FLOW_MAX = (__NET_FLOW_MAX - 1), > }; > @@ -354,6 +399,10 @@ enum { > NET_FLOW_TABLE_CMD_GET_HDR_GRAPH, > NET_FLOW_TABLE_CMD_GET_TABLE_GRAPH, > > + NET_FLOW_TABLE_CMD_GET_FLOWS, > + NET_FLOW_TABLE_CMD_SET_FLOWS, > + NET_FLOW_TABLE_CMD_DEL_FLOWS, > + > __NET_FLOW_CMD_MAX, > NET_FLOW_CMD_MAX = (__NET_FLOW_CMD_MAX - 1), > }; > diff --git a/net/core/flow_table.c b/net/core/flow_table.c > index ec3f06d..f4cf293 100644 > --- a/net/core/flow_table.c > +++ b/net/core/flow_table.c > @@ -774,6 +774,489 @@ static int net_flow_cmd_get_table_graph(struct sk_buff *skb, > return genlmsg_reply(msg, info); > } > > +static struct sk_buff *net_flow_build_flows_msg(struct net_device *dev, > + u32 portid, int seq, u8 cmd, > + int min, int max, int table) > +{ > + struct genlmsghdr *hdr; > + struct nlattr *flows; > + struct sk_buff *skb; > + int err = -ENOBUFS; > + > + skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); Does netlink msg size limit the number of flows we can return? If I have 10K unique prefix routes, resulting in 10K flows in my L3 table, all at same priority, can it be dumped? I'm wondering (out loud) if get_flows is something that should be supported at the driver level, or should it be managed above the driver. In other words, whoever calls set_flows and del_flows knows what's what and could return get_flows rather than calling down to driver/device to get_flows. Unless driver/device could create flows independent of set_flows and del_flows. > + if (!skb) > + return ERR_PTR(-ENOBUFS); > + > + hdr = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd); > + if (!hdr) > + goto out; > + > + if (nla_put_u32(skb, > + NET_FLOW_IDENTIFIER_TYPE, > + NET_FLOW_IDENTIFIER_IFINDEX) || > + nla_put_u32(skb, NET_FLOW_IDENTIFIER, dev->ifindex)) { > + err = -ENOBUFS; > + goto out; > + } > + > + flows = nla_nest_start(skb, NET_FLOW_FLOWS); > + if (!flows) { > + err = -EMSGSIZE; > + goto out; > + } > + > + err = dev->netdev_ops->ndo_flow_get_flows(skb, dev, table, min, max); > + if (err < 0) > + goto out_cancel; > + > + nla_nest_end(skb, flows); > + > + err = genlmsg_end(skb, hdr); > + if (err < 0) > + goto out; > + > + return skb; > +out_cancel: > + nla_nest_cancel(skb, flows); > +out: > + nlmsg_free(skb); > + return ERR_PTR(err); > +} > + > +static const > +struct nla_policy net_flow_table_flows_policy[NET_FLOW_TABLE_FLOWS_MAX + 1] = { > + [NET_FLOW_TABLE_FLOWS_TABLE] = { .type = NLA_U32,}, > + [NET_FLOW_TABLE_FLOWS_MINPRIO] = { .type = NLA_U32,}, > + [NET_FLOW_TABLE_FLOWS_MAXPRIO] = { .type = NLA_U32,}, > + [NET_FLOW_TABLE_FLOWS_FLOWS] = { .type = NLA_NESTED,}, > +}; > + > +static int net_flow_table_cmd_get_flows(struct sk_buff *skb, > + struct genl_info *info) > +{ > + struct nlattr *tb[NET_FLOW_TABLE_FLOWS_MAX+1]; > + int table, min = -1, max = -1; > + struct net_device *dev; > + struct sk_buff *msg; > + int err = -EINVAL; > + > + dev = net_flow_get_dev(info); > + if (!dev) > + return -EINVAL; > + > + if (!dev->netdev_ops->ndo_flow_get_flows) { > + dev_put(dev); > + return -EOPNOTSUPP; > + } > + > + if (!info->attrs[NET_FLOW_IDENTIFIER_TYPE] || > + !info->attrs[NET_FLOW_IDENTIFIER] || > + !info->attrs[NET_FLOW_FLOWS]) > + goto out; > + > + err = nla_parse_nested(tb, NET_FLOW_TABLE_FLOWS_MAX, > + info->attrs[NET_FLOW_FLOWS], > + net_flow_table_flows_policy); > + if (err) > + goto out; > + > + if (!tb[NET_FLOW_TABLE_FLOWS_TABLE]) > + goto out; > + > + table = nla_get_u32(tb[NET_FLOW_TABLE_FLOWS_TABLE]); > + > + if (tb[NET_FLOW_TABLE_FLOWS_MINPRIO]) > + min = nla_get_u32(tb[NET_FLOW_TABLE_FLOWS_MINPRIO]); > + if (tb[NET_FLOW_TABLE_FLOWS_MAXPRIO]) > + max = nla_get_u32(tb[NET_FLOW_TABLE_FLOWS_MAXPRIO]); Just curious, what is the intended use of min/max prio? Was it to reduce number of flows returned in one get_flows call? > + msg = net_flow_build_flows_msg(dev, > + info->snd_portid, > + info->snd_seq, > + NET_FLOW_TABLE_CMD_GET_FLOWS, > + min, max, table); > + dev_put(dev); > + > + if (IS_ERR(msg)) > + return PTR_ERR(msg); > + > + return genlmsg_reply(msg, info); > +out: > + dev_put(dev); > + return err; > +} > + > +static struct sk_buff *net_flow_start_errmsg(struct net_device *dev, > + struct genlmsghdr **hdr, > + u32 portid, int seq, u8 cmd) > +{ > + struct genlmsghdr *h; > + struct sk_buff *skb; > + > + skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL); > + if (!skb) > + return ERR_PTR(-EMSGSIZE); > + > + h = genlmsg_put(skb, portid, seq, &net_flow_nl_family, 0, cmd); > + if (!h) > + return ERR_PTR(-EMSGSIZE); > + > + if (nla_put_u32(skb, > + NET_FLOW_IDENTIFIER_TYPE, > + NET_FLOW_IDENTIFIER_IFINDEX) || > + nla_put_u32(skb, NET_FLOW_IDENTIFIER, dev->ifindex)) > + return ERR_PTR(-EMSGSIZE); > + > + *hdr = h; > + return skb; > +} > + > +static struct sk_buff *net_flow_end_flow_errmsg(struct sk_buff *skb, > + struct genlmsghdr *hdr) > +{ > + int err; > + > + err = genlmsg_end(skb, hdr); > + if (err < 0) { > + nlmsg_free(skb); > + return ERR_PTR(err); > + } > + > + return skb; > +} > + > +static int net_flow_put_flow_action(struct sk_buff *skb, > + struct net_flow_action *a) > +{ > + struct nlattr *action, *sigs; > + int i, err = 0; > + > + action = nla_nest_start(skb, NET_FLOW_ACTION); > + if (!action) > + return -EMSGSIZE; > + > + if (nla_put_u32(skb, NET_FLOW_ACTION_ATTR_UID, a->uid)) > + return -EMSGSIZE; > + > + if (!a->args) > + goto done; > + > + for (i = 0; a->args[i].type; i++) { > + sigs = nla_nest_start(skb, NET_FLOW_ACTION_ATTR_SIGNATURE); > + if (!sigs) { > + nla_nest_cancel(skb, action); > + return -EMSGSIZE; > + } > + > + err = net_flow_put_act_types(skb, a[i].args); > + if (err) { > + nla_nest_cancel(skb, action); > + nla_nest_cancel(skb, sigs); order seems backwards. i think you can just cancel outer. > + return err; > + } > + nla_nest_end(skb, sigs); > + } > + > +done: > + nla_nest_end(skb, action); > + return 0; > +} > + > +int net_flow_put_flow(struct sk_buff *skb, struct net_flow_flow *flow) > +{ > + struct nlattr *flows, *matches; > + struct nlattr *actions = NULL; /* must be null to unwind */ > + int err, j, i = 0; > + > + flows = nla_nest_start(skb, NET_FLOW_FLOW); > + if (!flows) > + goto put_failure; > + > + if (nla_put_u32(skb, NET_FLOW_ATTR_TABLE, flow->table_id) || > + nla_put_u32(skb, NET_FLOW_ATTR_UID, flow->uid) || > + nla_put_u32(skb, NET_FLOW_ATTR_PRIORITY, flow->priority)) > + goto flows_put_failure; > + > + if (flow->matches) { > + matches = nla_nest_start(skb, NET_FLOW_ATTR_MATCHES); > + if (!matches) > + goto flows_put_failure; > + > + for (j = 0; flow->matches && flow->matches[j].header; j++) { for(match = flow->matches; match->header; match++) > + struct net_flow_field_ref *f = &flow->matches[j]; > + > + if (!f->header) > + continue; Already checked in for loop? > + > + nla_put(skb, NET_FLOW_FIELD_REF, sizeof(*f), f); no err check > + } > + nla_nest_end(skb, matches); > + } > + > + if (flow->actions) { > + actions = nla_nest_start(skb, NET_FLOW_ATTR_ACTIONS); > + if (!actions) > + goto flows_put_failure; > + > + for (i = 0; flow->actions && flow->actions[i].uid; i++) { > + err = net_flow_put_flow_action(skb, &flow->actions[i]); > + if (err) { > + nla_nest_cancel(skb, actions); just cancel outer (I think) > + goto flows_put_failure; > + } > + } > + nla_nest_end(skb, actions); > + } > + > + nla_nest_end(skb, flows); > + return 0; > + > +flows_put_failure: > + nla_nest_cancel(skb, flows); > +put_failure: > + return -EMSGSIZE; > +} > +EXPORT_SYMBOL(net_flow_put_flow); > + > +static int net_flow_get_field(struct net_flow_field_ref *field, > + struct nlattr *nla) > +{ > + if (nla_type(nla) != NET_FLOW_FIELD_REF) > + return -EINVAL; > + > + if (nla_len(nla) < sizeof(*field)) > + return -EINVAL; > + > + *field = *(struct net_flow_field_ref *)nla_data(nla); > + return 0; > +} maybe return struct net_flow_field_ref * to simplify return logic. > + > +static int net_flow_get_action(struct net_flow_action *a, struct nlattr *attr) > +{ > + struct nlattr *act[NET_FLOW_ACTION_ATTR_MAX+1]; > + struct nlattr *args; > + int rem; > + int err, count = 0; > + > + if (nla_type(attr) != NET_FLOW_ACTION) { > + pr_warn("%s: expected NET_FLOW_ACTION\n", __func__); > + return 0; > + } > + > + err = nla_parse_nested(act, NET_FLOW_ACTION_ATTR_MAX, > + attr, net_flow_action_policy); > + if (err < 0) > + return err; > + > + if (!act[NET_FLOW_ACTION_ATTR_UID] || > + !act[NET_FLOW_ACTION_ATTR_SIGNATURE]) > + return -EINVAL; > + > + a->uid = nla_get_u32(act[NET_FLOW_ACTION_ATTR_UID]); > + > + nla_for_each_nested(args, act[NET_FLOW_ACTION_ATTR_SIGNATURE], rem) > + count++; /* unoptimized max possible */ > + > + a->args = kcalloc(count + 1, > + sizeof(struct net_flow_action_arg), > + GFP_KERNEL); kcalloc failure? > + count = 0; > + > + nla_for_each_nested(args, act[NET_FLOW_ACTION_ATTR_SIGNATURE], rem) { > + if (nla_type(args) != NET_FLOW_ACTION_ARG) > + continue; > + > + if (nla_len(args) < sizeof(struct net_flow_action_arg)) { > + kfree(a->args); > + return -EINVAL; > + } > + > + a->args[count] = *(struct net_flow_action_arg *)nla_data(args); count++? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists