Message-ID: <876176x9hg.fsf@x220.int.ebiederm.org>
Date: Tue, 02 Jun 2015 13:15:23 -0500
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Robert Shearman <rshearma@...cade.com>
Cc: <netdev@...r.kernel.org>, roopa <roopa@...ulusnetworks.com>,
Thomas Graf <tgraf@...g.ch>
Subject: Re: [RFC net-next 1/3] net: infra for per-nexthop encap data
Robert Shearman <rshearma@...cade.com> writes:
> Having to add a new interface to apply encap onto a packet is a
> mechanism that works well today, allowing the setup of the encap to be
> done separately from the routes out of them, meaning that routing
> protocols and other user-space apps don't need to do anything special
> to add routes out of a new type of interface. However, the overhead of
> creating an interface is high, especially in terms of
> memory. Therefore, the traditional method won't work very well for
> large numbers of routes applying encap where there is a low degree of
> sharing of the encap.
>
> The solution is to introduce a way of defining encap on a per-nexthop
> basis (i.e. per-route if only one nexthop) through the addition of a
> new netlink attribute, RTA_ENCAP. The semantics of this attribute are
> that the data is interpreted according to the output interface type
> (RTA_OIF) and is opaque to the normal forwarding path. The output
> interface doesn't have to be defined per-nexthop, but instead
> represents the way of encapsulating the packet. There could be as few
> as one per namespace, but more could be created, particularly if they
> are used to define parameters which are shared by a large number of
> routes. However, the split of what goes in the encap data and what
> might be specified via interface attributes is entirely up to the
> encap-type implementation.
>
> New rtnetlink operations are defined to assist with the management of
> this data:
> - parse_encap for parsing the attribute given through rtnl and either
> sizing the in-memory version (if encap ptr is NULL) or filling in the
> in-memory version. This operation allows
> the interface to reject invalid encap specified by user-space and the
> sizing allows the kernel to have a different in-memory implementation
> to the netlink API (which might be optimised for extensibility rather
> than speed of packet forwarding).
> - fill_encap for taking the in-memory version of the encap and filling
> in an RTA_ENCAP attribute in a netlink message.
> - match_encap for comparing an in-memory version of encap with an
> RTA_ENCAP version, returning 0 if matching or 1 if different.
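
To make the size-then-fill contract of parse_encap and the 0/1 result of
match_encap concrete, here is a minimal user-space sketch. It is not code
from the patch: the demo_* names, the label-stack layout, and the choice to
return the required size from both passes are all assumptions made for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory encap: a short stack of 32-bit labels. */
struct demo_encap {
	uint8_t  count;
	uint32_t labels[4];
};

/*
 * Models the described parse_encap contract (return values are assumed):
 *   encap == NULL: validate the attribute and report the size needed
 *   encap != NULL: validate again and fill in the in-memory version
 * Returns the in-memory size, or a negative errno for invalid input.
 */
static int demo_parse_encap(const void *attr, size_t attr_len, void *encap)
{
	struct demo_encap *e = encap;
	size_t n = attr_len / sizeof(uint32_t);

	/* Reject malformed user-space input in both passes. */
	if (attr_len == 0 || attr_len % sizeof(uint32_t) || n > 4)
		return -EINVAL;

	if (e) {
		memset(e, 0, sizeof(*e));
		e->count = (uint8_t)n;
		memcpy(e->labels, attr, attr_len);
	}
	return (int)sizeof(struct demo_encap);
}

/* Models match_encap: 0 if the attribute matches the encap, 1 if not. */
static int demo_match_encap(const void *attr, size_t attr_len,
			    const struct demo_encap *e)
{
	if (attr_len != e->count * sizeof(uint32_t))
		return 1;
	return memcmp(attr, e->labels, attr_len) ? 1 : 0;
}

/* Caller side: size pass, allocate, fill pass. NULL on invalid input. */
static struct demo_encap *demo_build(const void *attr, size_t attr_len)
{
	int size = demo_parse_encap(attr, attr_len, NULL);
	struct demo_encap *e;

	if (size < 0)
		return NULL;
	e = malloc((size_t)size);
	if (!e)
		return NULL;
	/* The same input was validated above, so the fill pass succeeds. */
	size = demo_parse_encap(attr, attr_len, e);
	assert(size > 0);
	return e;
}
```

The point of the two passes is that the caller, not the encap type, owns
the allocation, so the in-memory layout can differ freely from the netlink
layout.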
>
> A new dst operation is also defined to allow encap-type interfaces to
> retrieve the encap data from their xmit functions and use it for
> encapsulating the packet and for further forwarding.
This bit of infrastructure should be more like rtnl_register, where
we register an encap type and the operations to go with it.
Just like rtnl_register, we can have a small array with the operations
for each supported encapsulation.
Eric
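
One way to read the suggestion above is as a small table of operations
indexed by encap type, instead of hanging the callbacks off rtnl_link_ops.
The following user-space sketch illustrates that shape only. The demo_*
names, the type enum, and the error codes are hypothetical, and the
locking a kernel version would need (RTNL or RCU) is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical encap type numbers; a real list would live in the uapi. */
enum demo_encap_type {
	DEMO_ENCAP_NONE,
	DEMO_ENCAP_MPLS,
	DEMO_ENCAP_IP,
	DEMO_ENCAP_MAX
};

/* Per-type operations, mirroring the three callbacks in the patch. */
struct demo_encap_ops {
	int (*parse_encap)(const void *attr, size_t len, void *encap);
	int (*fill_encap)(void *buf, size_t buf_len, const void *encap);
	int (*match_encap)(const void *attr, size_t len, const void *encap);
};

/* The "small array": one slot per supported encapsulation. */
static const struct demo_encap_ops *demo_encap_table[DEMO_ENCAP_MAX];

static int demo_encap_register(unsigned int type,
			       const struct demo_encap_ops *ops)
{
	if (type == DEMO_ENCAP_NONE || type >= DEMO_ENCAP_MAX || !ops)
		return -EINVAL;
	if (demo_encap_table[type])
		return -EEXIST;
	demo_encap_table[type] = ops;
	return 0;
}

static int demo_encap_unregister(unsigned int type)
{
	if (type >= DEMO_ENCAP_MAX || !demo_encap_table[type])
		return -ENOENT;
	demo_encap_table[type] = NULL;
	return 0;
}

/* Dispatch by encap type; no net_device is needed to find the ops. */
static int demo_encap_parse(unsigned int type, const void *attr, size_t len,
			    void *encap)
{
	const struct demo_encap_ops *ops;

	if (type >= DEMO_ENCAP_MAX)
		return -EINVAL;
	ops = demo_encap_table[type];
	if (!ops || !ops->parse_encap)
		return -EOPNOTSUPP;
	return ops->parse_encap(attr, len, encap);
}

/* Placeholder op for the example: accepts any attribute, reports its size. */
static int demo_mpls_parse(const void *attr, size_t len, void *encap)
{
	assert(attr != NULL);	/* callers must pass an attribute */
	(void)encap;
	return (int)len;
}

static const struct demo_encap_ops demo_mpls_ops = {
	.parse_encap = demo_mpls_parse,
};
```

With a table like this, an unregistered type fails cleanly at lookup, and
the lookup key is the encap type rather than the output device.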
> Suggested-by: "Eric W. Biederman" <ebiederm@...ssion.com>
> Signed-off-by: Robert Shearman <rshearma@...cade.com>
> ---
> include/linux/rtnetlink.h | 7 +++++++
> include/net/dst.h | 11 +++++++++++
> include/net/dst_ops.h | 2 ++
> include/net/rtnetlink.h | 11 +++++++++++
> include/uapi/linux/rtnetlink.h | 1 +
> net/core/rtnetlink.c | 36 ++++++++++++++++++++++++++++++++++++
> 6 files changed, 68 insertions(+)
>
> diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
> index a2324fb45cf4..470d822ddd61 100644
> --- a/include/linux/rtnetlink.h
> +++ b/include/linux/rtnetlink.h
> @@ -22,6 +22,13 @@ struct sk_buff *rtmsg_ifinfo_build_skb(int type, struct net_device *dev,
> void rtmsg_ifinfo_send(struct sk_buff *skb, struct net_device *dev,
> gfp_t flags);
>
> +int rtnl_parse_encap(const struct net_device *dev, const struct nlattr *nla,
> + void *encap);
> +int rtnl_fill_encap(const struct net_device *dev, struct sk_buff *skb,
> + int encap_len, const void *encap);
> +int rtnl_match_encap(const struct net_device *dev, const struct nlattr *nla,
> + int encap_len, const void *encap);
> +
>
> /* RTNL is used as a global lock for all changes to network configuration */
> extern void rtnl_lock(void);
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 2bc73f8a00a9..df0e6ec18eca 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -506,4 +506,15 @@ static inline struct xfrm_state *dst_xfrm(const struct dst_entry *dst)
> }
> #endif
>
> +/* Get encap data for destination */
> +static inline int dst_get_encap(struct sk_buff *skb, const void **encap)
> +{
> + const struct dst_entry *dst = skb_dst(skb);
> +
> + if (!dst || !dst->ops->get_encap)
> + return 0;
> +
> + return dst->ops->get_encap(dst, encap);
> +}
> +
> #endif /* _NET_DST_H */
> diff --git a/include/net/dst_ops.h b/include/net/dst_ops.h
> index d64253914a6a..97f48cf8ef7d 100644
> --- a/include/net/dst_ops.h
> +++ b/include/net/dst_ops.h
> @@ -32,6 +32,8 @@ struct dst_ops {
> struct neighbour * (*neigh_lookup)(const struct dst_entry *dst,
> struct sk_buff *skb,
> const void *daddr);
> + int (*get_encap)(const struct dst_entry *dst,
> + const void **encap);
>
> struct kmem_cache *kmem_cachep;
>
> diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h
> index 343d922d15c2..3121ade24957 100644
> --- a/include/net/rtnetlink.h
> +++ b/include/net/rtnetlink.h
> @@ -95,6 +95,17 @@ struct rtnl_link_ops {
> const struct net_device *dev,
> const struct net_device *slave_dev);
> struct net *(*get_link_net)(const struct net_device *dev);
> + int (*parse_encap)(const struct net_device *dev,
> + const struct nlattr *nla,
> + void *encap);
> + int (*fill_encap)(const struct net_device *dev,
> + struct sk_buff *skb,
> + int encap_len,
> + const void *encap);
> + int (*match_encap)(const struct net_device *dev,
> + const struct nlattr *nla,
> + int encap_len,
> + const void *encap);
> };
>
> int __rtnl_link_register(struct rtnl_link_ops *ops);
> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> index 17fb02f488da..ed4c797503f2 100644
> --- a/include/uapi/linux/rtnetlink.h
> +++ b/include/uapi/linux/rtnetlink.h
> @@ -308,6 +308,7 @@ enum rtattr_type_t {
> RTA_VIA,
> RTA_NEWDST,
> RTA_PREF,
> + RTA_ENCAP,
> __RTA_MAX
> };
>
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 077b6d280371..3b4e40a82799 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -1441,6 +1441,42 @@ static int validate_linkmsg(struct net_device *dev, struct nlattr *tb[])
> return 0;
> }
>
> +int rtnl_parse_encap(const struct net_device *dev, const struct nlattr *nla,
> + void *encap)
> +{
> + const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
> +
> + if (!ops->parse_encap)
> + return -EINVAL;
> +
> + return ops->parse_encap(dev, nla, encap);
> +}
> +EXPORT_SYMBOL(rtnl_parse_encap);
> +
> +int rtnl_fill_encap(const struct net_device *dev, struct sk_buff *skb,
> + int encap_len, const void *encap)
> +{
> + const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
> +
> + if (!ops->fill_encap)
> + return -EINVAL;
> +
> + return ops->fill_encap(dev, skb, encap_len, encap);
> +}
> +EXPORT_SYMBOL(rtnl_fill_encap);
> +
> +int rtnl_match_encap(const struct net_device *dev, const struct nlattr *nla,
> + int encap_len, const void *encap)
> +{
> + const struct rtnl_link_ops *ops = dev->rtnl_link_ops;
> +
> + if (!ops->match_encap)
> + return -EINVAL;
> +
> + return ops->match_encap(dev, nla, encap_len, encap);
> +}
> +EXPORT_SYMBOL(rtnl_match_encap);
> +
> static int do_setvfinfo(struct net_device *dev, struct nlattr *attr)
> {
> int rem, err = -EINVAL;