Message-ID: <CALx6S36dpf1-ngPwWQFUwrF-_xiBkh0Ny7x-pZTmZC=J1Vw5=w@mail.gmail.com>
Date: Fri, 26 Jun 2020 07:53:52 -0700
From: Tom Herbert <tom@...bertland.com>
To: Justin Iurman <justin.iurman@...ege.be>
Cc: Linux Kernel Network Developers <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>
Subject: Re: [PATCH net-next 3/5] ipv6: ioam: Data plane support for
Pre-allocated Trace
On Fri, Jun 26, 2020 at 1:13 AM Justin Iurman <justin.iurman@...ege.be> wrote:
>
> Tom,
>
> >> >> Implement support for processing the IOAM Pre-allocated Trace with IPv6,
> >> >> see [1] and [2]. Introduce a new IPv6 Hop-by-Hop TLV option
> >> >> IPV6_TLV_IOAM_HOPOPTS, see IANA [3].
> >> >>
> >> >
> >> > The IANA allocation is TEMPORARY, with an expiration date of
> >> > 4/16/2021. Note from RFC7120:
> >> >
> >> > "Implementers and deployers need to be aware that deprecation and
> >> > de-allocation could take place at any time after expiry; therefore, an
> >> > expired early allocation is best considered as deprecated."
> >> >
> >> > Please add a comment in the code and in the Documentation to this effect.
> >>
> >> I'll do that, thanks. What kind of comment (is there an official pattern?) and,
> >> where in the Documentation should I add it?
> >>
> >> >> A per-interface sysctl ioam6_enabled is provided to accept/drop IOAM
> >> >> packets. Default is drop.
> >> >
> >> > I'm not sure what "IOAM packets" are. Presumably, this means an IPv6
> >> > packet containing the IOAM HBH option. Note that the act bits of the
> >>
> >> Correct, the term IOAM packets is indeed a shortcut I used for IPv6 packets
> >> containing the IOAM HBH option.
> >>
> >> > option type are 00 which means the TLV is skipped if the option isn't
> >> > processed so I don't think it's correct to drop these packets by
> >> > default.
> >>
> >> Mmmh, I'd tend to disagree here. Despite the fact that the act bits are 00 for
> >> this option, I do believe it should be disabled (dropped) by default for nodes
> >> that "speak IOAM". Indeed, you don't want anyone with a kernel that includes
> >> IOAM to accept IOAM packets by default, which would mean that anyone would
> >> create (potentially without being aware) an IOAM domain. And, also, to avoid
> >> spreading leaks.
> >>
> > I think you're conflating whether a node processes IOAM with whether
> > it needs to drop because it doesn't process it. Yes, on an IOAM system it
> > makes sense to allow configuration of whether to process the TLV.
> > However, even when it doesn't then the TLV should be skipped and the
> > packet not dropped. We know this is the correct behavior since on a
> > system that isn't IOAM aware, i.e. all deployed nodes right now, they
> > will skip the TLV per the act bits. If we want to change the default
> > behavior, the only way to do that is to change the act bits to
> > non-zero.
>
> Makes sense, you're right indeed. But still, I'm a bit worried to enable it by default. That would open the door to things we don't want. We'd end up in a situation where IOAM is not "privately" deployed. And, think about the guy that runs a kernel with IOAM (that he does not know anything about). Of course, he would not have a FW to drop IOAM. Therefore, someone could simply "create" an IOAM domain with him by sending IPv6 packets with IOAM HBH and steal data. This is something similar to the leak problem.
>
Indeed, draft-ioametal-ippm-6man-ioam-ipv6-options-02 states: "Unless
a particular interface is explicitly enabled (i.e. explicitly
configured) for IOAM, a router MUST drop packets which contain
extension headers carrying IOAM data-fields." I believe this
requirement contradicts the option type act bits being zero. I've
posted to IOAM list about this.
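
For reference, this is how the two high-order "act" bits of an option
type decode under RFC 8200. It is only a userspace sketch: the enum and
helper names are made up for illustration and are not kernel
identifiers. The value 49 is the one used by the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the four actions are those listed in RFC 8200. */
enum tlv_unknown_action {
	TLV_ACT_SKIP = 0,                 /* 00: skip over the option */
	TLV_ACT_DISCARD = 1,              /* 01: discard the packet */
	TLV_ACT_DISCARD_ICMP = 2,         /* 10: discard, send ICMP Parameter Problem */
	TLV_ACT_DISCARD_ICMP_UNICAST = 3, /* 11: discard, send ICMP unless multicast dst */
};

/* Action a node takes when it does not recognize the option type. */
static inline enum tlv_unknown_action tlv_unknown_action(uint8_t opt_type)
{
	return (enum tlv_unknown_action)((opt_type >> 6) & 0x3);
}

/* Third-highest bit: whether option data may change en route. */
static inline int tlv_may_change_en_route(uint8_t opt_type)
{
	return (opt_type >> 5) & 0x1;
}

/* 49 == 0x31 == 0b00110001, so the act bits are 00 (skip). */
static_assert((49 >> 6) == 0, "act bits of option type 49 are 00");
```

So a node that does not recognize type 49 skips the TLV and keeps
processing the packet, which is the behavior a drop-by-default rule on
IOAM-capable nodes would diverge from.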
> So, I think there are 2 possibilities against the above: (i) the current one, ie drop by default or (ii) use 01 for act bits. This topic has been widely discussed in the WG and is still open, though the trend seems to be "00" with the drop-by-default compromise.
>
> > For the leakage problem, that is a firewall issue. The expectation is
> > that border devices will have rules that prevent leaking packets out
> > of their domain. This is an orthogonal mechanism that needs to be done
> > for other protocols-- SRH for instance. The filtering is simple, just
> > drop the packet when TLV matches (although I suspect most sites
> > probably just drop packets with EH at this point). This doesn't
> > require any changes to the implementation and doesn't require that
> > border devices even implement IOAM-- they just drop on pattern
> > matching.
>
> +1
Mentioned that also.
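
To illustrate what "drop on pattern matching" could look like, here is a
rough userspace sketch that walks the options of a Hop-by-Hop header and
reports whether a given option type is present. The function name and
return convention are assumptions made for this example; this is not
how any particular firewall implements it. A border device would drop
the packet when the function returns 1 for type 49.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLV_PAD1         0
#define TLV_IOAM_HOPOPTS 49 /* value taken from the patch */

/*
 * hbh points at the start of the Hop-by-Hop header (Next Header byte).
 * Returns 1 if an option of type 'wanted' is present, 0 if not,
 * and -1 if the header is truncated or malformed.
 */
static inline int hbh_has_option(const uint8_t *hbh, size_t buflen,
				 uint8_t wanted)
{
	size_t hdrlen, off = 2; /* options start after Next Header + Hdr Ext Len */

	assert(hbh != NULL);
	if (buflen < 8)
		return -1;

	/* Hdr Ext Len counts 8-octet units, not including the first 8 octets */
	hdrlen = ((size_t)hbh[1] + 1) * 8;
	if (hdrlen > buflen)
		return -1;

	while (off < hdrlen) {
		uint8_t type = hbh[off];

		if (type == wanted)
			return 1;
		if (type == TLV_PAD1) { /* Pad1 is a single byte, no length field */
			off++;
			continue;
		}
		if (off + 2 > hdrlen)
			return -1;
		off += 2 + (size_t)hbh[off + 1]; /* type + len + data */
	}

	/* Options must end exactly at the header boundary */
	return off == hdrlen ? 0 : -1;
}
```

Note that the matcher needs no knowledge of the IOAM data layout at all,
only of the generic TLV encoding.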
>
> Justin
>
> > Tom
> >> Justin
> >>
> >> >> Another per-interface sysctl ioam6_id is provided to define the IOAM
> >> >> (unique) identifier of the interface.
> >> >>
> >> >> A per-namespace sysctl ioam6_id is provided to define the IOAM (unique)
> >> >> identifier of the node.
> >> >>
> >> >> Two relativistic hash tables: one for IOAM namespaces, the other for
> >> >> IOAM schemas. A namespace can only have a single active schema and a
> >> >> schema can only be attached to a single namespace (1:1 relationship).
> >> >>
> >> >> [1] https://tools.ietf.org/html/draft-ietf-ippm-ioam-ipv6-options-01
> >> >> [2] https://tools.ietf.org/html/draft-ietf-ippm-ioam-data-09
> >> >> [3]
> >> >> https://www.iana.org/assignments/ipv6-parameters/ipv6-parameters.xhtml#ipv6-parameters-2
> >> >>
> >> >> Signed-off-by: Justin Iurman <justin.iurman@...ege.be>
> >> >> ---
> >> >> include/linux/ipv6.h | 2 +
> >> >> include/net/ioam6.h | 98 +++++++++++
> >> >> include/net/netns/ipv6.h | 2 +
> >> >> include/uapi/linux/in6.h | 1 +
> >> >> include/uapi/linux/ipv6.h | 2 +
> >> >> net/ipv6/Makefile | 2 +-
> >> >> net/ipv6/addrconf.c | 20 +++
> >> >> net/ipv6/af_inet6.c | 7 +
> >> >> net/ipv6/exthdrs.c | 67 ++++++++
> >> >> net/ipv6/ioam6.c | 326 +++++++++++++++++++++++++++++++++++++
> >> >> net/ipv6/sysctl_net_ipv6.c | 7 +
> >> >> 11 files changed, 533 insertions(+), 1 deletion(-)
> >> >> create mode 100644 include/net/ioam6.h
> >> >> create mode 100644 net/ipv6/ioam6.c
> >> >>
> >> >> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> >> >> index 5312a718bc7a..15732f964c6e 100644
> >> >> --- a/include/linux/ipv6.h
> >> >> +++ b/include/linux/ipv6.h
> >> >> @@ -75,6 +75,8 @@ struct ipv6_devconf {
> >> >> __s32 disable_policy;
> >> >> __s32 ndisc_tclass;
> >> >> __s32 rpl_seg_enabled;
> >> >> + __u32 ioam6_enabled;
> >> >> + __u32 ioam6_id;
> >> >>
> >> >> struct ctl_table_header *sysctl_header;
> >> >> };
> >> >> diff --git a/include/net/ioam6.h b/include/net/ioam6.h
> >> >> new file mode 100644
> >> >> index 000000000000..2a910bc99947
> >> >> --- /dev/null
> >> >> +++ b/include/net/ioam6.h
> >> >> @@ -0,0 +1,98 @@
> >> >> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> >> >> +/*
> >> >> + * IOAM IPv6 implementation
> >> >> + *
> >> >> + * Author:
> >> >> + * Justin Iurman <justin.iurman@...ege.be>
> >> >> + */
> >> >> +
> >> >> +#ifndef _NET_IOAM6_H
> >> >> +#define _NET_IOAM6_H
> >> >> +
> >> >> +#include <linux/net.h>
> >> >> +#include <linux/ipv6.h>
> >> >> +#include <linux/rhashtable-types.h>
> >> >> +
> >> >> +#define IOAM6_OPT_TRACE_PREALLOC 0
> >> >> +
> >> >> +#define IOAM6_TRACE_FLAG_OVERFLOW (1 << 3)
> >> >> +
> >> >> +#define IOAM6_TRACE_TYPE0 (1 << 31)
> >> >> +#define IOAM6_TRACE_TYPE1 (1 << 30)
> >> >> +#define IOAM6_TRACE_TYPE2 (1 << 29)
> >> >> +#define IOAM6_TRACE_TYPE3 (1 << 28)
> >> >> +#define IOAM6_TRACE_TYPE4 (1 << 27)
> >> >> +#define IOAM6_TRACE_TYPE5 (1 << 26)
> >> >> +#define IOAM6_TRACE_TYPE6 (1 << 25)
> >> >> +#define IOAM6_TRACE_TYPE7 (1 << 24)
> >> >> +#define IOAM6_TRACE_TYPE8 (1 << 23)
> >> >> +#define IOAM6_TRACE_TYPE9 (1 << 22)
> >> >> +#define IOAM6_TRACE_TYPE10 (1 << 21)
> >> >> +#define IOAM6_TRACE_TYPE11 (1 << 20)
> >> >> +#define IOAM6_TRACE_TYPE22 (1 << 9)
> >> >> +
> >> >> +#define IOAM6_EMPTY_FIELD_u16 0xffff
> >> >> +#define IOAM6_EMPTY_FIELD_u24 0x00ffffff
> >> >> +#define IOAM6_EMPTY_FIELD_u32 0xffffffff
> >> >> +#define IOAM6_EMPTY_FIELD_u56 0x00ffffffffffffff
> >> >> +#define IOAM6_EMPTY_FIELD_u64 0xffffffffffffffff
> >> >> +
> >> >> +struct ioam6_common_hdr {
> >> >> + u8 opt_type;
> >> >> + u8 opt_len;
> >> >> + u8 res;
> >> >> + u8 ioam_type;
> >> >> + __be16 namespace_id;
> >> >> +} __packed;
> >> >> +
> >> >> +struct ioam6_trace_hdr {
> >> >> + __be16 info;
> >> >> + __be32 type;
> >> >> +} __packed;
> >> >> +
> >> >> +struct ioam6_namespace {
> >> >> + struct rhash_head head;
> >> >> + struct rcu_head rcu;
> >> >> +
> >> >> + __be16 id;
> >> >> + __be64 data;
> >> >> + bool remove_tlv;
> >> >> +
> >> >> + struct ioam6_schema *schema;
> >> >> +};
> >> >> +
> >> >> +struct ioam6_schema {
> >> >> + struct rhash_head head;
> >> >> + struct rcu_head rcu;
> >> >> +
> >> >> + u32 id;
> >> >> + int len;
> >> >> + __be32 hdr;
> >> >> + u8 *data;
> >> >> +
> >> >> + struct ioam6_namespace *ns;
> >> >> +};
> >> >> +
> >> >> +struct ioam6_pernet_data {
> >> >> + struct mutex lock;
> >> >> + struct rhashtable namespaces;
> >> >> + struct rhashtable schemas;
> >> >> +};
> >> >> +
> >> >> +static inline struct ioam6_pernet_data *ioam6_pernet(struct net *net)
> >> >> +{
> >> >> +#if IS_ENABLED(CONFIG_IPV6)
> >> >> + return net->ipv6.ioam6_data;
> >> >> +#else
> >> >> + return NULL;
> >> >> +#endif
> >> >> +}
> >> >> +
> >> >> +extern struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id);
> >> >> +extern void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
> >> >> + struct ioam6_namespace *ns);
> >> >> +
> >> >> +extern int ioam6_init(void);
> >> >> +extern void ioam6_exit(void);
> >> >> +
> >> >> +#endif
> >> >> diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
> >> >> index 5ec054473d81..89b27fa721f4 100644
> >> >> --- a/include/net/netns/ipv6.h
> >> >> +++ b/include/net/netns/ipv6.h
> >> >> @@ -51,6 +51,7 @@ struct netns_sysctl_ipv6 {
> >> >> int max_hbh_opts_len;
> >> >> int seg6_flowlabel;
> >> >> bool skip_notify_on_dev_down;
> >> >> + unsigned int ioam6_id;
> >> >> };
> >> >>
> >> >> struct netns_ipv6 {
> >> >> @@ -115,6 +116,7 @@ struct netns_ipv6 {
> >> >> spinlock_t lock;
> >> >> u32 seq;
> >> >> } ip6addrlbl_table;
> >> >> + struct ioam6_pernet_data *ioam6_data;
> >> >> };
> >> >>
> >> >> #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV6)
> >> >> diff --git a/include/uapi/linux/in6.h b/include/uapi/linux/in6.h
> >> >> index 9f2273a08356..1c98435220c9 100644
> >> >> --- a/include/uapi/linux/in6.h
> >> >> +++ b/include/uapi/linux/in6.h
> >> >> @@ -145,6 +145,7 @@ struct in6_flowlabel_req {
> >> >> #define IPV6_TLV_PADN 1
> >> >> #define IPV6_TLV_ROUTERALERT 5
> >> >> #define IPV6_TLV_CALIPSO 7 /* RFC 5570 */
> >> >> +#define IPV6_TLV_IOAM_HOPOPTS 49
> >> >
> >> > The IANA allocation is TEMPORARY, the expiration date is 4/16/2021.
> >> > Note from RFC7120:
> >> >
> >> > "Implementers and deployers need to be aware that deprecation and
> >> > de-allocation could take place at any time after expiry; therefore, an
> >> > expired early allocation is best considered as deprecated. It is not
> >> > IANA's responsibility to track the status of allocations, their
> >> > expirations, or when they may be re-allocated."
> >> >
> >> > Please add a comment here and in the
> >> > Documentation to this effect.
> >> >
> >> >> #define IPV6_TLV_JUMBO 194
> >> >> #define IPV6_TLV_HAO 201 /* home address option */
> >> >>
> >> >> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
> >> >> index 13e8751bf24a..eb521b2dd885 100644
> >> >> --- a/include/uapi/linux/ipv6.h
> >> >> +++ b/include/uapi/linux/ipv6.h
> >> >> @@ -189,6 +189,8 @@ enum {
> >> >> DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN,
> >> >> DEVCONF_NDISC_TCLASS,
> >> >> DEVCONF_RPL_SEG_ENABLED,
> >> >> + DEVCONF_IOAM6_ENABLED,
> >> >> + DEVCONF_IOAM6_ID,
> >> >> DEVCONF_MAX
> >> >> };
> >> >>
> >> >> diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
> >> >> index cf7b47bdb9b3..b7ef10d417d6 100644
> >> >> --- a/net/ipv6/Makefile
> >> >> +++ b/net/ipv6/Makefile
> >> >> @@ -10,7 +10,7 @@ ipv6-objs := af_inet6.o anycast.o ip6_output.o ip6_input.o
> >> >> addrconf.o \
> >> >> route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \
> >> >> raw.o icmp.o mcast.o reassembly.o tcp_ipv6.o ping.o \
> >> >> exthdrs.o datagram.o ip6_flowlabel.o inet6_connection_sock.o \
> >> >> - udp_offload.o seg6.o fib6_notifier.o rpl.o
> >> >> + udp_offload.o seg6.o fib6_notifier.o rpl.o ioam6.o
> >> >>
> >> >> ipv6-offload := ip6_offload.o tcpv6_offload.o exthdrs_offload.o
> >> >>
> >> >> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> >> >> index 840bfdb3d7bd..6c952a28ade2 100644
> >> >> --- a/net/ipv6/addrconf.c
> >> >> +++ b/net/ipv6/addrconf.c
> >> >> @@ -236,6 +236,8 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
> >> >> .addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64,
> >> >> .disable_policy = 0,
> >> >> .rpl_seg_enabled = 0,
> >> >> + .ioam6_enabled = 0,
> >> >> + .ioam6_id = 0,
> >> >> };
> >> >>
> >> >> static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
> >> >> @@ -291,6 +293,8 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly =
> >> >> {
> >> >> .addr_gen_mode = IN6_ADDR_GEN_MODE_EUI64,
> >> >> .disable_policy = 0,
> >> >> .rpl_seg_enabled = 0,
> >> >> + .ioam6_enabled = 0,
> >> >> + .ioam6_id = 0,
> >> >> };
> >> >>
> >> >> /* Check if link is ready: is it up and is a valid qdisc available */
> >> >> @@ -5487,6 +5491,8 @@ static inline void ipv6_store_devconf(struct ipv6_devconf
> >> >> *cnf,
> >> >> array[DEVCONF_DISABLE_POLICY] = cnf->disable_policy;
> >> >> array[DEVCONF_NDISC_TCLASS] = cnf->ndisc_tclass;
> >> >> array[DEVCONF_RPL_SEG_ENABLED] = cnf->rpl_seg_enabled;
> >> >> + array[DEVCONF_IOAM6_ENABLED] = cnf->ioam6_enabled;
> >> >> + array[DEVCONF_IOAM6_ID] = cnf->ioam6_id;
> >> >> }
> >> >>
> >> >> static inline size_t inet6_ifla6_size(void)
> >> >> @@ -6867,6 +6873,20 @@ static const struct ctl_table addrconf_sysctl[] = {
> >> >> .mode = 0644,
> >> >> .proc_handler = proc_dointvec,
> >> >> },
> >> >> + {
> >> >> + .procname = "ioam6_enabled",
> >> >> + .data = &ipv6_devconf.ioam6_enabled,
> >> >> + .maxlen = sizeof(int),
> >> >> + .mode = 0644,
> >> >> + .proc_handler = proc_dointvec,
> >> >> + },
> >> >> + {
> >> >> + .procname = "ioam6_id",
> >> >> + .data = &ipv6_devconf.ioam6_id,
> >> >> + .maxlen = sizeof(int),
> >> >> + .mode = 0644,
> >> >> + .proc_handler = proc_dointvec,
> >> >> + },
> >> >> {
> >> >> /* sentinel */
> >> >> }
> >> >> diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
> >> >> index b304b882e031..63a9ffc4b283 100644
> >> >> --- a/net/ipv6/af_inet6.c
> >> >> +++ b/net/ipv6/af_inet6.c
> >> >> @@ -62,6 +62,7 @@
> >> >> #include <net/rpl.h>
> >> >> #include <net/compat.h>
> >> >> #include <net/xfrm.h>
> >> >> +#include <net/ioam6.h>
> >> >>
> >> >> #include <linux/uaccess.h>
> >> >> #include <linux/mroute6.h>
> >> >> @@ -1187,6 +1188,10 @@ static int __init inet6_init(void)
> >> >> if (err)
> >> >> goto rpl_fail;
> >> >>
> >> >> + err = ioam6_init();
> >> >> + if (err)
> >> >> + goto ioam6_fail;
> >> >> +
> >> >> err = igmp6_late_init();
> >> >> if (err)
> >> >> goto igmp6_late_err;
> >> >> @@ -1210,6 +1215,8 @@ static int __init inet6_init(void)
> >> >> #endif
> >> >> igmp6_late_err:
> >> >> rpl_exit();
> >> >> +ioam6_fail:
> >> >> + ioam6_exit();
> >> >> rpl_fail:
> >> >> seg6_exit();
> >> >> seg6_fail:
> >> >> diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c
> >> >> index f27ab3bf2e0c..00aee1358f1c 100644
> >> >> --- a/net/ipv6/exthdrs.c
> >> >> +++ b/net/ipv6/exthdrs.c
> >> >> @@ -49,6 +49,8 @@
> >> >> #include <net/seg6_hmac.h>
> >> >> #endif
> >> >> #include <net/rpl.h>
> >> >> +#include <net/ioam6.h>
> >> >> +#include <net/dst_metadata.h>
> >> >>
> >> >> #include <linux/uaccess.h>
> >> >>
> >> >> @@ -1010,6 +1012,67 @@ static int ipv6_hop_ra(struct sk_buff *skb, int optoff)
> >> >> return TLV_REJECT;
> >> >> }
> >> >>
> >> >> +/* IOAM */
> >> >> +
> >> >> +static int ipv6_hop_ioam(struct sk_buff *skb, int optoff)
> >> >> +{
> >> >> + struct ioam6_common_hdr *ioamh;
> >> >> + struct ioam6_namespace *ns;
> >> >> +
> >> >> + /* Must be 4n-aligned */
> >> >> + if (optoff & 3)
> >> >> + goto drop;
> >> >> +
> >> >> + if (!skb_valid_dst(skb))
> >> >> + ip6_route_input(skb);
> >> >> +
> >> >> + /* IOAM must be enabled on ingress interface */
> >> >> + if (!__in6_dev_get(skb->dev)->cnf.ioam6_enabled)
> >> >> + goto drop;
> >> >> +
> >> >> + ioamh = (struct ioam6_common_hdr *)(skb_network_header(skb) + optoff);
> >> >> + ns = ioam6_namespace(ipv6_skb_net(skb), ioamh->namespace_id);
> >> >> +
> >> >> + /* Unknown IOAM namespace, either:
> >> >> + * - Drop it if IOAM is not enabled on egress interface (if any)
> >> >> + * - Ignore it otherwise
> >> >> + */
> >> >> + if (!ns) {
> >> >> + if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
> >> >> + !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> >> >> + goto drop;
> >> >> +
> >> >> + goto accept;
> >> >> + }
> >> >> +
> >> >> + if (ns->remove_tlv && !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> >> >> + goto remove;
> >> >> +
> >> >> + /* Known IOAM namespace which must not be removed:
> >> >> + * IOAM must be enabled on egress interface
> >> >> + */
> >> >> + if (!__in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_enabled &&
> >> >> + !(skb_dst(skb)->dev->flags & IFF_LOOPBACK))
> >> >> + goto drop;
> >> >> +
> >> >> + switch (ioamh->ioam_type) {
> >> >> + case IOAM6_OPT_TRACE_PREALLOC:
> >> >> + ioam6_fill_trace_data(skb, optoff + sizeof(*ioamh), ns);
> >> >> + IP6CB(skb)->flags |= IP6SKB_IOAM;
> >> >> + break;
> >> >> + default:
> >> >> + break;
> >> >> + }
> >> >> +
> >> >> +accept:
> >> >> + return TLV_ACCEPT;
> >> >> +remove:
> >> >> + return TLV_REMOVE;
> >> >> +drop:
> >> >> + kfree_skb(skb);
> >> >> + return TLV_REJECT;
> >> >> +}
> >> >> +
> >> >> /* Jumbo payload */
> >> >>
> >> >> static int ipv6_hop_jumbo(struct sk_buff *skb, int optoff)
> >> >> @@ -1081,6 +1144,10 @@ static const struct tlvtype_proc tlvprochopopt_lst[] = {
> >> >> .type = IPV6_TLV_ROUTERALERT,
> >> >> .func = ipv6_hop_ra,
> >> >> },
> >> >> + {
> >> >> + .type = IPV6_TLV_IOAM_HOPOPTS,
> >> >> + .func = ipv6_hop_ioam,
> >> >> + },
> >> >> {
> >> >> .type = IPV6_TLV_JUMBO,
> >> >> .func = ipv6_hop_jumbo,
> >> >> diff --git a/net/ipv6/ioam6.c b/net/ipv6/ioam6.c
> >> >> new file mode 100644
> >> >> index 000000000000..406aa78eb504
> >> >> --- /dev/null
> >> >> +++ b/net/ipv6/ioam6.c
> >> >> @@ -0,0 +1,326 @@
> >> >> +// SPDX-License-Identifier: GPL-2.0-or-later
> >> >> +/*
> >> >> + * IOAM IPv6 implementation
> >> >> + *
> >> >> + * Author:
> >> >> + * Justin Iurman <justin.iurman@...ege.be>
> >> >> + */
> >> >> +
> >> >> +#include <linux/errno.h>
> >> >> +#include <linux/types.h>
> >> >> +#include <linux/kernel.h>
> >> >> +#include <linux/net.h>
> >> >> +#include <linux/rhashtable.h>
> >> >> +
> >> >> +#include <net/addrconf.h>
> >> >> +#include <net/ioam6.h>
> >> >> +
> >> >> +static inline void ioam6_ns_release(struct ioam6_namespace *ns)
> >> >> +{
> >> >> + kfree_rcu(ns, rcu);
> >> >> +}
> >> >> +
> >> >> +static inline void ioam6_sc_release(struct ioam6_schema *sc)
> >> >> +{
> >> >> + kfree_rcu(sc, rcu);
> >> >> +}
> >> >> +
> >> >> +static void ioam6_free_ns(void *ptr, void *arg)
> >> >> +{
> >> >> + struct ioam6_namespace *ns = (struct ioam6_namespace *)ptr;
> >> >> +
> >> >> + if (ns)
> >> >> + ioam6_ns_release(ns);
> >> >> +}
> >> >> +
> >> >> +static void ioam6_free_sc(void *ptr, void *arg)
> >> >> +{
> >> >> + struct ioam6_schema *sc = (struct ioam6_schema *)ptr;
> >> >> +
> >> >> + if (sc)
> >> >> + ioam6_sc_release(sc);
> >> >> +}
> >> >> +
> >> >> +static int ioam6_ns_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
> >> >> +{
> >> >> + const struct ioam6_namespace *ns = obj;
> >> >> +
> >> >> + return (ns->id != *(__be16 *)arg->key);
> >> >> +}
> >> >> +
> >> >> +static int ioam6_sc_cmpfn(struct rhashtable_compare_arg *arg, const void *obj)
> >> >> +{
> >> >> + const struct ioam6_schema *sc = obj;
> >> >> +
> >> >> + return (sc->id != *(u32 *)arg->key);
> >> >> +}
> >> >> +
> >> >> +static const struct rhashtable_params rht_ns_params = {
> >> >> + .key_len = sizeof(__be16),
> >> >> + .key_offset = offsetof(struct ioam6_namespace, id),
> >> >> + .head_offset = offsetof(struct ioam6_namespace, head),
> >> >> + .automatic_shrinking = true,
> >> >> + .obj_cmpfn = ioam6_ns_cmpfn,
> >> >> +};
> >> >> +
> >> >> +static const struct rhashtable_params rht_sc_params = {
> >> >> + .key_len = sizeof(u32),
> >> >> + .key_offset = offsetof(struct ioam6_schema, id),
> >> >> + .head_offset = offsetof(struct ioam6_schema, head),
> >> >> + .automatic_shrinking = true,
> >> >> + .obj_cmpfn = ioam6_sc_cmpfn,
> >> >> +};
> >> >> +
> >> >> +struct ioam6_namespace *ioam6_namespace(struct net *net, __be16 id)
> >> >> +{
> >> >> + struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
> >> >> +
> >> >> + return rhashtable_lookup_fast(&nsdata->namespaces, &id, rht_ns_params);
> >> >> +}
> >> >> +
> >> >> +void ioam6_fill_trace_data_node(struct sk_buff *skb, int nodeoff,
> >> >> + u32 trace_type, struct ioam6_namespace *ns)
> >> >> +{
> >> >> + u8 *data = skb_network_header(skb) + nodeoff;
> >> >> + struct __kernel_sock_timeval ts;
> >> >> + u64 raw_u64;
> >> >> + u32 raw_u32;
> >> >> + u16 raw_u16;
> >> >> + u8 byte;
> >> >> +
> >> >> + /* hop_lim and node_id */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE0) {
> >> >> + byte = ipv6_hdr(skb)->hop_limit - 1;
> >> >> + raw_u32 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
> >> >> + if (!raw_u32)
> >> >> + raw_u32 = IOAM6_EMPTY_FIELD_u24;
> >> >> + else
> >> >> + raw_u32 &= IOAM6_EMPTY_FIELD_u24;
> >> >> + *(__be32 *)data = cpu_to_be32((byte << 24) | raw_u32);
> >> >> + data += sizeof(__be32);
> >> >> + }
> >> >> +
> >> >> + /* ingress_if_id and egress_if_id */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE1) {
> >> >> + raw_u16 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
> >> >> + if (!raw_u16)
> >> >> + raw_u16 = IOAM6_EMPTY_FIELD_u16;
> >> >> + *(__be16 *)data = cpu_to_be16(raw_u16);
> >> >> + data += sizeof(__be16);
> >> >> +
> >> >> + raw_u16 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
> >> >> + if (!raw_u16)
> >> >> + raw_u16 = IOAM6_EMPTY_FIELD_u16;
> >> >> + *(__be16 *)data = cpu_to_be16(raw_u16);
> >> >> + data += sizeof(__be16);
> >> >> + }
> >> >> +
> >> >> + /* timestamp seconds */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE2) {
> >> >> + if (!skb->tstamp) {
> >> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> >> + } else {
> >> >> + skb_get_new_timestamp(skb, &ts);
> >> >> + *(__be32 *)data = cpu_to_be32((u32)ts.tv_sec);
> >> >> + }
> >> >> + data += sizeof(__be32);
> >> >> + }
> >> >> +
> >> >> + /* timestamp subseconds */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE3) {
> >> >> + if (!skb->tstamp) {
> >> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> >> + } else {
> >> >> + if (!(trace_type & IOAM6_TRACE_TYPE2))
> >> >> + skb_get_new_timestamp(skb, &ts);
> >> >> + *(__be32 *)data = cpu_to_be32((u32)ts.tv_usec);
> >> >> + }
> >> >> + data += sizeof(__be32);
> >> >> + }
> >> >> +
> >> >> + /* transit delay */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE4) {
> >> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> >> + data += sizeof(__be32);
> >> >> + }
> >> >> +
> >> >> + /* namespace data */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE5) {
> >> >> + *(__be32 *)data = (__be32)ns->data;
> >> >> + data += sizeof(__be32);
> >> >> + }
> >> >> +
> >> >> + /* queue depth */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE6) {
> >> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> >> + data += sizeof(__be32);
> >> >> + }
> >> >> +
> >> >> + /* hop_lim and node_id (wide) */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE7) {
> >> >> + byte = ipv6_hdr(skb)->hop_limit - 1;
> >> >> + raw_u64 = dev_net(skb->dev)->ipv6.sysctl.ioam6_id;
> >> >> + if (!raw_u64)
> >> >> + raw_u64 = IOAM6_EMPTY_FIELD_u56;
> >> >> + else
> >> >> + raw_u64 &= IOAM6_EMPTY_FIELD_u56;
> >> >> + *(__be64 *)data = cpu_to_be64(((u64)byte << 56) | raw_u64);
> >> >> + data += sizeof(__be64);
> >> >> + }
> >> >> +
> >> >> + /* ingress_if_id and egress_if_id (wide) */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE8) {
> >> >> + raw_u32 = __in6_dev_get(skb->dev)->cnf.ioam6_id;
> >> >> + if (!raw_u32)
> >> >> + raw_u32 = IOAM6_EMPTY_FIELD_u32;
> >> >> + *(__be32 *)data = cpu_to_be32(raw_u32);
> >> >> + data += sizeof(__be32);
> >> >> +
> >> >> + raw_u32 = __in6_dev_get(skb_dst(skb)->dev)->cnf.ioam6_id;
> >> >> + if (!raw_u32)
> >> >> + raw_u32 = IOAM6_EMPTY_FIELD_u32;
> >> >> + *(__be32 *)data = cpu_to_be32(raw_u32);
> >> >> + data += sizeof(__be32);
> >> >> + }
> >> >> +
> >> >> + /* namespace data (wide) */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE9) {
> >> >> + *(__be64 *)data = ns->data;
> >> >> + data += sizeof(__be64);
> >> >> + }
> >> >> +
> >> >> + /* buffer occupancy */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE10) {
> >> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> >> + data += sizeof(__be32);
> >> >> + }
> >> >> +
> >> >> + /* checksum complement */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE11) {
> >> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u32);
> >> >> + data += sizeof(__be32);
> >> >> + }
> >> >> +
> >> >> + /* opaque state snapshot */
> >> >> + if (trace_type & IOAM6_TRACE_TYPE22) {
> >> >> + if (!ns->schema) {
> >> >> + *(__be32 *)data = cpu_to_be32(IOAM6_EMPTY_FIELD_u24);
> >> >> + } else {
> >> >> + *(__be32 *)data = ns->schema->hdr;
> >> >> + data += sizeof(__be32);
> >> >> + memcpy(data, ns->schema->data, ns->schema->len);
> >> >> + }
> >> >> + }
> >> >> +}
> >> >> +
> >> >> +void ioam6_fill_trace_data(struct sk_buff *skb, int traceoff,
> >> >> + struct ioam6_namespace *ns)
> >> >> +{
> >> >> + u8 nodelen, flags, remlen, sclen = 0;
> >> >> + struct ioam6_trace_hdr *trh;
> >> >> + int nodeoff;
> >> >> + u16 info;
> >> >> + u32 type;
> >> >> +
> >> >> + trh = (struct ioam6_trace_hdr *)(skb_network_header(skb) + traceoff);
> >> >> + info = be16_to_cpu(trh->info);
> >> >> + type = be32_to_cpu(trh->type);
> >> >> +
> >> >> + nodelen = info >> 11;
> >> >> + flags = (info >> 7) & 0xf;
> >> >> + remlen = info & 0x7f;
> >> >> +
> >> >> + /* Skip if Overflow bit is set OR
> >> >> + * if an unknown type (bit 12-21) is set
> >> >> + */
> >> >> + if ((flags & IOAM6_TRACE_FLAG_OVERFLOW) || (type & 0xffc00))
> >> >> + return;
> >> >> +
> >> >> + /* NodeLen does not include Opaque State Snapshot length. We need to
> >> >> + * take it into account if the corresponding bit is set and if current
> >> >> + * IOAM namespace has an active schema attached to it
> >> >> + */
> >> >> + if (type & IOAM6_TRACE_TYPE22) {
> >> >> + /* Opaque State Snapshot header size */
> >> >> + sclen = sizeof_field(struct ioam6_schema, hdr) / 4;
> >> >> +
> >> >> + if (ns->schema)
> >> >> + sclen += ns->schema->len / 4;
> >> >> + }
> >> >> +
> >> >> + /* Not enough space remaining: set Overflow bit and skip */
> >> >> + if (!remlen || remlen < (nodelen + sclen)) {
> >> >> + info |= IOAM6_TRACE_FLAG_OVERFLOW << 7;
> >> >> + trh->info = cpu_to_be16(info);
> >> >> + return;
> >> >> + }
> >> >> +
> >> >> + nodeoff = traceoff + sizeof(*trh) + remlen*4 - nodelen*4 - sclen*4;
> >> >> + ioam6_fill_trace_data_node(skb, nodeoff, type, ns);
> >> >> +
> >> >> + /* Update RemainingLen */
> >> >> + remlen -= nodelen + sclen;
> >> >> + info = (info & 0xff80) | remlen;
> >> >> + trh->info = cpu_to_be16(info);
> >> >> +}
> >> >> +
> >> >> +static int __net_init ioam6_net_init(struct net *net)
> >> >> +{
> >> >> + struct ioam6_pernet_data *nsdata;
> >> >> + int err = -ENOMEM;
> >> >> +
> >> >> + nsdata = kzalloc(sizeof(*nsdata), GFP_KERNEL);
> >> >> + if (!nsdata)
> >> >> + goto out;
> >> >> +
> >> >> + mutex_init(&nsdata->lock);
> >> >> + net->ipv6.ioam6_data = nsdata;
> >> >> +
> >> >> + err = rhashtable_init(&nsdata->namespaces, &rht_ns_params);
> >> >> + if (err)
> >> >> + goto free_nsdata;
> >> >> +
> >> >> + err = rhashtable_init(&nsdata->schemas, &rht_sc_params);
> >> >> + if (err)
> >> >> + goto free_rht_ns;
> >> >> +
> >> >> +out:
> >> >> + return err;
> >> >> +free_rht_ns:
> >> >> + rhashtable_destroy(&nsdata->namespaces);
> >> >> +free_nsdata:
> >> >> + kfree(nsdata);
> >> >> + net->ipv6.ioam6_data = NULL;
> >> >> + goto out;
> >> >> +}
> >> >> +
> >> >> +static void __net_exit ioam6_net_exit(struct net *net)
> >> >> +{
> >> >> + struct ioam6_pernet_data *nsdata = ioam6_pernet(net);
> >> >> +
> >> >> + rhashtable_free_and_destroy(&nsdata->namespaces, ioam6_free_ns, NULL);
> >> >> + rhashtable_free_and_destroy(&nsdata->schemas, ioam6_free_sc, NULL);
> >> >> +
> >> >> + kfree(nsdata);
> >> >> +}
> >> >> +
> >> >> +static struct pernet_operations ioam6_net_ops = {
> >> >> + .init = ioam6_net_init,
> >> >> + .exit = ioam6_net_exit,
> >> >> +};
> >> >> +
> >> >> +int __init ioam6_init(void)
> >> >> +{
> >> >> + int err = register_pernet_subsys(&ioam6_net_ops);
> >> >> +
> >> >> + if (err)
> >> >> + return err;
> >> >> +
> >> >> + pr_info("In-situ OAM (IOAM) with IPv6\n");
> >> >> + return 0;
> >> >> +}
> >> >> +
> >> >> +void ioam6_exit(void)
> >> >> +{
> >> >> + unregister_pernet_subsys(&ioam6_net_ops);
> >> >> +}
> >> >> diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
> >> >> index fac2135aa47b..da49b33ab6fc 100644
> >> >> --- a/net/ipv6/sysctl_net_ipv6.c
> >> >> +++ b/net/ipv6/sysctl_net_ipv6.c
> >> >> @@ -159,6 +159,13 @@ static struct ctl_table ipv6_table_template[] = {
> >> >> .mode = 0644,
> >> >> .proc_handler = proc_dointvec
> >> >> },
> >> >> + {
> >> >> + .procname = "ioam6_id",
> >> >> + .data = &init_net.ipv6.sysctl.ioam6_id,
> >> >> + .maxlen = sizeof(int),
> >> >> + .mode = 0644,
> >> >> + .proc_handler = proc_dointvec
> >> >> + },
> >> >> { }
> >> >> };
> >> >>
> >> >> --
> > > >> 2.17.1
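
As a reading aid for ioam6_fill_trace_data() in the patch above, the
16-bit info field handling can be sketched in userspace as follows. The
shifts and masks are copied from the patch (NodeLen in the top 5 bits,
4 flag bits, RemainingLen in the low 7 bits, lengths in 4-octet units);
the struct and function names are hypothetical and exist only for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_FLAG_OVERFLOW (1u << 3) /* same bit as IOAM6_TRACE_FLAG_OVERFLOW */

struct trace_info {
	uint8_t nodelen; /* per-node data length */
	uint8_t flags;
	uint8_t remlen;  /* remaining pre-allocated space */
};

/* Split the host-order info field the same way the patch does. */
static inline struct trace_info trace_info_unpack(uint16_t info)
{
	struct trace_info t = {
		.nodelen = (uint8_t)(info >> 11),
		.flags   = (uint8_t)((info >> 7) & 0xf),
		.remlen  = (uint8_t)(info & 0x7f),
	};
	return t;
}

/*
 * Account for one node's data (plus sclen units of opaque snapshot).
 * Returns the updated info field. *fits is set to 0 and the Overflow
 * flag is raised when there is not enough room left.
 */
static inline uint16_t trace_info_consume(uint16_t info, uint8_t sclen,
					  int *fits)
{
	struct trace_info t = trace_info_unpack(info);
	unsigned int need = (unsigned int)t.nodelen + sclen;

	assert(fits != NULL);
	if (!t.remlen || t.remlen < need) {
		*fits = 0;
		return (uint16_t)(info | (TRACE_FLAG_OVERFLOW << 7));
	}

	*fits = 1;
	return (uint16_t)((info & 0xff80) | (t.remlen - need));
}
```

For example, with NodeLen 3 and RemainingLen 12 (info 0x180c), one node
consumes 3 units and leaves RemainingLen at 9; with RemainingLen 2 the
Overflow flag is set instead and nothing is written.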