Message-ID: <8736kzyk53.fsf@toke.dk>
Date:   Mon, 27 May 2019 17:47:20 +0200
From:   Toke Høiland-Jørgensen <toke@...hat.com>
To:     Kevin 'ldir' Darbyshire-Bryant <ldir@...byshire-bryant.me.uk>
Cc:     Kevin 'ldir' Darbyshire-Bryant <ldir@...byshire-bryant.me.uk>,
        "netdev\@vger.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH net-next v5] net: sched: Introduce act_ctinfo action

Kevin 'ldir' Darbyshire-Bryant <ldir@...byshire-bryant.me.uk> writes:

> ctinfo is a new tc filter action module.  It is designed to restore
> information contained in firewall conntrack marks to other packet fields
> and is typically used on packet ingress paths.  At present it has two
> independent sub-functions or operating modes, DSCP restoration mode &
> skb mark restoration mode.
>
> The DSCP restore mode:
>
> This mode copies DSCP values that have been placed in the firewall
> conntrack mark back into the IPv4/v6 diffserv fields of relevant
> packets.
>
> The DSCP restoration is intended for use and has been found useful for
> restoring ingress classifications based on egress classifications across
> links that bleach or otherwise change DSCP, typically home ISP Internet
> links.  Restoring DSCP on ingress on the WAN link allows qdiscs such as
> but by no means limited to CAKE to shape inbound packets according to
> policies that are easier to set & mark on egress.
>
> Ingress classification is traditionally a challenging task since
> iptables rules haven't yet run and tc filter/eBPF programs are pre-NAT
> lookups, hence are unable to see internal IPv4 addresses as used on the
> typical home masquerading gateway.  Thus marking the connection in some
> manner on egress for later restoration of classification on ingress is
> easier to implement.
>
> Parameters related to DSCP restore mode:
>
> dscpmask - a 32 bit mask of 6 contiguous bits indicating which bits of
> the conntrack mark field contain the DSCP value to be restored.
>
> statemask - a 32 bit mask of (usually) 1 bit length, outside the area
> specified by dscpmask.  This represents a conditional operation flag
> whereby the DSCP is only restored if the flag is set.  This is useful to
> implement a 'one shot' iptables based classification where the
> 'complicated' iptables rules are only run once to classify the
> connection on initial (egress) packet and subsequent packets are all
> marked/restored with the same DSCP.  A mask of zero disables the
> conditional behaviour ie. the conntrack mark DSCP bits are always
> restored to the ip diffserv field (assuming the conntrack entry is found
> & the skb is an ipv4/ipv6 type)
>
> e.g. dscpmask 0xfc000000 statemask 0x01000000
>
> |----0xFC----conntrack mark----000000---|
> | Bits 31-26 | bit 25 | bit24 |~~~ Bit 0|
> | DSCP       | unused | flag  |unused   |
> |-----------------------0x01---000000---|
>       |                   |
>       |                   |
>       ---|             Conditional flag
>          v             only restore if set
> |-ip diffserv-|
> | 6 bits      |
> |-------------|
>
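
As a rough illustration of the layout above, here is a user-space C sketch
of the restore arithmetic. It is not the kernel code: restore_dsfield(),
lowest_set_bit() and ECN_MASK are names made up for this example, where the
patch uses __ffs(), INET_ECN_MASK and ipv4_change_dsfield() /
ipv6_change_dsfield(). It assumes dscpmask has already been validated as
six contiguous bits.

```c
#include <stdint.h>

/* Low two bits of the diffserv byte carry ECN (INET_ECN_MASK in the kernel). */
#define ECN_MASK 0x3u

/* Index of the lowest set bit; returns 0 for v == 0, as the patch does. */
static int lowest_set_bit(uint32_t v)
{
	int n = 0;

	if (!v)
		return 0;
	while (!(v & 1u)) {
		v >>= 1;
		n++;
	}
	return n;
}

/*
 * Hypothetical helper: compute the diffserv byte a packet would end up
 * with. A statemask of zero means "always restore"; otherwise the DSCP
 * is only restored when the flag bit is set in the conntrack mark.
 */
static uint8_t restore_dsfield(uint32_t ctmark, uint32_t dscpmask,
			       uint32_t statemask, uint8_t old_dsfield)
{
	uint8_t dscp;

	if (statemask && !(ctmark & statemask))
		return old_dsfield;

	dscp = (uint8_t)((ctmark & dscpmask) >> lowest_set_bit(dscpmask));

	/* Replace the six DSCP bits, keep the two ECN bits untouched. */
	return (uint8_t)(((dscp << 2) & ~ECN_MASK) | (old_dsfield & ECN_MASK));
}
```

With dscpmask 0xfc000000 and statemask 0x01000000, a conntrack mark of
0xb9000000 carries DSCP 46 with the flag set, so a diffserv byte of 0x01
becomes 0xb9. The ECN bits are preserved.
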
> The skb mark restore mode (cpmark):
>
> This mode copies the firewall conntrack mark to the skb's mark field.
> It is completely the functional equivalent of the existing act_connmark
> action with the additional feature of being able to apply a mask to the
> restored value.
>
> Parameters related to skb mark restore mode:
>
> mask - a 32 bit mask applied to the firewall conntrack mark to mask out
> bits unwanted for restoration.  This can be useful where the conntrack
> mark is being used for different purposes by different applications.  If
> not specified, the whole mark field is copied (i.e. a default mask of
> 0xffffffff).
>
> e.g. mask 0x00ffffff to mask out the top 8 bits being used by the
> aforementioned DSCP restore mode.
>
> |----0x00----conntrack mark----ffffff---|
> | Bits 31-24 |                          |
> | DSCP & flag|      some value here     |
> |---------------------------------------|
> 			|
> 			|
> 			v
> |------------skb mark-------------------|
> |            |                          |
> |  zeroed    |                          |
> |---------------------------------------|
>
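
The masking in the diagram above comes down to a single AND. A minimal
user-space sketch follows; cpmark_restore() is a hypothetical name used
only here, while the patch does the same thing inline when it assigns
skb->mark.

```c
#include <stdint.h>

/* Hypothetical helper: the value cpmark mode would store in skb->mark. */
static uint32_t cpmark_restore(uint32_t ctmark, uint32_t mask)
{
	return ctmark & mask;
}
```

With mask 0x00ffffff, a conntrack mark of 0xb9123456 yields an skb mark of
0x00123456: the top 8 bits used by the DSCP restore mode are zeroed. With
the default mask of 0xffffffff the mark is copied unchanged.
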
> Overall parameters:
>
> zone - conntrack zone
>
> control - action related control (reclassify | pipe | drop | continue |
> ok | goto chain <CHAIN_INDEX>)
>
> Mode specific values are passed as parameter structures across netlink.
> Similarly statistics indicating DSCP & skb mark restoration counts are
> also returned via netlink.
>
> Signed-off-by: Kevin Darbyshire-Bryant <ldir@...byshire-bryant.me.uk>

I like the new commit message! :)

> ---
> v2 - add equivalent connmark functionality with an enhancement
>     to accept a mask
>     pass statistics for each sub-function as individual netlink
>     attributes and stop (ab)using overlimits, drops
>     update the testing config correctly
> v3 - fix a licensing silly & tidy up GPL boilerplate
> v4 - drop stray copy paste inline
>     reverse christmas tree local vars
> v5 - rebase on net-next/master not net/master by mistake - doh! now
>      applies!

Still getting errors from 'git am' for your email. It helps to add
--ignore-whitespace, but then I get ^M line endings in the resulting
file...

>      rename connmark to cpmark.
>      always use structures across netlink to pass parameters.
>      rework commit message to clarify modes & applicable parameters
>      without getting bogged down in userspace syntax.
>      restrict dscpmask parameter to 6 contiguous bits only instead
>      of >=6 contiguous bits.
>      re-order netlink TLV values into functional groupings.
>
>  include/net/tc_act/tc_ctinfo.h            |  28 ++
>  include/uapi/linux/pkt_cls.h              |   1 +
>  include/uapi/linux/tc_act/tc_ctinfo.h     |  52 +++
>  net/sched/Kconfig                         |  17 +
>  net/sched/Makefile                        |   1 +
>  net/sched/act_ctinfo.c                    | 415 ++++++++++++++++++++++
>  tools/testing/selftests/tc-testing/config |   1 +
>  7 files changed, 515 insertions(+)
>  create mode 100644 include/net/tc_act/tc_ctinfo.h
>  create mode 100644 include/uapi/linux/tc_act/tc_ctinfo.h
>  create mode 100644 net/sched/act_ctinfo.c
>
> diff --git a/include/net/tc_act/tc_ctinfo.h b/include/net/tc_act/tc_ctinfo.h
> new file mode 100644
> index 000000000000..d6a688571672
> --- /dev/null
> +++ b/include/net/tc_act/tc_ctinfo.h
> @@ -0,0 +1,28 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __NET_TC_CTINFO_H
> +#define __NET_TC_CTINFO_H
> +
> +#include <net/act_api.h>
> +
> +struct tcf_ctinfo_params {
> +	struct rcu_head rcu;
> +	struct net *net;
> +	u32 dscpmask;
> +	u32 dscpstatemask;
> +	u32 cpmarkmask;
> +	u16 zone;
> +	u8 mode;
> +	u8 dscpmaskshift;
> +};
> +
> +struct tcf_ctinfo {
> +	struct tc_action common;
> +	struct tcf_ctinfo_params __rcu *params;
> +	u64 stats_dscp_set;
> +	u64 stats_dscp_error;
> +	u64 stats_cpmark_set;
> +};
> +
> +#define to_ctinfo(a) ((struct tcf_ctinfo *)a)
> +
> +#endif /* __NET_TC_CTINFO_H */
> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> index 51a0496f78ea..a93680fc4bfa 100644
> --- a/include/uapi/linux/pkt_cls.h
> +++ b/include/uapi/linux/pkt_cls.h
> @@ -105,6 +105,7 @@ enum tca_id {
>  	TCA_ID_IFE = TCA_ACT_IFE,
>  	TCA_ID_SAMPLE = TCA_ACT_SAMPLE,
>  	/* other actions go here */
> +	TCA_ID_CTINFO,
>  	__TCA_ID_MAX = 255
>  };
>  
> diff --git a/include/uapi/linux/tc_act/tc_ctinfo.h b/include/uapi/linux/tc_act/tc_ctinfo.h
> new file mode 100644
> index 000000000000..48c40f657575
> --- /dev/null
> +++ b/include/uapi/linux/tc_act/tc_ctinfo.h
> @@ -0,0 +1,52 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +#ifndef __UAPI_TC_CTINFO_H
> +#define __UAPI_TC_CTINFO_H
> +
> +#include <linux/types.h>
> +#include <linux/pkt_cls.h>
> +
> +struct tc_ctinfo {
> +	tc_gen;
> +};
> +
> +struct tc_ctinfo_dscp {
> +	__u32 mask;
> +	__u32 statemask;
> +};
> +
> +struct tc_ctinfo_cpmark {
> +	__u32 mask;
> +};
> +
> +struct tc_ctinfo_stats_dscp {
> +	__u64 set;
> +	__u64 error;
> +};
> +
> +struct tc_ctinfo_stats_cpmark {
> +	__u64 set;
> +};

Ugh, no, single-entry structs are not an improvement, sorry.

I get that you want to have descriptive names, but you already get that
from the netlink type names. Besides, 'mask' and 'statemask' are not
really dependent anyway; 'statemask' could reasonably be unset with no
effect on the mask value, no?

> +enum {
> +	TCA_CTINFO_UNSPEC,
> +	TCA_CTINFO_PAD,
> +	TCA_CTINFO_TM,
> +	TCA_CTINFO_ACT,
> +	TCA_CTINFO_ZONE,
> +	TCA_CTINFO_PARMS_DSCP,
> +	TCA_CTINFO_PARMS_CPMARK,
> +	TCA_CTINFO_MODE_DSCP,
> +	TCA_CTINFO_MODE_CPMARK,
> +	TCA_CTINFO_STATS_DSCP,
> +	TCA_CTINFO_STATS_CPMARK,
> +	__TCA_CTINFO_MAX
> +};
> +
> +#define TCA_CTINFO_MAX (__TCA_CTINFO_MAX - 1)
> +
> +enum {
> +	CTINFO_MODE_DSCP	= BIT(0),
> +	CTINFO_MODE_CPMARK	= BIT(1)
> +};
> +
> +#endif
> diff --git a/net/sched/Kconfig b/net/sched/Kconfig
> index 2c72d95c3050..d104f7ee26c7 100644
> --- a/net/sched/Kconfig
> +++ b/net/sched/Kconfig
> @@ -877,6 +877,23 @@ config NET_ACT_CONNMARK
>  	  To compile this code as a module, choose M here: the
>  	  module will be called act_connmark.
>  
> +config NET_ACT_CTINFO
> +        tristate "Netfilter Connection Mark Actions"
> +        depends on NET_CLS_ACT && NETFILTER && IP_NF_IPTABLES
> +        depends on NF_CONNTRACK && NF_CONNTRACK_MARK
> +        help
> +	  Say Y here to allow transfer of a connmark stored information.
> +	  Current actions transfer connmark stored DSCP into
> +	  ipv4/v6 diffserv and/or to transfer connmark to packet
> +	  mark.  Both are useful for restoring egress based marks
> +	  back onto ingress connections for qdisc priority mapping
> +	  purposes.
> +
> +	  If unsure, say N.
> +
> +	  To compile this code as a module, choose M here: the
> +	  module will be called act_ctinfo.
> +
>  config NET_ACT_SKBMOD
>          tristate "skb data modification action"
>          depends on NET_CLS_ACT
> diff --git a/net/sched/Makefile b/net/sched/Makefile
> index 8a40431d7b5c..d54bfcbd7981 100644
> --- a/net/sched/Makefile
> +++ b/net/sched/Makefile
> @@ -21,6 +21,7 @@ obj-$(CONFIG_NET_ACT_CSUM)	+= act_csum.o
>  obj-$(CONFIG_NET_ACT_VLAN)	+= act_vlan.o
>  obj-$(CONFIG_NET_ACT_BPF)	+= act_bpf.o
>  obj-$(CONFIG_NET_ACT_CONNMARK)	+= act_connmark.o
> +obj-$(CONFIG_NET_ACT_CTINFO)	+= act_ctinfo.o
>  obj-$(CONFIG_NET_ACT_SKBMOD)	+= act_skbmod.o
>  obj-$(CONFIG_NET_ACT_IFE)	+= act_ife.o
>  obj-$(CONFIG_NET_IFE_SKBMARK)	+= act_meta_mark.o
> diff --git a/net/sched/act_ctinfo.c b/net/sched/act_ctinfo.c
> new file mode 100644
> index 000000000000..efcb7e0b5491
> --- /dev/null
> +++ b/net/sched/act_ctinfo.c
> @@ -0,0 +1,415 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/* net/sched/act_ctinfo.c  netfilter ctinfo connmark actions
> + *
> + * Copyright (c) 2019 Kevin Darbyshire-Bryant <ldir@...byshire-bryant.me.uk>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/kernel.h>
> +#include <linux/skbuff.h>
> +#include <linux/rtnetlink.h>
> +#include <linux/pkt_cls.h>
> +#include <linux/ip.h>
> +#include <linux/ipv6.h>
> +#include <net/netlink.h>
> +#include <net/pkt_sched.h>
> +#include <net/act_api.h>
> +#include <net/pkt_cls.h>
> +#include <uapi/linux/tc_act/tc_ctinfo.h>
> +#include <net/tc_act/tc_ctinfo.h>
> +
> +#include <net/netfilter/nf_conntrack.h>
> +#include <net/netfilter/nf_conntrack_core.h>
> +#include <net/netfilter/nf_conntrack_ecache.h>
> +#include <net/netfilter/nf_conntrack_zones.h>
> +
> +static struct tc_action_ops act_ctinfo_ops;
> +static unsigned int ctinfo_net_id;
> +
> +static void tcf_ctinfo_dscp_set(struct nf_conn *ct, struct tcf_ctinfo *ca,
> +				struct tcf_ctinfo_params *cp,
> +				struct sk_buff *skb, int wlen, int proto)
> +{
> +	u8 dscp, newdscp;
> +
> +	newdscp = (((ct->mark & cp->dscpmask) >> cp->dscpmaskshift) << 2) &
> +		     ~INET_ECN_MASK;
> +
> +	switch (proto) {
> +	case NFPROTO_IPV4:
> +		dscp = ipv4_get_dsfield(ip_hdr(skb)) & ~INET_ECN_MASK;
> +		if (dscp != newdscp) {
> +			if (likely(!skb_try_make_writable(skb, wlen))) {
> +				ipv4_change_dsfield(ip_hdr(skb),
> +						    INET_ECN_MASK,
> +						    newdscp);
> +				ca->stats_dscp_set++;
> +			} else {
> +				ca->stats_dscp_error++;
> +			}
> +		}
> +		break;
> +	case NFPROTO_IPV6:
> +		dscp = ipv6_get_dsfield(ipv6_hdr(skb)) & ~INET_ECN_MASK;
> +		if (dscp != newdscp) {
> +			if (likely(!skb_try_make_writable(skb, wlen))) {
> +				ipv6_change_dsfield(ipv6_hdr(skb),
> +						    INET_ECN_MASK,
> +						    newdscp);
> +				ca->stats_dscp_set++;
> +			} else {
> +				ca->stats_dscp_error++;
> +			}
> +		}
> +		break;
> +	default:
> +		break;
> +	}
> +}
> +
> +static void tcf_ctinfo_cpmark_set(struct nf_conn *ct, struct tcf_ctinfo *ca,
> +				  struct tcf_ctinfo_params *cp,
> +				  struct sk_buff *skb)
> +{
> +	ca->stats_cpmark_set++;
> +	skb->mark = ct->mark & cp->cpmarkmask;
> +}
> +
> +static int tcf_ctinfo_act(struct sk_buff *skb, const struct tc_action *a,
> +			  struct tcf_result *res)
> +{
> +	const struct nf_conntrack_tuple_hash *thash = NULL;
> +	struct tcf_ctinfo *ca = to_ctinfo(a);
> +	struct nf_conntrack_tuple tuple;
> +	struct nf_conntrack_zone zone;
> +	enum ip_conntrack_info ctinfo;
> +	struct tcf_ctinfo_params *cp;
> +	struct nf_conn *ct;
> +	int proto, wlen;
> +	int action;
> +
> +	cp = rcu_dereference_bh(ca->params);
> +
> +	tcf_lastuse_update(&ca->tcf_tm);
> +	bstats_update(&ca->tcf_bstats, skb);
> +	action = READ_ONCE(ca->tcf_action);
> +
> +	wlen = skb_network_offset(skb);
> +	if (tc_skb_protocol(skb) == htons(ETH_P_IP)) {
> +		wlen += sizeof(struct iphdr);
> +		if (!pskb_may_pull(skb, wlen))
> +			goto out;
> +
> +		proto = NFPROTO_IPV4;
> +	} else if (tc_skb_protocol(skb) == htons(ETH_P_IPV6)) {
> +		wlen += sizeof(struct ipv6hdr);
> +		if (!pskb_may_pull(skb, wlen))
> +			goto out;
> +
> +		proto = NFPROTO_IPV6;
> +	} else {
> +		goto out;
> +	}
> +
> +	ct = nf_ct_get(skb, &ctinfo);
> +	if (!ct) { /* look harder, usually ingress */
> +		if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb),
> +				       proto, cp->net, &tuple))
> +			goto out;
> +		zone.id = cp->zone;
> +		zone.dir = NF_CT_DEFAULT_ZONE_DIR;
> +
> +		thash = nf_conntrack_find_get(cp->net, &zone, &tuple);
> +		if (!thash)
> +			goto out;
> +
> +		ct = nf_ct_tuplehash_to_ctrack(thash);
> +	}
> +
> +	if (cp->mode & CTINFO_MODE_DSCP)
> +		if (!cp->dscpstatemask || (ct->mark & cp->dscpstatemask))
> +			tcf_ctinfo_dscp_set(ct, ca, cp, skb, wlen, proto);
> +
> +	if (cp->mode & CTINFO_MODE_CPMARK)
> +		tcf_ctinfo_cpmark_set(ct, ca, cp, skb);
> +
> +	if (thash)
> +		nf_ct_put(ct);
> +out:
> +	return action;
> +}
> +
> +static const struct nla_policy ctinfo_policy[TCA_CTINFO_MAX + 1] = {
> +	[TCA_CTINFO_ACT]	  = { .len = sizeof(struct tc_ctinfo) },
> +	[TCA_CTINFO_ZONE]	  = { .type = NLA_U16 },
> +	[TCA_CTINFO_PARMS_DSCP]   = { .len = sizeof(struct tc_ctinfo_dscp) },
> +	[TCA_CTINFO_PARMS_CPMARK] = { .len = sizeof(struct tc_ctinfo_cpmark) },

The opaque structs mean that you'll actually get *less* netlink validation.

> +	[TCA_CTINFO_MODE_DSCP]    = { .type = NLA_FLAG },
> +	[TCA_CTINFO_MODE_CPMARK]  = { .type = NLA_FLAG },
> +};
> +
> +static int tcf_ctinfo_init(struct net *net, struct nlattr *nla,
> +			   struct nlattr *est, struct tc_action **a,
> +			   int ovr, int bind, bool rtnl_held,
> +			   struct tcf_proto *tp,
> +			   struct netlink_ext_ack *extack)
> +{
> +	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
> +	struct tc_ctinfo_cpmark *cpmarkparm = NULL;
> +	struct nlattr *tb[TCA_CTINFO_MAX + 1];
> +	struct tcf_ctinfo_params *cp_new;
> +	struct tcf_chain *goto_ch = NULL;
> +	struct tc_ctinfo_dscp *dscpparm;
> +	struct tcf_ctinfo *ci;
> +	struct tc_ctinfo *actparm;
> +	int ret = 0, err, i;
> +
> +	if (!nla)
> +		return -EINVAL;
> +
> +	err = nla_parse_nested(tb, TCA_CTINFO_MAX, nla, ctinfo_policy, NULL);
> +	if (err < 0)
> +		return err;
> +
> +	if (!tb[TCA_CTINFO_ACT])
> +		return -EINVAL;
> +
> +	if (tb[TCA_CTINFO_MODE_DSCP] && !tb[TCA_CTINFO_PARMS_DSCP])
> +		return -EINVAL;
> +
> +	if (tb[TCA_CTINFO_PARMS_CPMARK])
> +		cpmarkparm = nla_data(tb[TCA_CTINFO_PARMS_CPMARK]);
> +
> +	dscpparm = nla_data(tb[TCA_CTINFO_PARMS_DSCP]);
> +	actparm = nla_data(tb[TCA_CTINFO_ACT]);
> +
> +	if (dscpparm) {

This is broken; you need to check whether tb[TCA_CTINFO_PARMS_DSCP] is
NULL *before* calling nla_data() on it. The 'if (dscpparm)' check
afterwards will never fail, as nla_data() just adds a constant offset to
the NULL pointer.

Maybe just move this whole validation into the

 if (tb[TCA_CTINFO_MODE_DSCP])

check?
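
For illustration, that ordering could look roughly like the following
user-space sketch. It is not kernel code: struct dscp_parms and
validate_dscp() are hypothetical stand-ins, a NULL parms pointer models a
missing TCA_CTINFO_PARMS_DSCP attribute, and -1 stands in for -EINVAL. The
point is only that presence is tested before anything is read from the
attribute payload, and that the mask checks run only when DSCP mode is
requested.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the DSCP parameters carried over netlink. */
struct dscp_parms {
	uint32_t mask;
	uint32_t statemask;
};

/*
 * Returns 0 if the configuration is acceptable, -1 otherwise.
 * mode_dscp models "the DSCP mode flag attribute was present";
 * parms == NULL models "the DSCP parameter attribute was absent".
 */
static int validate_dscp(bool mode_dscp, const struct dscp_parms *parms)
{
	uint32_t m;

	if (!mode_dscp)
		return 0;	/* nothing to validate for this mode */

	if (!parms)
		return -1;	/* mode requested without parameters */

	m = parms->mask;
	if (!m)
		return -1;

	/* Shift the lowest set bit down to bit 0. */
	while (!(m & 1u))
		m >>= 1;

	/* Need exactly six contiguous bits. */
	if (m != 0x3f)
		return -1;

	/* mask and statemask must not overlap. */
	if (parms->mask & parms->statemask)
		return -1;

	return 0;
}
```

With the example values from the commit message (mask 0xfc000000,
statemask 0x01000000) this accepts the configuration. A 7-bit mask such as
0xfe000000, an overlapping statemask, or a missing parameter block are all
rejected.
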

> +		/* need contiguous 6 bit mask */
> +		i = dscpparm->mask ? __ffs(dscpparm->mask) : 0;
> +		if ((~0 & (dscpparm->mask >> i)) != 0x3f)
> +			return -EINVAL;
> +		/* mask & statemask must not overlap */
> +		if (dscpparm->mask & dscpparm->statemask)
> +			return -EINVAL;
> +	}
> +
> +	/* done the validation:now to the actual action allocation */
> +	err = tcf_idr_check_alloc(tn, &actparm->index, a, bind);
> +	if (!err) {
> +		ret = tcf_idr_create(tn, actparm->index, est, a,
> +				     &act_ctinfo_ops, bind, false);
> +		if (ret) {
> +			tcf_idr_cleanup(tn, actparm->index);
> +			return ret;
> +		}
> +	} else if (err > 0) {
> +		if (bind) /* don't override defaults */
> +			return 0;
> +		if (!ovr) {
> +			tcf_idr_release(*a, bind);
> +			return -EEXIST;
> +		}
> +	} else {
> +		return err;
> +	}
> +
> +	err = tcf_action_check_ctrlact(actparm->action, tp, &goto_ch, extack);
> +	if (err < 0)
> +		goto release_idr;
> +
> +	ci = to_ctinfo(*a);
> +
> +	cp_new = kzalloc(sizeof(*cp_new), GFP_KERNEL);
> +	if (unlikely(!cp_new)) {
> +		err = -ENOMEM;
> +		goto put_chain;
> +	}
> +
> +	cp_new->net = net;
> +	cp_new->zone = tb[TCA_CTINFO_ZONE] ?
> +			nla_get_u16(tb[TCA_CTINFO_ZONE]) : 0;
> +	if (dscpparm) {
> +		cp_new->dscpmask = dscpparm->mask;
> +		cp_new->dscpmaskshift = cp_new->dscpmask ?
> +				__ffs(cp_new->dscpmask) : 0;
> +		cp_new->dscpstatemask = dscpparm->statemask;
> +	}
> +
> +	if (cpmarkparm)
> +		cp_new->cpmarkmask = cpmarkparm->mask;
> +	else
> +		cp_new->cpmarkmask = ~0;
> +
> +	if (tb[TCA_CTINFO_MODE_DSCP])
> +		cp_new->mode |= CTINFO_MODE_DSCP;
> +	else
> +		cp_new->mode &= ~CTINFO_MODE_DSCP;
> +
> +	if (tb[TCA_CTINFO_MODE_CPMARK])
> +		cp_new->mode |= CTINFO_MODE_CPMARK;
> +	else
> +		cp_new->mode &= ~CTINFO_MODE_CPMARK;
> +
> +	spin_lock_bh(&ci->tcf_lock);
> +	goto_ch = tcf_action_set_ctrlact(*a, actparm->action, goto_ch);
> +	rcu_swap_protected(ci->params, cp_new,
> +			   lockdep_is_held(&ci->tcf_lock));
> +	spin_unlock_bh(&ci->tcf_lock);
> +
> +	if (goto_ch)
> +		tcf_chain_put_by_act(goto_ch);
> +	if (cp_new)
> +		kfree_rcu(cp_new, rcu);
> +
> +	if (ret == ACT_P_CREATED)
> +		tcf_idr_insert(tn, *a);
> +
> +	return ret;
> +
> +put_chain:
> +	if (goto_ch)
> +		tcf_chain_put_by_act(goto_ch);
> +release_idr:
> +	tcf_idr_release(*a, bind);
> +	return err;
> +}
> +
> +static int tcf_ctinfo_dump(struct sk_buff *skb, struct tc_action *a,
> +			   int bind, int ref)
> +{
> +	struct tcf_ctinfo *ci = to_ctinfo(a);
> +	struct tc_ctinfo opt = {
> +		.index   = ci->tcf_index,
> +		.refcnt  = refcount_read(&ci->tcf_refcnt) - ref,
> +		.bindcnt = atomic_read(&ci->tcf_bindcnt) - bind,
> +	};
> +	struct tc_ctinfo_stats_cpmark cpmarkstats;
> +	unsigned char *b = skb_tail_pointer(skb);
> +	struct tc_ctinfo_stats_dscp dscpstats;
> +	struct tc_ctinfo_cpmark cpmarkparm;
> +	struct tc_ctinfo_dscp dscpparm;
> +	struct tcf_ctinfo_params *cp;
> +	struct tcf_t t;
> +
> +	spin_lock_bh(&ci->tcf_lock);
> +	cp = rcu_dereference_protected(ci->params,
> +				       lockdep_is_held(&ci->tcf_lock));
> +	opt.action = ci->tcf_action;
> +
> +	if (nla_put(skb, TCA_CTINFO_ACT, sizeof(opt), &opt))
> +		goto nla_put_failure;
> +
> +	if (cp->mode & CTINFO_MODE_DSCP) {
> +		dscpparm.mask = cp->dscpmask;
> +		dscpparm.statemask = cp->dscpstatemask;
> +		if (nla_put(skb, TCA_CTINFO_PARMS_DSCP, sizeof(dscpparm),
> +			    &dscpparm))
> +			goto nla_put_failure;
> +
> +		if (nla_put_flag(skb, TCA_CTINFO_MODE_DSCP))
> +			goto nla_put_failure;
> +
> +		dscpstats.error = ci->stats_dscp_error;
> +		dscpstats.set = ci->stats_dscp_set;
> +		if (nla_put(skb, TCA_CTINFO_STATS_DSCP, sizeof(dscpstats),
> +			    &dscpstats))
> +			goto nla_put_failure;
> +	}
> +
> +	if (cp->mode & CTINFO_MODE_CPMARK) {
> +		cpmarkparm.mask = cp->cpmarkmask;
> +		if (nla_put(skb, TCA_CTINFO_PARMS_CPMARK, sizeof(cpmarkparm),
> +			    &cpmarkparm))
> +			goto nla_put_failure;
> +
> +		if (nla_put_flag(skb, TCA_CTINFO_MODE_CPMARK))
> +			goto nla_put_failure;
> +
> +		cpmarkstats.set = ci->stats_cpmark_set;
> +		if (nla_put_u64_64bit(skb, TCA_CTINFO_STATS_CPMARK,
> +				      ci->stats_cpmark_set, TCA_CTINFO_PAD))
> +			goto nla_put_failure;
> +	}

I think you should dump all the statistics no matter the configuration.
The configuration might be changed at run-time, in which case the user
will probably still want to see the old stats from before the change.

> +	if (cp->zone) {
> +		if (nla_put_u16(skb, TCA_CTINFO_ZONE, cp->zone))
> +			goto nla_put_failure;
> +	}

The default zone is '0', so there will always be a zone set. I.e., don't
leave out the attribute entirely here, just add it with a 0-value.

> +
> +	tcf_tm_dump(&t, &ci->tcf_tm);
> +	if (nla_put_64bit(skb, TCA_CTINFO_TM, sizeof(t), &t, TCA_CTINFO_PAD))
> +		goto nla_put_failure;
> +
> +	spin_unlock_bh(&ci->tcf_lock);
> +	return skb->len;
> +
> +nla_put_failure:
> +	spin_unlock_bh(&ci->tcf_lock);
> +	nlmsg_trim(skb, b);
> +	return -1;
> +}
> +
> +static int tcf_ctinfo_walker(struct net *net, struct sk_buff *skb,
> +			     struct netlink_callback *cb, int type,
> +			     const struct tc_action_ops *ops,
> +			     struct netlink_ext_ack *extack)
> +{
> +	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
> +
> +	return tcf_generic_walker(tn, skb, cb, type, ops, extack);
> +}
> +
> +static int tcf_ctinfo_search(struct net *net, struct tc_action **a, u32 index)
> +{
> +	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
> +
> +	return tcf_idr_search(tn, a, index);
> +}
> +
> +static struct tc_action_ops act_ctinfo_ops = {
> +	.kind	= "ctinfo",
> +	.id	= TCA_ID_CTINFO,
> +	.owner	= THIS_MODULE,
> +	.act	= tcf_ctinfo_act,
> +	.dump	= tcf_ctinfo_dump,
> +	.init	= tcf_ctinfo_init,
> +	.walk	= tcf_ctinfo_walker,
> +	.lookup	= tcf_ctinfo_search,
> +	.size	= sizeof(struct tcf_ctinfo),
> +};
> +
> +static __net_init int ctinfo_init_net(struct net *net)
> +{
> +	struct tc_action_net *tn = net_generic(net, ctinfo_net_id);
> +
> +	return tc_action_net_init(tn, &act_ctinfo_ops);
> +}
> +
> +static void __net_exit ctinfo_exit_net(struct list_head *net_list)
> +{
> +	tc_action_net_exit(net_list, ctinfo_net_id);
> +}
> +
> +static struct pernet_operations ctinfo_net_ops = {
> +	.init		= ctinfo_init_net,
> +	.exit_batch	= ctinfo_exit_net,
> +	.id		= &ctinfo_net_id,
> +	.size		= sizeof(struct tc_action_net),
> +};
> +
> +static int __init ctinfo_init_module(void)
> +{
> +	return tcf_register_action(&act_ctinfo_ops, &ctinfo_net_ops);
> +}
> +
> +static void __exit ctinfo_cleanup_module(void)
> +{
> +	tcf_unregister_action(&act_ctinfo_ops, &ctinfo_net_ops);
> +}
> +
> +module_init(ctinfo_init_module);
> +module_exit(ctinfo_cleanup_module);
> +MODULE_AUTHOR("Kevin Darbyshire-Bryant <ldir@...byshire-bryant.me.uk>");
> +MODULE_DESCRIPTION("Connection tracking mark actions");
> +MODULE_LICENSE("GPL");
> diff --git a/tools/testing/selftests/tc-testing/config b/tools/testing/selftests/tc-testing/config
> index 203302065458..b235efd55367 100644
> --- a/tools/testing/selftests/tc-testing/config
> +++ b/tools/testing/selftests/tc-testing/config
> @@ -38,6 +38,7 @@ CONFIG_NET_ACT_CSUM=m
>  CONFIG_NET_ACT_VLAN=m
>  CONFIG_NET_ACT_BPF=m
>  CONFIG_NET_ACT_CONNMARK=m
> +CONFIG_NET_ACT_CTINFO=m
>  CONFIG_NET_ACT_SKBMOD=m
>  CONFIG_NET_ACT_IFE=m
>  CONFIG_NET_ACT_TUNNEL_KEY=m
> -- 
> 2.20.1 (Apple Git-117)
