Message-ID: <5278EC8B.4060902@redhat.com>
Date:	Tue, 05 Nov 2013 14:03:07 +0100
From:	Daniel Borkmann <dborkman@...hat.com>
To:	pablo@...filter.org
CC:	netfilter-devel@...r.kernel.org, netdev@...r.kernel.org,
	Tejun Heo <tj@...nel.org>, cgroups@...r.kernel.org
Subject: Re: [PATCH nf-next] netfilter: xtables: lightweight process control
 group matching

On 10/18/2013 03:28 PM, Daniel Borkmann wrote:
> It would be useful e.g. in a server or desktop environment to have
> a facility for fine-grained "per application" or "per
> application group" firewall policies. Probably, users in the mobile/
> embedded area (e.g. Android based) with different security policy
> requirements for application groups could have great benefit from
> that as well. For example, with a little bit of configuration effort,
> an admin could whitelist well-known applications, and thus block
> otherwise unwanted "hard-to-track" applications like [1] from a
> user's machine.
>
> PID-based matching would not be appropriate, as PIDs change
> frequently, and child tracking would make that even more
> complex and ugly. Cgroups would be a perfect candidate
> for accomplishing that as they associate a set of tasks with a
> set of parameters for one or more subsystems, in our case the
> netfilter subsystem, which, of course, can be combined with other
> cgroup subsystems into something more complex.
>
> As mentioned, to overcome this constraint, such processes could
> be placed into one or multiple cgroups where different fine-grained
> rules can be defined depending on the application scenario, while
> e.g. everything else that is not part of that could be dropped (or
> vice versa), thus making life harder for unwanted processes to
> communicate to the outside world. So, we make use of cgroups here
> to track jobs and limit their resources in terms of iptables
> policies; in other words, limiting what they are allowed to
> communicate.
>
> Minimal, basic usage example (many other iptables options can be
> applied obviously):
>
>   1) Configuring cgroups:
>
>    mkdir /sys/fs/cgroup/net_filter
>    mount -t cgroup -o net_filter net_filter /sys/fs/cgroup/net_filter
>    mkdir /sys/fs/cgroup/net_filter/0
>    echo 1 > /sys/fs/cgroup/net_filter/0/net_filter.fwid
>
>   2) Configuring netfilter:
>
>    iptables -A OUTPUT -m cgroup ! --cgroup 1 -j DROP
>
>   3) Running applications:
>
>    ping 208.67.222.222  <pid:1799>
>    echo 1799 > /sys/fs/cgroup/net_filter/0/tasks
>    64 bytes from 208.67.222.222: icmp_seq=44 ttl=49 time=11.9 ms
>    ...
>
>    ping 208.67.220.220  <pid:1804>
>    ping: sendmsg: Operation not permitted
>    ...
>    echo 1804 > /sys/fs/cgroup/net_filter/0/tasks
>    64 bytes from 208.67.220.220: icmp_seq=89 ttl=56 time=19.0 ms
>    ...
>
> Of course, real-world deployments would make use of the cgroups
> user space tool suite, or of custom policy daemons that dynamically
> move applications from/to various net_filter cgroups.
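
As a rough illustration of what such a custom policy daemon might do, here is a minimal user-space sketch in C. It only writes to the files used in the example above (net_filter.fwid and tasks); the helper names cg_write(), cg_set_fwid() and cg_add_task() are made up for this sketch and are not part of the patch or of any existing library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write `value` plus a newline into <dir>/<file>.
 * Returns 0 on success, -1 on failure. */
static int cg_write(const char *dir, const char *file, const char *value)
{
	char path[512];
	FILE *f;
	int n, ok;

	assert(dir && file && value);
	n = snprintf(path, sizeof(path), "%s/%s", dir, file);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%s\n", value) > 0;
	/* Buffered data is flushed at fclose(), so a write error may
	 * only surface there. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Tag a cgroup directory with a fwid (step 1 of the example). */
static int cg_set_fwid(const char *cgroup_dir, unsigned int fwid)
{
	char buf[16];

	snprintf(buf, sizeof(buf), "%u", fwid);
	return cg_write(cgroup_dir, "net_filter.fwid", buf);
}

/* Move a task into the cgroup (step 3 of the example). */
static int cg_add_task(const char *cgroup_dir, long pid)
{
	char buf[32];

	snprintf(buf, sizeof(buf), "%ld", pid);
	return cg_write(cgroup_dir, "tasks", buf);
}
```

With the layout from the example, cg_set_fwid("/sys/fs/cgroup/net_filter/0", 1) followed by cg_add_task("/sys/fs/cgroup/net_filter/0", 1799) would correspond to the two echo commands shown there, assuming the hierarchy is already mounted and the directory exists.
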
>
> Design considerations appendix:
>
> Based on the discussion in [2] and [3], making this a subsystem
> seems to be the best tradeoff imho. Here's why:
>
> netfilter is a large enough and ubiquitous subsystem, meaning it
> is not somewhere in a niche, and enabled/shipped on most machines.
> It is true that the decision making on fwid is "outsourced" to
> netfilter itself, but that does not necessarily need to be
> considered as a bad thing to delegate and reuse as much as possible.
> The matching performance in the critical path is just a simple
> comparison of fwid tags, nothing more, thus resulting in a good
> performance suited for high-speed networking. Moreover, by simply
> transferring fwids between user- and kernel space, we can have the
> ruleset as packed as possible, giving an optimal footprint for
> large rulesets using this feature. The alternative draft that we
> have proposed in [3] comes at the cost of exposing some cgroups
> internals outside of cgroups to make it work. It also has a higher
> memory footprint for transferring rules and, even worse, lower
> performance, as more work needs to be done in the matching critical
> path: traversing all cgroups a task belongs to in order to find the
> one of interest. Moreover, from a usability point of view, it
> seems less intuitive and more confusing than the approach presented
> here. Therefore, I consider this design the better and less
> intrusive tradeoff to go with.
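
To make the "simple comparison of fwid tags" concrete, here is a small user-space model of the check and match logic, written as a sketch only. The struct fwid_rule and the two function names are invented for illustration; the logic is meant to follow cgroup_mt_check() and cgroup_mt() from the patch below, with the socket lookup reduced to a boolean and a plain integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct xt_cgroup_info from the patch (user-space types). */
struct fwid_rule {
	uint32_t id;      /* fwid the rule refers to; 0 is rejected */
	uint32_t invert;  /* 1 if the rule was given as "! --cgroup" */
};

/* Same validation as cgroup_mt_check(): invert is 0 or 1, id is non-zero. */
static bool fwid_rule_valid(const struct fwid_rule *r)
{
	assert(r != NULL);
	return (r->invert & ~1u) == 0 && r->id != 0;
}

/* Same decision as cgroup_mt(): a packet without a socket never matches,
 * otherwise the result is one integer comparison, optionally inverted. */
static bool fwid_rule_match(const struct fwid_rule *r, bool has_sock,
			    uint32_t sk_fwid)
{
	assert(r != NULL);
	if (!has_sock)
		return false;
	return (r->id == sk_fwid) != (r->invert != 0);
}
```

For the rule "-m cgroup ! --cgroup 1 -j DROP" from the usage example (id 1, invert 1), a socket tagged with fwid 1 does not match and is therefore not dropped, while a socket with the default fwid 0 matches and is dropped.
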

As I've provided a code proposal for both variants and a design
discussion/conclusion, are you okay with this patch, Tejun?

>    [1] http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-biondi/bh-eu-06-biondi-up.pdf
>    [2] http://patchwork.ozlabs.org/patch/280687/
>    [3] http://patchwork.ozlabs.org/patch/282477/
>
> Signed-off-by: Daniel Borkmann <dborkman@...hat.com>
> Cc: Tejun Heo <tj@...nel.org>
> Cc: cgroups@...r.kernel.org
> ---
>   v1->v2:
>    - Updated commit message, rebased
>    - Applied Gao Feng's feedback from [2]
>
>   Note: iptables part is still available in http://patchwork.ozlabs.org/patch/280690/
>
>   Documentation/cgroups/00-INDEX           |   2 +
>   Documentation/cgroups/net_filter.txt     |  27 +++++
>   include/linux/cgroup_subsys.h            |   5 +
>   include/net/netfilter/xt_cgroup.h        |  58 ++++++++++
>   include/net/sock.h                       |   3 +
>   include/uapi/linux/netfilter/Kbuild      |   1 +
>   include/uapi/linux/netfilter/xt_cgroup.h |  11 ++
>   net/core/scm.c                           |   2 +
>   net/core/sock.c                          |  14 +++
>   net/netfilter/Kconfig                    |   8 ++
>   net/netfilter/Makefile                   |   1 +
>   net/netfilter/xt_cgroup.c                | 177 +++++++++++++++++++++++++++++++
>   12 files changed, 309 insertions(+)
>   create mode 100644 Documentation/cgroups/net_filter.txt
>   create mode 100644 include/net/netfilter/xt_cgroup.h
>   create mode 100644 include/uapi/linux/netfilter/xt_cgroup.h
>   create mode 100644 net/netfilter/xt_cgroup.c
>
> diff --git a/Documentation/cgroups/00-INDEX b/Documentation/cgroups/00-INDEX
> index bc461b6..14424d2 100644
> --- a/Documentation/cgroups/00-INDEX
> +++ b/Documentation/cgroups/00-INDEX
> @@ -20,6 +20,8 @@ memory.txt
>   	- Memory Resource Controller; design, accounting, interface, testing.
>   net_cls.txt
>   	- Network classifier cgroups details and usages.
> +net_filter.txt
> +	- Network firewalling (netfilter) cgroups details and usages.
>   net_prio.txt
>   	- Network priority cgroups details and usages.
>   resource_counter.txt
> diff --git a/Documentation/cgroups/net_filter.txt b/Documentation/cgroups/net_filter.txt
> new file mode 100644
> index 0000000..22759e4
> --- /dev/null
> +++ b/Documentation/cgroups/net_filter.txt
> @@ -0,0 +1,27 @@
> +Netfilter cgroup
> +----------------
> +
> +The netfilter cgroup provides an interface to aggregate jobs
> +to a particular netfilter tag, that can be used to apply
> +various iptables/netfilter policies for those jobs in order
> +to limit resources/abilities for network communication.
> +
> +Creating a net_filter cgroups instance creates a net_filter.fwid
> +file. The value of net_filter.fwid is initialized to 0 by
> +default (so only global iptables/netfilter policies apply).
> +You can write a unique decimal fwid tag into net_filter.fwid
> +file, and use that tag along with iptables' --cgroup option.
> +
> +Minimal/basic usage example:
> +
> +1) Configuring cgroup:
> +
> + mkdir /sys/fs/cgroup/net_filter
> + mount -t cgroup -o net_filter net_filter /sys/fs/cgroup/net_filter
> + mkdir /sys/fs/cgroup/net_filter/0
> + echo 1 > /sys/fs/cgroup/net_filter/0/net_filter.fwid
> + echo [pid] > /sys/fs/cgroup/net_filter/0/tasks
> +
> +2) Configuring netfilter:
> +
> + iptables -A OUTPUT -m cgroup ! --cgroup 1 -p tcp --dport 80 -j DROP
> diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
> index b613ffd..ef58217 100644
> --- a/include/linux/cgroup_subsys.h
> +++ b/include/linux/cgroup_subsys.h
> @@ -50,6 +50,11 @@ SUBSYS(net_prio)
>   #if IS_SUBSYS_ENABLED(CONFIG_CGROUP_HUGETLB)
>   SUBSYS(hugetlb)
>   #endif
> +
> +#if IS_SUBSYS_ENABLED(CONFIG_NETFILTER_XT_MATCH_CGROUP)
> +SUBSYS(net_filter)
> +#endif
> +
>   /*
>    * DO NOT ADD ANY SUBSYSTEM WITHOUT EXPLICIT ACKS FROM CGROUP MAINTAINERS.
>    */
> diff --git a/include/net/netfilter/xt_cgroup.h b/include/net/netfilter/xt_cgroup.h
> new file mode 100644
> index 0000000..b2c702f
> --- /dev/null
> +++ b/include/net/netfilter/xt_cgroup.h
> @@ -0,0 +1,58 @@
> +#ifndef _XT_CGROUP_H
> +#define _XT_CGROUP_H
> +
> +#include <linux/types.h>
> +#include <linux/cgroup.h>
> +#include <linux/hardirq.h>
> +#include <linux/rcupdate.h>
> +
> +#if IS_ENABLED(CONFIG_NETFILTER_XT_MATCH_CGROUP)
> +struct cgroup_nf_state {
> +	struct cgroup_subsys_state css;
> +	u32 fwid;
> +};
> +
> +void sock_update_fwid(struct sock *sk);
> +
> +#if IS_BUILTIN(CONFIG_NETFILTER_XT_MATCH_CGROUP)
> +static inline u32 task_fwid(struct task_struct *p)
> +{
> +	u32 fwid;
> +
> +	if (in_interrupt())
> +		return 0;
> +
> +	rcu_read_lock();
> +	fwid = container_of(task_css(p, net_filter_subsys_id),
> +			    struct cgroup_nf_state, css)->fwid;
> +	rcu_read_unlock();
> +
> +	return fwid;
> +}
> +#elif IS_MODULE(CONFIG_NETFILTER_XT_MATCH_CGROUP)
> +static inline u32 task_fwid(struct task_struct *p)
> +{
> +	struct cgroup_subsys_state *css;
> +	u32 fwid = 0;
> +
> +	if (in_interrupt())
> +		return 0;
> +
> +	rcu_read_lock();
> +	css = task_css(p, net_filter_subsys_id);
> +	if (css)
> +		fwid = container_of(css, struct cgroup_nf_state, css)->fwid;
> +	rcu_read_unlock();
> +
> +	return fwid;
> +}
> +#endif
> +#else /* !CONFIG_NETFILTER_XT_MATCH_CGROUP */
> +static inline u32 task_fwid(struct task_struct *p)
> +{
> +	return 0;
> +}
> +
> +#define sock_update_fwid(sk)
> +#endif /* CONFIG_NETFILTER_XT_MATCH_CGROUP */
> +#endif /* _XT_CGROUP_H */
> diff --git a/include/net/sock.h b/include/net/sock.h
> index e3bf213..f7da4b4 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -387,6 +387,9 @@ struct sock {
>   #if IS_ENABLED(CONFIG_NETPRIO_CGROUP)
>   	__u32			sk_cgrp_prioidx;
>   #endif
> +#if IS_ENABLED(CONFIG_NETFILTER_XT_MATCH_CGROUP)
> +	__u32			sk_cgrp_fwid;
> +#endif
>   	struct pid		*sk_peer_pid;
>   	const struct cred	*sk_peer_cred;
>   	long			sk_rcvtimeo;
> diff --git a/include/uapi/linux/netfilter/Kbuild b/include/uapi/linux/netfilter/Kbuild
> index 1749154..94a4890 100644
> --- a/include/uapi/linux/netfilter/Kbuild
> +++ b/include/uapi/linux/netfilter/Kbuild
> @@ -37,6 +37,7 @@ header-y += xt_TEE.h
>   header-y += xt_TPROXY.h
>   header-y += xt_addrtype.h
>   header-y += xt_bpf.h
> +header-y += xt_cgroup.h
>   header-y += xt_cluster.h
>   header-y += xt_comment.h
>   header-y += xt_connbytes.h
> diff --git a/include/uapi/linux/netfilter/xt_cgroup.h b/include/uapi/linux/netfilter/xt_cgroup.h
> new file mode 100644
> index 0000000..43acb7e
> --- /dev/null
> +++ b/include/uapi/linux/netfilter/xt_cgroup.h
> @@ -0,0 +1,11 @@
> +#ifndef _UAPI_XT_CGROUP_H
> +#define _UAPI_XT_CGROUP_H
> +
> +#include <linux/types.h>
> +
> +struct xt_cgroup_info {
> +	__u32 id;
> +	__u32 invert;
> +};
> +
> +#endif /* _UAPI_XT_CGROUP_H */
> diff --git a/net/core/scm.c b/net/core/scm.c
> index b442e7e..f08672a 100644
> --- a/net/core/scm.c
> +++ b/net/core/scm.c
> @@ -36,6 +36,7 @@
>   #include <net/sock.h>
>   #include <net/compat.h>
>   #include <net/scm.h>
> +#include <net/netfilter/xt_cgroup.h>
>   #include <net/cls_cgroup.h>
>
>
> @@ -290,6 +291,7 @@ void scm_detach_fds(struct msghdr *msg, struct scm_cookie *scm)
>   		/* Bump the usage count and install the file. */
>   		sock = sock_from_file(fp[i], &err);
>   		if (sock) {
> +			sock_update_fwid(sock->sk);
>   			sock_update_netprioidx(sock->sk);
>   			sock_update_classid(sock->sk);
>   		}
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 2bd9b3f..524a376 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -125,6 +125,7 @@
>   #include <linux/skbuff.h>
>   #include <net/net_namespace.h>
>   #include <net/request_sock.h>
> +#include <net/netfilter/xt_cgroup.h>
>   #include <net/sock.h>
>   #include <linux/net_tstamp.h>
>   #include <net/xfrm.h>
> @@ -1337,6 +1338,18 @@ void sock_update_netprioidx(struct sock *sk)
>   EXPORT_SYMBOL_GPL(sock_update_netprioidx);
>   #endif
>
> +#if IS_ENABLED(CONFIG_NETFILTER_XT_MATCH_CGROUP)
> +void sock_update_fwid(struct sock *sk)
> +{
> +	u32 fwid;
> +
> +	fwid = task_fwid(current);
> +	if (fwid != sk->sk_cgrp_fwid)
> +		sk->sk_cgrp_fwid = fwid;
> +}
> +EXPORT_SYMBOL(sock_update_fwid);
> +#endif
> +
>   /**
>    *	sk_alloc - All socket objects are allocated here
>    *	@net: the applicable net namespace
> @@ -1363,6 +1376,7 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
>
>   		sock_update_classid(sk);
>   		sock_update_netprioidx(sk);
> +		sock_update_fwid(sk);
>   	}
>
>   	return sk;
> diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
> index 6e839b6..d276ff4 100644
> --- a/net/netfilter/Kconfig
> +++ b/net/netfilter/Kconfig
> @@ -806,6 +806,14 @@ config NETFILTER_XT_MATCH_BPF
>
>   	  To compile it as a module, choose M here.  If unsure, say N.
>
> +config NETFILTER_XT_MATCH_CGROUP
> +	tristate '"control group" match support'
> +	depends on NETFILTER_ADVANCED
> +	depends on CGROUPS
> +	---help---
> +	Socket/process control group matching allows you to match locally
> +	generated packets based on which control group processes belong to.
> +
>   config NETFILTER_XT_MATCH_CLUSTER
>   	tristate '"cluster" match support'
>   	depends on NF_CONNTRACK
> diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
> index c3a0a12..12f014f 100644
> --- a/net/netfilter/Makefile
> +++ b/net/netfilter/Makefile
> @@ -124,6 +124,7 @@ obj-$(CONFIG_NETFILTER_XT_MATCH_MULTIPORT) += xt_multiport.o
>   obj-$(CONFIG_NETFILTER_XT_MATCH_NFACCT) += xt_nfacct.o
>   obj-$(CONFIG_NETFILTER_XT_MATCH_OSF) += xt_osf.o
>   obj-$(CONFIG_NETFILTER_XT_MATCH_OWNER) += xt_owner.o
> +obj-$(CONFIG_NETFILTER_XT_MATCH_CGROUP) += xt_cgroup.o
>   obj-$(CONFIG_NETFILTER_XT_MATCH_PHYSDEV) += xt_physdev.o
>   obj-$(CONFIG_NETFILTER_XT_MATCH_PKTTYPE) += xt_pkttype.o
>   obj-$(CONFIG_NETFILTER_XT_MATCH_POLICY) += xt_policy.o
> diff --git a/net/netfilter/xt_cgroup.c b/net/netfilter/xt_cgroup.c
> new file mode 100644
> index 0000000..249c7ee
> --- /dev/null
> +++ b/net/netfilter/xt_cgroup.c
> @@ -0,0 +1,177 @@
> +/*
> + * Xtables module to match the process control group.
> + *
> + * Might be used to implement individual "per-application" firewall
> + * policies in contrast to global policies based on control groups.
> + *
> + * (C) 2013 Daniel Borkmann <dborkman@...hat.com>
> + * (C) 2013 Thomas Graf <tgraf@...hat.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#include <linux/skbuff.h>
> +#include <linux/module.h>
> +#include <linux/file.h>
> +#include <linux/cgroup.h>
> +#include <linux/fdtable.h>
> +#include <linux/netfilter/x_tables.h>
> +#include <linux/netfilter/xt_cgroup.h>
> +#include <net/netfilter/xt_cgroup.h>
> +#include <net/sock.h>
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Daniel Borkmann <dborkman@...hat.com>");
> +MODULE_DESCRIPTION("Xtables: process control group matching");
> +MODULE_ALIAS("ipt_cgroup");
> +MODULE_ALIAS("ip6t_cgroup");
> +
> +static int cgroup_mt_check(const struct xt_mtchk_param *par)
> +{
> +	struct xt_cgroup_info *info = par->matchinfo;
> +
> +	if (info->invert & ~1)
> +		return -EINVAL;
> +
> +	return info->id ? 0 : -EINVAL;
> +}
> +
> +static bool
> +cgroup_mt(const struct sk_buff *skb, struct xt_action_param *par)
> +{
> +	const struct xt_cgroup_info *info = par->matchinfo;
> +
> +	if (skb->sk == NULL)
> +		return false;
> +
> +	return (info->id == skb->sk->sk_cgrp_fwid) ^ info->invert;
> +}
> +
> +static struct xt_match cgroup_mt_reg __read_mostly = {
> +	.name       = "cgroup",
> +	.revision   = 0,
> +	.family     = NFPROTO_UNSPEC,
> +	.checkentry = cgroup_mt_check,
> +	.match      = cgroup_mt,
> +	.matchsize  = sizeof(struct xt_cgroup_info),
> +	.me         = THIS_MODULE,
> +	.hooks      = (1 << NF_INET_LOCAL_OUT) |
> +	              (1 << NF_INET_POST_ROUTING),
> +};
> +
> +static inline struct cgroup_nf_state *
> +css_nf_state(struct cgroup_subsys_state *css)
> +{
> +	return css ? container_of(css, struct cgroup_nf_state, css) : NULL;
> +}
> +
> +static struct cgroup_subsys_state *
> +cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
> +{
> +	struct cgroup_nf_state *cs;
> +
> +	cs = kzalloc(sizeof(*cs), GFP_KERNEL);
> +	if (!cs)
> +		return ERR_PTR(-ENOMEM);
> +
> +	return &cs->css;
> +}
> +
> +static int cgroup_css_online(struct cgroup_subsys_state *css)
> +{
> +	struct cgroup_nf_state *cs = css_nf_state(css);
> +	struct cgroup_nf_state *parent = css_nf_state(css_parent(css));
> +
> +	if (parent)
> +		cs->fwid = parent->fwid;
> +
> +	return 0;
> +}
> +
> +static void cgroup_css_free(struct cgroup_subsys_state *css)
> +{
> +	kfree(css_nf_state(css));
> +}
> +
> +static int cgroup_fwid_update(const void *v, struct file *file, unsigned n)
> +{
> +	int err;
> +	struct socket *sock = sock_from_file(file, &err);
> +
> +	if (sock)
> +		sock->sk->sk_cgrp_fwid = (u32)(unsigned long) v;
> +
> +	return 0;
> +}
> +
> +static u64 cgroup_fwid_read(struct cgroup_subsys_state *css,
> +			    struct cftype *cft)
> +{
> +	return css_nf_state(css)->fwid;
> +}
> +
> +static int cgroup_fwid_write(struct cgroup_subsys_state *css,
> +			     struct cftype *cft, u64 id)
> +{
> +	css_nf_state(css)->fwid = (u32) id;
> +
> +	return 0;
> +}
> +
> +static void cgroup_attach(struct cgroup_subsys_state *css,
> +			  struct cgroup_taskset *tset)
> +{
> +	struct cgroup_nf_state *cs = css_nf_state(css);
> +	void *v = (void *)(unsigned long) cs->fwid;
> +	struct task_struct *p;
> +
> +	cgroup_taskset_for_each(p, css, tset) {
> +		task_lock(p);
> +		iterate_fd(p->files, 0, cgroup_fwid_update, v);
> +		task_unlock(p);
> +	}
> +}
> +
> +static struct cftype net_filter_ss_files[] = {
> +	{
> +		.name		= "fwid",
> +		.read_u64	= cgroup_fwid_read,
> +		.write_u64	= cgroup_fwid_write,
> +	},
> +	{ }
> +};
> +
> +struct cgroup_subsys net_filter_subsys = {
> +	.name		= "net_filter",
> +	.css_alloc	= cgroup_css_alloc,
> +	.css_online	= cgroup_css_online,
> +	.css_free	= cgroup_css_free,
> +	.attach		= cgroup_attach,
> +	.subsys_id	= net_filter_subsys_id,
> +	.base_cftypes	= net_filter_ss_files,
> +	.module		= THIS_MODULE,
> +};
> +
> +static int __init cgroup_mt_init(void)
> +{
> +	int ret = cgroup_load_subsys(&net_filter_subsys);
> +	if (ret)
> +		goto out;
> +
> +	ret = xt_register_match(&cgroup_mt_reg);
> +	if (ret)
> +		cgroup_unload_subsys(&net_filter_subsys);
> +out:
> +	return ret;
> +}
> +
> +static void __exit cgroup_mt_exit(void)
> +{
> +	xt_unregister_match(&cgroup_mt_reg);
> +	cgroup_unload_subsys(&net_filter_subsys);
> +}
> +
> +module_init(cgroup_mt_init);
> +module_exit(cgroup_mt_exit);
>
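
For readers following the tag propagation in the patch above, here is a simplified user-space model of how the fwid reaches a socket. It is a sketch under assumptions, not kernel code: the model_* types and functions are invented here, sockets are reduced to an array of tags, and details such as the in_interrupt() case in task_fwid() are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SOCKS 8

struct model_cgroup {
	uint32_t fwid;
};

struct model_task {
	struct model_cgroup *cg;
	uint32_t sock_fwid[MAX_SOCKS]; /* stands in for sk_cgrp_fwid */
	size_t nsocks;
};

/* Like cgroup_css_online(): a new cgroup starts with its parent's fwid,
 * or 0 if it has no parent (the patch gets 0 from kzalloc()). */
static void model_cgroup_init(struct model_cgroup *cg,
			      const struct model_cgroup *parent)
{
	assert(cg != NULL);
	cg->fwid = parent ? parent->fwid : 0;
}

/* Like sk_alloc() calling sock_update_fwid(): a new socket takes the
 * fwid of the creating task's cgroup. Returns the slot or -1 if full. */
static int model_socket(struct model_task *t)
{
	assert(t != NULL && t->cg != NULL);
	if (t->nsocks == MAX_SOCKS)
		return -1;
	t->sock_fwid[t->nsocks] = t->cg->fwid;
	return (int)t->nsocks++;
}

/* Like cgroup_attach(): moving a task retags every socket it holds. */
static void model_attach(struct model_task *t, struct model_cgroup *cg)
{
	size_t i;

	assert(t != NULL && cg != NULL);
	t->cg = cg;
	for (i = 0; i < t->nsocks; i++)
		t->sock_fwid[i] = cg->fwid;
}
```

One thing the model makes visible: as far as the posted code shows, cgroup_fwid_write() only updates the cgroup's own value, so sockets that were tagged earlier keep their old fwid until the task is attached again. The usage example avoids this by writing net_filter.fwid before moving any task into the cgroup.
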