netdev - Re: [PATCH RFC v2 1/8] xdp: Infrastructure to generalize XDP

Message-ID: <20170214213157.32e37148@redhat.com>
Date:   Tue, 14 Feb 2017 21:31:57 +0100
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Tom Herbert <tom@...bertland.com>
Cc:     brouer@...hat.com, <netdev@...r.kernel.org>, <kernel-team@...com>
Subject: Re: [PATCH RFC v2 1/8] xdp: Infrastructure to generalize XDP

On Wed, 8 Feb 2017 15:41:20 -0800
Tom Herbert <tom@...bertland.com> wrote:

> +static inline int __xdp_run_one_hook(struct xdp_hook *hook,
> +				     struct xdp_buff *xdp)
> +{
> +	void *priv = rcu_dereference(hook->priv);
> +
> +	if (hook->is_bpf) {
> +		/* Run BPF programs directly to avoid one layer of
> +		 * indirection.
> +		 */
> +		return BPF_PROG_RUN((struct bpf_prog *)priv, (void *)xdp);
> +	} else {
> +		return hook->hookfn(priv, xdp);
> +	}
> +}
> +
> +/* Core function to run the XDP hooks. This must be as fast as possible */
> +static inline int __xdp_hook_run(struct xdp_hook_set *hook_set,
> +				 struct xdp_buff *xdp,
> +				 struct xdp_hook **last_hook)
> +{
> +	struct xdp_hook *hook;
> +	int i, ret;
> +
> +	if (unlikely(!hook_set))
> +		return XDP_PASS;
> +
> +	hook = &hook_set->hooks[0];
> +	ret = __xdp_run_one_hook(hook, xdp);
> +	*last_hook = hook;
> +
> +	for (i = 1; i < hook_set->num; i++) {
> +		if (ret != XDP_PASS)
> +			break;
> +		hook = &hook_set->hooks[i];
> +		ret = __xdp_run_one_hook(hook, xdp);
> +	}
> +
> +	return ret;
> +}

There is one basic problem with this approach: there is no bulking and
no reuse of the instruction cache.  There is nothing revolutionary here.
We will end up with the same known performance problems once more hook
users get added.

Calling N hooks for every packet will just end up flushing the
instruction cache (like the issues we have today).

Instead, take N packets and then call the hooks in turn (storing the
action verdicts in a packet-vector).  Such an architecture would be in
line with what VPP, Snabb and DPDK are doing.  It optimizes icache usage
and opens up for smarter prefetching of lookup tables.  Imagine having
hook-1 identify the lookup bucket and start a prefetch, hook-2 access the
bucket and prefetch the table data, and hook-3 read the data.  This is
what DPDK is doing, see [1], and VPP is doing similar tricks to get it to
scale to large route lookup tables.
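
To make the bulking idea concrete, here is a minimal user-space sketch
of the loop order described above.  It is not kernel code and not part
of the patch: struct pkt, struct hook, the V_* verdicts,
run_hooks_bulk() and the two example hooks are all made-up names for
illustration.  The point is only that the outer loop walks the hooks
and the inner loop walks the packet-vector, so each hook's code stays
hot in the icache across the whole batch.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins only; these are NOT the kernel's XDP types. */
enum verdict { V_DROP = 1, V_PASS = 2 };

struct pkt {
	unsigned int len;
	unsigned int mark;
};

typedef int (*hook_fn)(void *priv, struct pkt *p);

struct hook {
	hook_fn fn;
	void *priv;
};

/* Run every hook over the whole batch before moving to the next hook.
 * verdicts[] is the per-packet verdict vector; a packet whose verdict
 * is no longer V_PASS is skipped by all later hooks.
 */
static void run_hooks_bulk(const struct hook *hooks, size_t nhooks,
			   struct pkt *pkts, int *verdicts, size_t n)
{
	size_t h, i;

	assert(pkts != NULL && verdicts != NULL);

	for (i = 0; i < n; i++)
		verdicts[i] = V_PASS;

	for (h = 0; h < nhooks; h++) {
		for (i = 0; i < n; i++) {
			if (verdicts[i] != V_PASS)
				continue;
			verdicts[i] = hooks[h].fn(hooks[h].priv, &pkts[i]);
		}
	}
}

/* Example hook: drop packets shorter than the limit passed via priv. */
static int drop_short(void *priv, struct pkt *p)
{
	const unsigned int *min_len = priv;

	return p->len < *min_len ? V_DROP : V_PASS;
}

/* Example hook: mark every packet it sees and let it continue. */
static int mark_pkt(void *priv, struct pkt *p)
{
	(void)priv;
	p->mark = 1;
	return V_PASS;
}
```

With drop_short followed by mark_pkt, only the packets that survive
the first hook get marked by the second.  Because each hook sees the
whole vector, a hook could also issue a prefetch for data that a later
hook will need, which is the staged-lookup trick mentioned above; the
sketch leaves that out.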

[1] http://dpdk.org/doc/guides/prog_guide/packet_framework.html#figure-figure35

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer