Date:   Tue, 31 Jul 2018 14:46:46 +0200
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Toshiaki Makita <makita.toshiaki@....ntt.co.jp>
Cc:     Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>, netdev@...r.kernel.org,
        Jakub Kicinski <jakub.kicinski@...ronome.com>,
        John Fastabend <john.fastabend@...il.com>,
        "Karlsson, Magnus" <magnus.karlsson@...el.com>,
        Björn Töpel <bjorn.topel@...el.com>, brouer@...hat.com
Subject: Re: [PATCH v6 bpf-next 4/9] veth: Handle xdp_frames in xdp napi ring

On Tue, 31 Jul 2018 19:40:08 +0900
Toshiaki Makita <makita.toshiaki@....ntt.co.jp> wrote:

> On 2018/07/31 19:26, Jesper Dangaard Brouer wrote:
> > 
> > Context needed from: [PATCH v6 bpf-next 2/9] veth: Add driver XDP
> > 
> > On Mon, 30 Jul 2018 19:43:44 +0900
> > Toshiaki Makita <makita.toshiaki@....ntt.co.jp> wrote:
> >   
> >> +static struct sk_buff *veth_build_skb(void *head, int headroom, int len,
> >> +				      int buflen)
> >> +{
> >> +	struct sk_buff *skb;
> >> +
> >> +	if (!buflen) {
> >> +		buflen = SKB_DATA_ALIGN(headroom + len) +
> >> +			 SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> >> +	}
> >> +	skb = build_skb(head, buflen);
> >> +	if (!skb)
> >> +		return NULL;
> >> +
> >> +	skb_reserve(skb, headroom);
> >> +	skb_put(skb, len);
> >> +
> >> +	return skb;
> >> +}  
> > 
> > 
> > On Mon, 30 Jul 2018 19:43:46 +0900
> > Toshiaki Makita <makita.toshiaki@....ntt.co.jp> wrote:
> >   
> >> +static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv,
> >> +					struct xdp_frame *frame)
> >> +{
> >> +	int len = frame->len, delta = 0;
> >> +	struct bpf_prog *xdp_prog;
> >> +	unsigned int headroom;
> >> +	struct sk_buff *skb;
> >> +
> >> +	rcu_read_lock();
> >> +	xdp_prog = rcu_dereference(priv->xdp_prog);
> >> +	if (likely(xdp_prog)) {
> >> +		struct xdp_buff xdp;
> >> +		u32 act;
> >> +
> >> +		xdp.data_hard_start = frame->data - frame->headroom;
> >> +		xdp.data = frame->data;
> >> +		xdp.data_end = frame->data + frame->len;
> >> +		xdp.data_meta = frame->data - frame->metasize;
> >> +		xdp.rxq = &priv->xdp_rxq;
> >> +
> >> +		act = bpf_prog_run_xdp(xdp_prog, &xdp);
> >> +
> >> +		switch (act) {
> >> +		case XDP_PASS:
> >> +			delta = frame->data - xdp.data;
> >> +			len = xdp.data_end - xdp.data;
> >> +			break;
> >> +		default:
> >> +			bpf_warn_invalid_xdp_action(act);
> >> +		case XDP_ABORTED:
> >> +			trace_xdp_exception(priv->dev, xdp_prog, act);
> >> +		case XDP_DROP:
> >> +			goto err_xdp;
> >> +		}
> >> +	}
> >> +	rcu_read_unlock();
> >> +
> >> +	headroom = frame->data - delta - (void *)frame;
> >> +	skb = veth_build_skb(frame, headroom, len, 0);  
> > 
> > Here you are adding an assumption that struct xdp_frame is always
> > located at the top of the packet-data area.  I tried hard not to add
> > such a dependency!  You can calculate the beginning of the frame from
> > the xdp_frame->data pointer.
> > 
> > Why not add such a dependency?  Because for AF_XDP zero-copy, we
> > cannot make such an assumption.
> > 
> > Currently, when an RX-queue is in AF-XDP-ZC mode (MEM_TYPE_ZERO_COPY)
> > the packet will get dropped when calling convert_to_xdp_frame(), but
> > as the TODO comment in convert_to_xdp_frame() indicates, this is not
> > the end-goal.
> > 
> > The comment in convert_to_xdp_frame() suggests we need a full
> > alloc+copy, but that is actually not necessary if we can just use
> > another memory area for struct xdp_frame, plus a pointer to the data.
> > That would allow devmap-redirect to work with ZC, and allow
> > cpumap-redirect to do the copy on the remote CPU.
> 
> Thanks for pointing this out.
> It seems you are saying the xdp_frame area is not reusable. That means
> we reduce the usable headroom on every REDIRECT. I wanted to avoid
> this, but it is actually unavoidable, right?

I'm not sure I fully understand... does this have something to do with
the memset below?

When cpumap generates an SKB for the netstack, we sacrifice/reduce the
available SKB headroom, because convert_to_xdp_frame() shrinks the
headroom by the size of xdp_frame:

 xdp_frame->headroom = headroom - sizeof(*xdp_frame)
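
(This accounting is also why the in-the-top assumption isn't needed on
the veth side: the buffer start can be derived from the data pointer.
An untested sketch, assuming frame->headroom keeps being stored as
above, i.e. with sizeof(*frame) already subtracted:)

	/* Sketch: derive the buffer start from frame->data and the
	 * recorded headroom, instead of assuming the xdp_frame struct
	 * sits at the top of the packet-data area.
	 */
	void *head = frame->data - frame->headroom - sizeof(*frame);

	headroom = frame->data - delta - head;
	skb = veth_build_skb(head, headroom, len, 0);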

This is done in order to avoid having to memset this area.  We are
actually only worried about exposing the 'data' pointer, so we could
just clear that instead.  (See commit 6dfb970d3dbd; the concern exists
because Alexei is planning to move XDP from CAP_SYS_ADMIN to the lesser
privileged CAP_NET_ADMIN.)
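
I.e. something like this (untested sketch) instead of the full memset:

	/* Sketch: only frame->data exposes a kernel pointer via page
	 * reuse, so clearing that single field could replace the full
	 * memset if sizeof(*frame) grows.
	 */
	frame->data = NULL;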

See commits:
 97e19cce05e5 ("bpf: reserve xdp_frame size in xdp headroom")
 6dfb970d3dbd ("xdp: avoid leaking info stored in frame data on page reuse")


> >> +	if (!skb) {
> >> +		xdp_return_frame(frame);
> >> +		goto err;
> >> +	}
> >> +
> >> +	memset(frame, 0, sizeof(*frame));

This memset can become a performance issue later, if we grow the size
of xdp_frame (e.g. I'm considering extending it with the DMA addr, but
I'm not sure about that scheme yet).

Currently sizeof(struct xdp_frame) == 32 bytes, and a 32-byte memset is
fast, because the compiler can inline it as a few plain stores.  Above
32 bytes it gets more expensive: the compiler translates the memset
into a "rep stos" operation, which is slower, as it needs to save some
registers (to allow it to be interrupted).  See [1] for experiments.

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_memset.c
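
(A quick way to see this effect outside the kernel is to inspect the
compiler output for constant-size memsets; stand-alone sketch, results
vary with compiler and flags, and the kernel builds without SSE, which
pushes larger clears toward "rep stos":)

	/* memset_size.c: inspect with gcc -O2 -S (add -mno-sse for a
	 * more kernel-like environment).  clear32() tends to become a
	 * few plain stores, while clear64() and larger tend toward
	 * "rep stos" or a call to memset().
	 */
	#include <string.h>

	struct f32 { char b[32]; };
	struct f64 { char b[64]; };

	void clear32(struct f32 *f) { memset(f, 0, sizeof(*f)); }
	void clear64(struct f64 *f) { memset(f, 0, sizeof(*f)); }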

> >> +	skb->protocol = eth_type_trans(skb, priv->dev);
> >> +err:
> >> +	return skb;
> >> +err_xdp:
> >> +	rcu_read_unlock();
> >> +	xdp_return_frame(frame);
> >> +
> >> +	return NULL;
> >> +}  
> > 
> >   
> 



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
