Date:   Tue, 14 Apr 2020 14:46:37 +0200
From:   Jesper Dangaard Brouer <brouer@...hat.com>
To:     Saeed Mahameed <saeedm@...lanox.com>
Cc:     "toke@...hat.com" <toke@...hat.com>,
        "gtzalik@...zon.com" <gtzalik@...zon.com>,
        "ilias.apalodimas@...aro.org" <ilias.apalodimas@...aro.org>,
        "borkmann@...earbox.net" <borkmann@...earbox.net>,
        "alexander.duyck@...il.com" <alexander.duyck@...il.com>,
        "john.fastabend@...il.com" <john.fastabend@...il.com>,
        "akiyano@...zon.com" <akiyano@...zon.com>,
        "zorik@...zon.com" <zorik@...zon.com>,
        "alexei.starovoitov@...il.com" <alexei.starovoitov@...il.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "jeffrey.t.kirsher@...el.com" <jeffrey.t.kirsher@...el.com>,
        "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
        "dsahern@...il.com" <dsahern@...il.com>,
        "lorenzo@...nel.org" <lorenzo@...nel.org>,
        "willemdebruijn.kernel@...il.com" <willemdebruijn.kernel@...il.com>,
        brouer@...hat.com, Steffen Klassert <steffen.klassert@...unet.com>,
        Willy Tarreau <w@....eu>
Subject: Re: [PATCH RFC v2 29/33] xdp: allow bpf_xdp_adjust_tail() to grow
 packet size

On Thu, 9 Apr 2020 03:31:14 +0000
Saeed Mahameed <saeedm@...lanox.com> wrote:

> On Wed, 2020-04-08 at 13:53 +0200, Jesper Dangaard Brouer wrote:
> > Finally, after all drivers have a frame size, allow BPF-helper
> > bpf_xdp_adjust_tail() to grow or extend packet size at frame tail.
> >   
> 
> can you provide a list of usecases for why tail extension is necessary
> ?

Use-cases:
(1) IPsec / XFRM needs a tail extend[1][2].
(2) DNS-cache replies in XDP.
(3) HA-proxy ALOHA would need it to convert to XDP.
 
> and what do you have in mind as immediate use of bpf_xdp_adjust_tail()
> ? 

I guess Steffen Klassert's IPsec use-case (1) is the most immediate.

[1] http://vger.kernel.org/netconf2019_files/xfrm_xdp.pdf
[2] http://vger.kernel.org/netconf2019.html

> both cover letter and commit messages didn't list any actual use case..

Sorry about that.

> > Remember that helper/macro xdp_data_hard_end have reserved some
> > tailroom.  Thus, this helper makes sure that the BPF-prog don't have
> > access to this tailroom area.
> > 
> > Signed-off-by: Jesper Dangaard Brouer <brouer@...hat.com>
> > ---
> >  include/uapi/linux/bpf.h |    4 ++--
> >  net/core/filter.c        |   18 ++++++++++++++++--
> >  2 files changed, 18 insertions(+), 4 deletions(-)
> > 
[... cut ...]
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 7628b947dbc3..4d58a147eed0 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -3422,12 +3422,26 @@ static const struct bpf_func_proto
> > bpf_xdp_adjust_head_proto = {
> >  
> >  BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
> >  {
> > +	void *data_hard_end = xdp_data_hard_end(xdp);
> >  	void *data_end = xdp->data_end + offset;
> >  
> > -	/* only shrinking is allowed for now. */
> > -	if (unlikely(offset >= 0))
> > +	/* Notice that xdp_data_hard_end have reserved some tailroom */
> > +	if (unlikely(data_end > data_hard_end))
> >  		return -EINVAL;
> >    
> 
> i don't know if i like this approach for couple of reasons.
> 
> 1. drivers will provide arbitrary frames_sz, which is normally larger
> than mtu, and could be a full page size, for XDP_TX action this can be
> problematic if xdp progs will allow oversized packets to get caught at
> the driver level..

We already check whether the MTU is exceeded for a specific device when
we redirect into it, see helper xdp_ok_fwd_dev().  For the XDP_TX case,
I guess some drivers bypass that check, which should be fixed. The
XDP_TX case is IMHO a place where we allow drivers to do special
optimizations, thus drivers can choose to do something faster than
calling the generic helper xdp_ok_fwd_dev().
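
To make the MTU check concrete, here is a minimal user-space sketch of
the kind of test xdp_ok_fwd_dev() performs on redirect, written from
memory and not copied from the tree.  The struct fwd_dev_model, the
MODEL_VLAN_HLEN constant and model_ok_fwd_dev() are hypothetical
stand-ins for illustration; only the idea (reject a packet longer than
MTU plus link-layer header plus VLAN header) is taken from the helper.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few net_device fields the check reads. */
struct fwd_dev_model {
	bool up;                      /* models the IFF_UP flag */
	unsigned int mtu;             /* device MTU in bytes */
	unsigned int hard_header_len; /* link-layer header, e.g. 14 for Ethernet */
};

/* Assumed VLAN header allowance (4 bytes for a single 802.1Q tag). */
#define MODEL_VLAN_HLEN 4u

/*
 * Return 0 if a packet of pktlen bytes may be forwarded to fwd,
 * -ENETDOWN if the device is down, -EMSGSIZE if the packet is too long.
 */
static int model_ok_fwd_dev(const struct fwd_dev_model *fwd,
			    unsigned int pktlen)
{
	unsigned int max_len;

	if (!fwd->up)
		return -ENETDOWN;

	max_len = fwd->mtu + fwd->hard_header_len + MODEL_VLAN_HLEN;
	if (pktlen > max_len)
		return -EMSGSIZE;

	return 0;
}
```

A driver that handles XDP_TX on its own fast path would need an
equivalent length check somewhere, once the BPF-prog is able to grow
the packet.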
  
> 
> 2. xdp_data_hard_end(xdp) has a hardcoded assumption of the skb shinfo
> and it introduces a reverse dependency between xdp buff and skbuff 
> 
(I'll address this in another mail)

> both of the above can be solved if the drivers provided the max
> allowed frame size, already accounting for mtu and shinfo when setting
> xdp_buff.frame_sz at the driver level.

It seems we look at the problem from two different angles.  You have
the driver's perspective, while I have the network stack's perspective
(the XDP_PASS case).  The mlx5 driver treats XDP as a special case, by
hiding or confining xdp_buff to functions fairly deep in the
call-stack.  My goal is different (moving SKB out of drivers): I see
the xdp_buff/xdp_frame as the main packet object in the drivers, which
gets sent up the network stack (after converting to xdp_frame) and
converted into an SKB in core-code (yes, there is a long road ahead). The
larger tailroom can be used by the netstack in SKB-coalesce.

The next step is making xdp_buff (and xdp_frame) multi-buffer aware.
This is why I reserve room for skb_shared_info.  I have considered
reducing xdp_buff.frame_sz by sizeof(skb_shared_info), but it got kind
of ugly having this in each driver.
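
As a rough illustration of how the reserved tailroom limits the grow
operation, below is a user-space sketch of the bounds check.  The
struct xdp_buff_model, model_data_hard_end() and model_adjust_tail()
are hypothetical names, and MODEL_SHINFO_RESERVE (320 bytes) is only an
assumed value for the aligned sizeof(skb_shared_info); the real size
depends on kernel config and architecture.  The lower bound of one
Ethernet header is likewise an assumption about the part of the helper
not shown in the quoted diff.

```c
#include <errno.h>

/* Assumed aligned size of skb_shared_info; the real value varies. */
#define MODEL_SHINFO_RESERVE 320u
/* Assumed minimum packet length kept after shrinking (Ethernet header). */
#define MODEL_ETH_HLEN 14

/* Hypothetical model of the xdp_buff fields used by the check. */
struct xdp_buff_model {
	unsigned char *data;            /* start of packet data */
	unsigned char *data_end;        /* end of packet data */
	unsigned char *data_hard_start; /* start of the frame memory */
	unsigned int frame_sz;          /* total frame size from the driver */
};

/* End of the area a BPF-prog may use: frame end minus reserved tailroom. */
static unsigned char *model_data_hard_end(const struct xdp_buff_model *xdp)
{
	return xdp->data_hard_start + xdp->frame_sz - MODEL_SHINFO_RESERVE;
}

/* Grow (offset > 0) or shrink (offset < 0) the packet at the tail. */
static int model_adjust_tail(struct xdp_buff_model *xdp, int offset)
{
	unsigned char *data_end = xdp->data_end + offset;

	/* Growing must not reach into the reserved tailroom. */
	if (data_end > model_data_hard_end(xdp))
		return -EINVAL;

	/* Shrinking must leave at least an Ethernet header. */
	if (data_end < xdp->data + MODEL_ETH_HLEN)
		return -EINVAL;

	xdp->data_end = data_end;
	return 0;
}
```

With a 4096 byte frame, 256 bytes of headroom and a 100 byte packet,
this model allows growing the tail by at most 4096 - 320 - 356 = 3420
bytes.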

I also considered having drivers set up a direct pointer to the
{skb,xdp}_shared_info section in xdp_buff, because that would make it
more flexible (for what I imagine Alexander Duyck wants).  (But we can
still do/change that later, once we start work on the multi-buffer code.)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
