Message-ID: <5f57e23e513b2_10343208e0@john-XPS-13-9370.notmuch>
Date:   Tue, 08 Sep 2020 12:57:50 -0700
From:   John Fastabend <john.fastabend@...il.com>
To:     Lorenzo Bianconi <lorenzo@...nel.org>,
        John Fastabend <john.fastabend@...il.com>
Cc:     netdev@...r.kernel.org, bpf@...r.kernel.org, davem@...emloft.net,
        lorenzo.bianconi@...hat.com, brouer@...hat.com,
        echaudro@...hat.com, sameehj@...zon.com, kuba@...nel.org,
        daniel@...earbox.net, ast@...nel.org, shayagr@...zon.com,
        edumazet@...gle.com
Subject: Re: [PATCH v2 net-next 6/9] bpf: helpers: add
 bpf_xdp_adjust_mb_header helper

Lorenzo Bianconi wrote:
> > Lorenzo Bianconi wrote:
> > > > Lorenzo Bianconi wrote:
> 
> [...]
> 
> > > > > + *	Description
> > > > > + *		Adjust frame headers moving *offset* bytes from/to the second
> > > > > + *		buffer to/from the first one. This helper can be used to move
> > > > > + *		headers when the hw DMA SG does not copy all the headers in
> > > > > + *		the first fragment.
> > > 
> > > + Eric to the discussion
> > > 

[...]

> > > > > +BPF_CALL_2(bpf_xdp_adjust_mb_header, struct xdp_buff *, xdp,
> > > > > +	   int, offset)
> > > > > +{
> > > > > +	void *data_hard_end, *data_end;
> > > > > +	struct skb_shared_info *sinfo;
> > > > > +	int frag_offset, frag_len;
> > > > > +	u8 *addr;
> > > > > +
> > > > > +	if (!xdp->mb)
> > > > > +		return -EOPNOTSUPP;
> > 
> > Not required for this patch necessarily, but I think it would be a better
> > user experience if, instead of returning EOPNOTSUPP here, we did the header
> > split. This would allocate a frag and copy the bytes around as needed. Yes,
> > it might be slow if you don't have a frag free in the driver, but if the
> > user wants to do header split and their hardware can't do it, we would have
> > a way out.
> > 
> > I guess it could be an improvement for later though.
> 
> I have no strong opinion on this; I did it this way to respect the rule "we
> do not allocate memory for XDP".
> 
> @Jesper, David: thoughts?

Consider adding a flags field to the helper so we could do this later with
a flag. Then users who want the alloc can set the flag and get it.
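
Roughly what I have in mind from the program side (just a sketch, not code
from this series: the helper only exists in this proposal, so it has no UAPI
helper ID yet and the one below is a placeholder, and the third flags
argument plus the XDP_MB_ADJUST_ALLOC name are made up here to illustrate
the suggestion; the 64-byte offset is an arbitrary example):

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Hypothetical flag: allow the kernel to allocate a frag and do the
 * header split in software instead of returning -EOPNOTSUPP.
 */
#define XDP_MB_ADJUST_ALLOC	(1ULL << 0)

/* Placeholder helper ID (999); the real number would come from the UAPI
 * once the helper is merged.
 */
static long (*bpf_xdp_adjust_mb_header)(struct xdp_md *xdp_md, int offset,
					__u64 flags) = (void *)999;

SEC("xdp")
int pull_headers(struct xdp_md *ctx)
{
	/* Move 64 bytes of headers from the second buffer into the first
	 * one, allowing a software fallback via the hypothetical flag.
	 */
	if (bpf_xdp_adjust_mb_header(ctx, 64, XDP_MB_ADJUST_ALLOC) < 0)
		return XDP_DROP;

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";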

[...]

> 
> > 
> > How/when does the header split go wrong on the mvneta device? I guess
> > this is to fix a real bug/issue, not some theoretical one? An example
> > in the commit message would make this concrete. Something like,
> > "When using RX zerocopy to mmap data into a userspace application, if
> > a packet with [all these wild headers] is received, RX zerocopy breaks
> > because header split puts headers X in the data frag, confusing apps".
> 
> This issue does not occur with mvneta since the driver is not capable of
> performing header split AFAIK. The helper has been introduced to cover the
> "issue" reported by Eric in his NetDevConf presentation. In order to test the
> helper I modified the mvneta RX NAPI loop in a controlled way (this patch
> can't be sent upstream, it is for testing only :)).
> I will improve the commit message in v3.

Ah, OK, so really there are no users for the helper then. IMO just drop
the patch until we have a user.

> 
> > 
> > > 
> > > > 
> > > > Also, and even more concerning, I think this API requires the
> > > > driver to populate shinfo. If we use XDP_REDIRECT a lot or XDP_TX,
> > > > this means we need to populate shinfo when it's probably never
> > > > used. If our driver is smart, the L2/L3 headers are in the readable
> > > > data and prefetched. Writing frags into the end of a page is likely
> > > > not free.
> > > 
> > > Sorry, I did not get what you mean by "populate shinfo" here. We need to
> > > properly set shared_info in order to create the xdp multi-buff.
> > > Apart from header split, please consider that the main use-cases for
> > > xdp multi-buff are XDP with TSO and jumbo frames.
> > 
> > The use case I have in mind is an XDP_TX or XDP_REDIRECT load balancer.
> > I won't know this at the driver level, and now I'll have to write into
> > the back of every page with this shinfo just in case. If header
> > split is working, I should never need to even touch the page outside
> > the first N bytes that were DMAed and are in cache with DDIO. So it's
> > extra overhead for something that is unlikely to happen in the LB case.
> 
> So far the skb_shared_info is constructed in mvneta only if the hw splits
> the received data into multiple buffers (i.e. if the MTU is greater than
> 1 page; please see the comments below). Moreover, the shared_info is
> present only in the first buffer.

Still, in a normal L2/L3/L4 use case I expect all the headers you
need to be in the first buffer, so it's unlikely that use cases which
send most traffic via XDP_TX, for example, will ever need the extra
info. In these cases I think you are paying some penalty for
having to do the work of populating the shinfo. Maybe it's measurable,
maybe not; I'm not sure.
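
To be concrete about what "populating the shinfo" means here: as I read the
series, the skb_shared_info sits in the tailroom of the first buffer, so
writing it means touching the tail of the page. Roughly like this sketch,
built on the existing xdp_data_hard_end() macro (the xdp_mb_shinfo() name is
mine, not something from the patch):

#include <net/xdp.h>

/* Reach the shared info placed in the tailroom of the first buffer of a
 * multi-buffer xdp_buff.  Writing frags[] here is the per-packet cost
 * being discussed above.
 */
static inline struct skb_shared_info *
xdp_mb_shinfo(struct xdp_buff *xdp)
{
	return (struct skb_shared_info *)xdp_data_hard_end(xdp);
}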

Also, if we make it required for multi-buffer, then we also need
the shinfo on 40Gbps or 100Gbps NICs, and now even small costs
matter.

> 
> > 
> > If you take the simplest possible program that just returns XDP_TX
> > and run a packet generator against it, I believe (I haven't run any
> > tests) that you will see overhead now just from populating this
> > shinfo. I think it needs to be done only when it's needed, e.g. when
> > the user makes this helper call or when we need to build the skb and
> > populate the frags there.
> 
> sure, I will carry out some tests.

Thanks!
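
For reference, the "simplest possible program" mentioned above is literally
just the following (a plain XDP_TX bouncer; nothing here is specific to this
series):

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Bounce every frame back out of the receiving interface without
 * touching the data, so any delta in a pktgen run comes from the
 * driver/shinfo changes rather than from the BPF program.
 */
SEC("xdp")
int xdp_tx_bench(struct xdp_md *ctx)
{
	return XDP_TX;
}

char _license[] SEC("license") = "GPL";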

> 
> > 
> > I think a smart driver will just keep the frags list in whatever
> > form it has them (rx descriptors?) and push them over to the
> > tx descriptors without having to do extra work with frag lists.
> 
> I think there are many use-cases where we want to have this info available in
> xdp_buff/xdp_frame. E.g. let's consider the following jumbo frame example:
> - MTU > 1 page (so the driver will split the received data across multiple rx
>   descriptors)
> - the driver performs an XDP_REDIRECT to a veth or cpumap
> 
> Relying on the proposed architecture we could enable GRO in veth or cpumap, I
> guess, since we can build a non-linear skb from the xdp multi-buff, right?

I'm not disputing there are use-cases. But I'm trying to see if we
can cover those without introducing additional latency in other
cases. Hence the extra benchmarks request ;)
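
For completeness, the redirect side of the jumbo-frame use case above is
basically the usual cpumap redirect program, something like the sketch
below (the map size and CPU key are arbitrary example values):

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Example cpumap with room for 8 CPUs; the __u32 value is the per-CPU
 * queue size configured from user space when the map is populated.
 */
struct {
	__uint(type, BPF_MAP_TYPE_CPUMAP);
	__uint(max_entries, 8);
	__type(key, __u32);
	__type(value, __u32);
} cpu_map SEC(".maps");

SEC("xdp")
int redirect_to_cpu(struct xdp_md *ctx)
{
	__u32 cpu = 0;	/* example: steer everything to CPU 0 */

	return bpf_redirect_map(&cpu_map, cpu, 0);
}

char _license[] SEC("license") = "GPL";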

> 
> > 
> > > 
> > > > 
> > > > Did you benchmark this?
> > > 
> > > will do, I need to understand if we can use tiny buffers in mvneta.
> > 
> > Why tiny buffers? How does mvneta lay out the frags when doing
> > header split? Can we just benchmark what mvneta is doing at the
> > end of this patch series?
> 
> For the moment mvneta can split the received data when the previous buffer
> is full (e.g. when the first page is completely written). I want to explore
> whether I can set a tiny buffer (e.g. 128B) as the max receive buffer size
> to run some performance tests and get some "comparable" results with respect
> to the ones I got when I added XDP support to mvneta.

OK, that would be great.

> 
> > 
> > Also, can you try the basic XDP_TX case mentioned above?
> > I don't want this to degrade existing use cases if at all
> > possible.
> 
> sure, will do.

Thanks!
