Date:   Fri, 28 Oct 2016 20:51:28 -0700
From:   Shrijeet Mukherjee <shm@...ulusnetworks.com>
To:     Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Jakub Kicinski <kubakici@...pl>
Cc:     John Fastabend <john.fastabend@...il.com>,
        David Miller <davem@...emloft.net>, alexander.duyck@...il.com,
        mst@...hat.com, brouer@...hat.com, shrijeet@...il.com,
        tom@...bertland.com, netdev@...r.kernel.org,
        Roopa Prabhu <roopa@...ulusnetworks.com>,
        Nikolay Aleksandrov <nikolay@...ulusnetworks.com>
Subject: RE: [PATCH net-next RFC WIP] Patch for XDP support for virtio_net

> -----Original Message-----
> From: Alexei Starovoitov [mailto:alexei.starovoitov@...il.com]
> Sent: Friday, October 28, 2016 11:22 AM
> To: Jakub Kicinski <kubakici@...pl>
> Cc: John Fastabend <john.fastabend@...il.com>; David Miller
> <davem@...emloft.net>; alexander.duyck@...il.com; mst@...hat.com;
> brouer@...hat.com; shrijeet@...il.com; tom@...bertland.com;
> netdev@...r.kernel.org; shm@...ulusnetworks.com;
> roopa@...ulusnetworks.com; nikolay@...ulusnetworks.com
> Subject: Re: [PATCH net-next RFC WIP] Patch for XDP support for virtio_net
>
> On Fri, Oct 28, 2016 at 05:18:12PM +0100, Jakub Kicinski wrote:
> > On Fri, 28 Oct 2016 08:56:35 -0700, John Fastabend wrote:
> > > On 16-10-27 07:10 PM, David Miller wrote:
> > > > From: Alexander Duyck <alexander.duyck@...il.com>
> > > > Date: Thu, 27 Oct 2016 18:43:59 -0700
> > > >
> > > >> On Thu, Oct 27, 2016 at 6:35 PM, David Miller <davem@...emloft.net> wrote:
> > > >>> From: "Michael S. Tsirkin" <mst@...hat.com>
> > > >>> Date: Fri, 28 Oct 2016 01:25:48 +0300
> > > >>>
> > > >>>> On Thu, Oct 27, 2016 at 05:42:18PM -0400, David Miller wrote:
> > > >>>>> From: "Michael S. Tsirkin" <mst@...hat.com>
> > > >>>>> Date: Fri, 28 Oct 2016 00:30:35 +0300
> > > >>>>>
> > > >>>>>> Something I'd like to understand is how does XDP address the
> > > >>>>>> problem that 100Byte packets are consuming 4K of memory now.
> > > >>>>>
> > > >>>>> Via page pools.  We're going to make a generic one, but right
> > > >>>>> now each and every driver implements a quick list of pages to
> > > >>>>> allocate from (and thus avoid the DMA map/unmap overhead,
> > > >>>>> etc.)
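
(For concreteness, the per-ring recycle lists drivers carry today look roughly
like the sketch below. This is only an illustration of the pattern, not code
from any driver; every my_* name is invented.)

  #include <linux/list.h>
  #include <linux/gfp.h>
  #include <linux/mm.h>
  #include <linux/slab.h>
  #include <linux/dma-mapping.h>

  struct my_rx_page {
      struct list_head list;
      struct page *page;
      dma_addr_t dma;
  };

  struct my_rx_ring {
      struct device *dev;
      struct list_head free_pages;   /* recycled pages, still DMA mapped */
  };

  static struct my_rx_page *my_get_rx_page(struct my_rx_ring *ring)
  {
      struct my_rx_page *rp;

      if (!list_empty(&ring->free_pages)) {
          rp = list_first_entry(&ring->free_pages, struct my_rx_page, list);
          list_del(&rp->list);
          return rp;                 /* fast path: no map/unmap at all */
      }

      rp = kzalloc(sizeof(*rp), GFP_ATOMIC);
      if (!rp)
          return NULL;
      rp->page = alloc_page(GFP_ATOMIC);
      if (!rp->page)
          goto err_page;
      rp->dma = dma_map_page(ring->dev, rp->page, 0, PAGE_SIZE,
                             DMA_FROM_DEVICE);
      if (dma_mapping_error(ring->dev, rp->dma))
          goto err_map;
      return rp;

  err_map:
      __free_page(rp->page);
  err_page:
      kfree(rp);
      return NULL;
  }

  /* On completion the page goes back on the list instead of being unmapped. */
  static void my_put_rx_page(struct my_rx_ring *ring, struct my_rx_page *rp)
  {
      list_add(&rp->list, &ring->free_pages);
  }
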
> > > >>>>
> > > >>>> So to clarify, ATM virtio doesn't attempt to avoid dma
> > > >>>> map/unmap so there should be no issue with that even when using
> > > >>>> sub-page regions, assuming DMA APIs support sub-page map/unmap correctly.
> > > >>>
> > > >>> That's not what I said.
> > > >>>
> > > >>> The page pools are meant to address the performance degradation
> > > >>> from moving to one packet per page for the sake of XDP's
> > > >>> requirements.
> > > >>>
> > > >>> You still need to have one packet per page for correct XDP
> > > >>> operation whether you do page pools or not, and whether you have
> > > >>> DMA mapping (or its equivalent virtualization operation) or not.
> > > >>
> > > >> Maybe I am missing something here, but why do you need to limit
> > > >> things to one packet per page for correct XDP operation?  Most of
> > > >> the drivers out there now are usually storing something closer to
> > > >> at least 2 packets per page, and with the DMA API fixes I am
> > > >> working on there should be no issue with changing the contents
> > > >> inside those pages since we won't invalidate or overwrite the
> > > >> data after the DMA buffer has been synchronized for use by the CPU.
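
(A sketch of the sub-page sync being described: only the region the NIC just
wrote is synced for the CPU, so the other half of the page can stay with the
device. dma_sync_single_range_for_cpu() is the real DMA API call; the my_*
types are invented for illustration.)

  #include <linux/dma-mapping.h>
  #include <linux/mm.h>

  struct my_rx_buf {
      struct page *page;
      dma_addr_t dma;
      unsigned int offset;           /* 0 or PAGE_SIZE / 2 */
  };

  static void *my_rx_buf_for_cpu(struct device *dev, struct my_rx_buf *b,
                                 unsigned int len)
  {
      /* hand the CPU only the sub-page region that was just written */
      dma_sync_single_range_for_cpu(dev, b->dma, b->offset, len,
                                    DMA_FROM_DEVICE);
      return page_address(b->page) + b->offset;
  }
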
> > > >
> > > > Because with SKB's you can share the page with other packets.
> > > >
> > > > With XDP you simply cannot.
> > > >
> > > > It's software semantics that are the issue.  SKB frag list pages
> > > > are read only, XDP packets are writable.
> > > >
> > > > This has nothing to do with "writability" of the pages wrt. DMA
> > > > mapping or cpu mappings.
> > > >
> > >
> > > Sorry I'm not seeing it either. The current xdp_buff is defined as:
> > >
> > >   struct xdp_buff {
> > > 	void *data;
> > > 	void *data_end;
> > >   };
> > >
> > > The verifier has an xdp_is_valid_access() check to ensure we don't
> > > go past data_end. The page for now at least never leaves the driver.
> > > For the work to get xmit to other devices working I'm still not sure
> > > I see any issue.
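
(A minimal example of what that verifier check means on the program side; the
program name and the drop policy are made up, but the explicit comparison
against data_end is the part the verifier insists on before any packet load.)

  #include <linux/bpf.h>
  #include <linux/if_ether.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_endian.h>

  SEC("xdp")
  int xdp_peek_eth(struct xdp_md *ctx)
  {
      void *data     = (void *)(long)ctx->data;
      void *data_end = (void *)(long)ctx->data_end;
      struct ethhdr *eth = data;

      /* Without this bounds check the verifier rejects the load of
       * eth->h_proto below. */
      if (data + sizeof(*eth) > data_end)
          return XDP_ABORTED;

      if (eth->h_proto == bpf_htons(ETH_P_IPV6))
          return XDP_DROP;           /* arbitrary example policy */

      return XDP_PASS;
  }

  char _license[] SEC("license") = "GPL";
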
> >
> > +1
> >
> > Do we want to make the packet-per-page a requirement because it could
> > be useful in the future from architectural standpoint?  I guess there
> > is a trade-off here between having the comfort of people following
> > this requirement today and making driver support for XDP even more complex.
>
> It looks to me that packet per page makes drivers simpler rather than more
> complex.
> ixgbe split-page and mlx* order-3/5 tricks are definitely complex.
> The skb truesize concerns come from the host when data is delivered to
> user space and we need to have precise memory accounting for different
> applications and different users. XDP is all root and imo it's far away
> from the days when multi-user non-root issues start to pop up.
> At the same time XDP doesn't require using a 4k buffer in something like
> Netronome.
> If the xdp bpf program can be offloaded into HW with 1800 byte buffers, great!
> For the x86 cpu, 4k is the natural allocation chunk. Anything lower
> requires delicate dma tricks paired with an even more complex slab allocator
> and atomic refcnts.
> All of that can only drive the performance down.
> Compared to kernel bypass, xdp uses 4k pages whereas dpdk has to use
> huge pages, so xdp is saving a ton of memory already!

Generally agree, but SRIOV nics with multiple queues can end up in a bad
spot if each buffer is 4K, right? I see a dedicated page pool, used only by
the queues that have XDP enabled, as the easiest solution to swing: that way
the memory overhead is restricted to the enabled queues, and the shared-access
issues are restricted to skb's drawn from that pool, no?

Clearly, as said later, this will not apply to ebpf offload devices.
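
To make that concrete, the split could look roughly like the sketch below.
Only the shape of the idea, not virtio_net or any driver's code; the my_*
names are invented.

  #include <linux/gfp.h>
  #include <linux/mm.h>
  #include <linux/skbuff.h>

  struct my_rx_queue {
      bool xdp_enabled;              /* an XDP program is attached here */
  };

  static void *my_rx_alloc_buf(struct my_rx_queue *q, unsigned int len)
  {
      if (q->xdp_enabled) {
          /* one writable, never-shared page per packet */
          struct page *page = alloc_page(GFP_ATOMIC);

          return page ? page_address(page) : NULL;
      }
      /* non-XDP queues keep the cheap shared-page frags */
      return netdev_alloc_frag(SKB_DATA_ALIGN(len));
  }

That keeps the page-per-packet memory cost confined to the queues that
actually asked for XDP.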
