Message-ID: <20180412105431.41bd8d0a@redhat.com>
Date: Thu, 12 Apr 2018 10:54:31 +0200
From: Jesper Dangaard Brouer <brouer@...hat.com>
To: "Karlsson, Magnus" <magnus.karlsson@...el.com>
Cc: "Michael S. Tsirkin" <mst@...hat.com>,
Björn Töpel <bjorn.topel@...il.com>,
"Duyck, Alexander H" <alexander.h.duyck@...el.com>,
"alexander.duyck@...il.com" <alexander.duyck@...il.com>,
"john.fastabend@...il.com" <john.fastabend@...il.com>,
"ast@...com" <ast@...com>,
"willemdebruijn.kernel@...il.com" <willemdebruijn.kernel@...il.com>,
"daniel@...earbox.net" <daniel@...earbox.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"michael.lundkvist@...csson.com" <michael.lundkvist@...csson.com>,
"Brandeburg, Jesse" <jesse.brandeburg@...el.com>,
"Singhai, Anjali" <anjali.singhai@...el.com>,
"Zhang, Qi Z" <qi.z.zhang@...el.com>,
"ravineet.singh@...csson.com" <ravineet.singh@...csson.com>,
brouer@...hat.com
Subject: Re: [RFC PATCH v2 03/14] xsk: add umem fill queue support and mmap
On Thu, 12 Apr 2018 07:38:25 +0000
"Karlsson, Magnus" <magnus.karlsson@...el.com> wrote:
> > -----Original Message-----
> > From: Michael S. Tsirkin [mailto:mst@...hat.com]
> > Sent: Thursday, April 12, 2018 4:16 AM
> > To: Björn Töpel <bjorn.topel@...il.com>
> > Cc: Karlsson, Magnus <magnus.karlsson@...el.com>; Duyck, Alexander H
> > <alexander.h.duyck@...el.com>; alexander.duyck@...il.com;
> > john.fastabend@...il.com; ast@...com; brouer@...hat.com;
> > willemdebruijn.kernel@...il.com; daniel@...earbox.net;
> > netdev@...r.kernel.org; michael.lundkvist@...csson.com; Brandeburg,
> > Jesse <jesse.brandeburg@...el.com>; Singhai, Anjali
> > <anjali.singhai@...el.com>; Zhang, Qi Z <qi.z.zhang@...el.com>;
> > ravineet.singh@...csson.com
> > Subject: Re: [RFC PATCH v2 03/14] xsk: add umem fill queue support and
> > mmap
> >
> > On Tue, Mar 27, 2018 at 06:59:08PM +0200, Björn Töpel wrote:
> > > @@ -30,4 +31,18 @@ struct xdp_umem_reg {
> > >  	__u32 frame_headroom; /* Frame head room */
> > >  };
> > >  
> > > +/* Pgoff for mmaping the rings */
> > > +#define XDP_UMEM_PGOFF_FILL_QUEUE 0x100000000
> > > +
> > > +struct xdp_queue {
> > > +	__u32 head_idx __attribute__((aligned(64)));
> > > +	__u32 tail_idx __attribute__((aligned(64)));
> > > +};
> > > +
> > > +/* Used for the fill and completion queues for buffers */
> > > +struct xdp_umem_queue {
> > > +	struct xdp_queue ptrs;
> > > +	__u32 desc[0] __attribute__((aligned(64)));
> > > +};
> > > +
> > >  #endif /* _LINUX_IF_XDP_H */
> >
> > So IIUC it's a head/tail ring of 32-bit descriptors.
> >
> > In my experience (from implementing ptr_ring) this implies that the
> > head/tail cache lines bounce a lot between CPUs. Caching will help some.
> > You are also forced to use barriers to check validity, which is slow on
> > some architectures.
> >
> > If instead you can use a special descriptor value (e.g. 0) as the valid
> > signal, things work much better:
> >
> > - you read the descriptor atomically; if it's not 0, it's valid
> > - same with write - write 0 to pass the slot back to the other side
> > - there is a data dependency, so no need for barriers (except on DEC Alpha)
> > - no power-of-2 size limitation; you can make the ring any size you like
> > - easy to resize too
> >
> > The architecture (if not the implementation) would be shared with
> > ptr_ring, so some of the optimization ideas, like batched updates, could
> > be lifted from there.
> >
> > When I was building ptr_ring, any head/tail design underperformed storing
> > the valid flag with the data itself. YMMV.
I fully agree with MST here. This is also my experience. I even
dropped my own Array-based Lock-Free (ALF) queue implementation[1] in
favor of ptr_ring. (There I try to amortize this cost by bulking, but
this causes the queue to become non-wait-free.)
[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/include/linux/alf_queue.h
> I think you are definitely right in that there are ways in which
> we can improve performance here. That said, the current queue
> performs slightly better than the previous one we had that was
> more or less a copy of one of your first virtio 1.1 proposals
> from little over a year ago. It had bidirectional queues and a
> valid flag in the descriptor itself. The reason we abandoned this
> was not poor performance (it was good), but a need to go to
> unidirectional queues. Maybe I should have only changed that
> aspect and kept the valid flag.
>
> Anyway, I will take a look at ptr_ring and run some experiments
> along the lines of what you propose to get some
> numbers. Considering your experience with these kinds of
> structures, you are likely right. I just need to convince
> myself :-).
When benchmarking, be careful that you don't measure the "wrong"
queue situation. When doing this kind of "overload" benchmarking, you
will likely create a situation where the queue is always full (which
hopefully isn't a production use-case). In the almost/always-full
queue situation, using the element values to sync on (like MST
proposes) will still cause the cache-line bouncing (that we want to
avoid).

MST explains and has addressed this situation for ptr_ring in:
 commit fb9de9704775 ("ptr_ring: batch ring zeroing")
 https://git.kernel.org/torvalds/c/fb9de9704775
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer