[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJ8uoz2sud5wBFpWKfC-i4n9E77KnQWGPsKLysumsA23pHEjEQ@mail.gmail.com>
Date: Thu, 9 Jul 2020 08:54:50 +0200
From: Magnus Karlsson <magnus.karlsson@...il.com>
To: Maxim Mikityanskiy <maximmi@...lanox.com>
Cc: Magnus Karlsson <magnus.karlsson@...el.com>,
Björn Töpel <bjorn.topel@...el.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Network Development <netdev@...r.kernel.org>,
Jonathan Lemon <jonathan.lemon@...il.com>,
bpf <bpf@...r.kernel.org>, jeffrey.t.kirsher@...el.com,
"Fijalkowski, Maciej" <maciej.fijalkowski@...el.com>,
Maciej Fijalkowski <maciejromanfijalkowski@...il.com>,
cristian.dumitrescu@...el.com
Subject: Re: [PATCH bpf-next 00/14] xsk: support shared umems between devices
and queues
On Wed, Jul 8, 2020 at 5:02 PM Maxim Mikityanskiy <maximmi@...lanox.com> wrote:
>
> On 2020-07-02 15:18, Magnus Karlsson wrote:
> > This patch set adds support to share a umem between AF_XDP sockets
> > bound to different queue ids on the same device or even between
> > devices. It has already been possible to do this by registering the
> > umem multiple times, but this wastes a lot of memory. Just imagine
> > having 10 threads each having 10 sockets open sharing a single
> > umem. This means that you would have to register the umem 100 times
> > consuming large quantities of memory.
>
> Just to clarify: the main memory savings are achieved, because we don't
> need to store an array of pages in struct xdp_umem multiple times, right?
>
> I guess there is one more drawback of sharing a UMEM the old way
> (register it multiple times): it would map (DMA) the same pages multiple
> times.
Both are correct. The main saving is from only having to DMA map the
device once, your second comment.
> > Instead, we extend the existing XDP_SHARED_UMEM flag to also work when
> > sharing a umem between different queue ids as well as devices. If you
> > would like to share umem between two sockets, just create the first
> > one as would do normally. For the second socket you would not register
> > the same umem using the XDP_UMEM_REG setsockopt. Instead attach one
> > new fill ring and one new completion ring to this second socket and
> > then use the XDP_SHARED_UMEM bind flag supplying the file descriptor of
> > the first socket in the sxdp_shared_umem_fd field to signify that it
> > is the umem of the first socket you would like to share.
> >
> > One important thing to note in this example, is that there needs to be
> > one fill ring and one completion ring per unique device and queue id
> > bound to. This so that the single-producer and single-consumer semantics
> > of the rings can be upheld. To recap, if you bind multiple sockets to
> > the same device and queue id (already supported without this patch
> > set), you only need one pair of fill and completion rings. If you bind
> > multiple sockets to multiple different queues or devices, you need one
> > fill and completion ring pair per unique device,queue_id tuple.
> >
> > The implementation is based around extending the buffer pool in the
> > core xsk code. This is a structure that exists on a per unique device
> > and queue id basis. So, a number of entities that can now be shared
> > are moved from the umem to the buffer pool. Information about DMA
> > mappings are also moved from the buffer pool, but as these are per
> > device independent of the queue id, they are now hanging off the
> > netdev.
>
> Basically, you want to map a pair of (netdev, UMEM) to DMA info. The
> current implementation of xp_find_dma_map stores a list of UMEMs in the
> netdev and goes over that list to find the corresponding DMA info. It
> would be more effective to do it vice-versa, i.e. to store the list of
> netdevs inside of a UMEM, because you normally have fewer netdevs in the
> system than sockets, and you'll have fewer list items to traverse. Of
> course, it has no effect on the data path, but it will improve the time
> to open a socket (i.e. connection rate).
Good idea. Will fix.
> > In summary after this patch set, there is one xdp_sock struct
> > per socket created. This points to an xsk_buff_pool for which there is
> > one per unique device and queue id. The buffer pool points to a DMA
> > mapping structure for which there is one per device that a umem has
> > been bound to. And finally, the buffer pool also points to a xdp_umem
> > struct, for which there is only one per umem registration.
> >
> > Before:
> >
> > XSK -> UMEM -> POOL
> >
> > Now:
> >
> > XSK -> POOL -> DMA
> > \
> > > UMEM
> >
> > Patches 1-8 only rearrange internal structures to support the buffer
> > pool carrying this new information, while patch 9 improves performance
> > as we now have rearrange the internal structures quite a bit. Finally,
> > patches 10-14 introduce the new functionality together with libbpf
> > support, samples, and documentation.
> >
> > Libbpf has also been extended to support sharing of umems between
> > sockets bound to different devices and queue ids by introducing a new
> > function called xsk_socket__create_shared(). The difference between
> > this and the existing xsk_socket__create() is that the former takes a
> > reference to a fill ring and a completion ring as these need to be
> > created. This new function needs to be used for the second and
> > following sockets that binds to the same umem. The first one can be
> > created by either function as it will also have called
> > xsk_umem__create().
> >
> > There is also a new sample xsk_fwd that demonstrates this new
> > interface and capability.
> >
> > Note to Maxim at Mellanox. I do not have a mlx5 card, so I have not
> > been able to test the changes to your driver. It compiles, but that is
> > all I can say, so it would be great if you could test it. Also, I did
> > change the name of many functions and variables from umem to pool as a
> > buffer pool is passed down to the driver in this patch set instead of
> > the umem. I did not change the name of the files umem.c and
> > umem.h. Please go through the changes and change things to your
> > liking.
>
> I looked through the mlx5 patches, and I see the changes are minor, and
> most importantly, the functionality is not broken (tested with xdpsock).
> I would still like to make some cosmetic amendments - I'll send you an
> updated patch.
Appreciated. Thanks.
> > Performance for the non-shared umem case is unchanged for the xdpsock
> > sample application with this patch set.
>
> I also tested it on mlx5 (ConnectX-5 Ex), and the performance hasn't
> been hurt.
Good to hear. I might include another patch in the v2 that improves
performance with 3% for the l2fwd sample app on my system. It is in
common code so should benefit everyone. Though, it is dependent on a
new DMA interface patch set from Christof making it from bpf to
bpf-next. If it makes it over in time, I will include it. Otherwise,
it will be submitted later.
/Magnus
> > For workloads that share a
> > umem, this patch set can give rise to added performance benefits due
> > to the decrease in memory usage.
> >
> > This patch has been applied against commit 91f77560e473 ("Merge branch 'test_progs-improvements'")
> >
> > Structure of the patch set:
> >
> > Patch 1: Pass the buffer pool to the driver instead of the umem. This
> > because the driver needs one buffer pool per napi context
> > when we later introduce sharing of the umem between queue ids
> > and devices.
> > Patch 2: Rename the xsk driver interface so they have better names
> > after the move to the buffer pool
> > Patch 3: There is one buffer pool per device and queue, while there is
> > only one umem per registration. The buffer pool needs to be
> > created and destroyed independently of the umem.
> > Patch 4: Move fill and completion rings to the buffer pool as there will
> > be one set of these per device and queue
> > Patch 5: Move queue_id, dev and need_wakeup to buffer pool again as these
> > will now be per buffer pool as the umem can be shared between
> > devices and queues
> > Patch 6: Move xsk_tx_list and its lock to buffer pool
> > Patch 7: Move the creation/deletion of addrs from buffer pool to umem
> > Patch 8: Enable sharing of DMA mappings when multiple queues of the
> > same device are bound
> > Patch 9: Rearrange internal structs for better performance as these
> > have been substantially scrambled by the previous patches
> > Patch 10: Add shared umem support between queue ids
> > Patch 11: Add shared umem support between devices
> > Patch 12: Add support for this in libbpf
> > Patch 13: Add a new sample that demonstrates this new feature by
> > forwarding packets between different netdevs and queues
> > Patch 14: Add documentation
> >
> > Thanks: Magnus
> >
> > Cristian Dumitrescu (1):
> > samples/bpf: add new sample xsk_fwd.c
> >
> > Magnus Karlsson (13):
> > xsk: i40e: ice: ixgbe: mlx5: pass buffer pool to driver instead of
> > umem
> > xsk: i40e: ice: ixgbe: mlx5: rename xsk zero-copy driver interfaces
> > xsk: create and free context independently from umem
> > xsk: move fill and completion rings to buffer pool
> > xsk: move queue_id, dev and need_wakeup to context
> > xsk: move xsk_tx_list and its lock to buffer pool
> > xsk: move addrs from buffer pool to umem
> > xsk: net: enable sharing of dma mappings
> > xsk: rearrange internal structs for better performance
> > xsk: add shared umem support between queue ids
> > xsk: add shared umem support between devices
> > libbpf: support shared umems between queues and devices
> > xsk: documentation for XDP_SHARED_UMEM between queues and netdevs
> >
> > Documentation/networking/af_xdp.rst | 68 +-
> > drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 2 +-
> > drivers/net/ethernet/intel/i40e/i40e_main.c | 29 +-
> > drivers/net/ethernet/intel/i40e/i40e_txrx.c | 10 +-
> > drivers/net/ethernet/intel/i40e/i40e_txrx.h | 2 +-
> > drivers/net/ethernet/intel/i40e/i40e_xsk.c | 79 +-
> > drivers/net/ethernet/intel/i40e/i40e_xsk.h | 4 +-
> > drivers/net/ethernet/intel/ice/ice.h | 18 +-
> > drivers/net/ethernet/intel/ice/ice_base.c | 16 +-
> > drivers/net/ethernet/intel/ice/ice_lib.c | 2 +-
> > drivers/net/ethernet/intel/ice/ice_main.c | 10 +-
> > drivers/net/ethernet/intel/ice/ice_txrx.c | 8 +-
> > drivers/net/ethernet/intel/ice/ice_txrx.h | 2 +-
> > drivers/net/ethernet/intel/ice/ice_xsk.c | 142 +--
> > drivers/net/ethernet/intel/ice/ice_xsk.h | 7 +-
> > drivers/net/ethernet/intel/ixgbe/ixgbe.h | 2 +-
> > drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 34 +-
> > .../net/ethernet/intel/ixgbe/ixgbe_txrx_common.h | 7 +-
> > drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 61 +-
> > drivers/net/ethernet/mellanox/mlx5/core/en.h | 19 +-
> > drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c | 5 +-
> > .../net/ethernet/mellanox/mlx5/core/en/xsk/rx.h | 10 +-
> > .../net/ethernet/mellanox/mlx5/core/en/xsk/setup.c | 12 +-
> > .../net/ethernet/mellanox/mlx5/core/en/xsk/setup.h | 2 +-
> > .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.c | 12 +-
> > .../net/ethernet/mellanox/mlx5/core/en/xsk/tx.h | 6 +-
> > .../net/ethernet/mellanox/mlx5/core/en/xsk/umem.c | 108 +-
> > .../net/ethernet/mellanox/mlx5/core/en/xsk/umem.h | 14 +-
> > drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 46 +-
> > drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 16 +-
> > include/linux/netdevice.h | 13 +-
> > include/net/xdp_sock.h | 28 +-
> > include/net/xdp_sock_drv.h | 115 ++-
> > include/net/xsk_buff_pool.h | 47 +-
> > net/core/dev.c | 3 +
> > net/ethtool/channels.c | 2 +-
> > net/ethtool/ioctl.c | 2 +-
> > net/xdp/xdp_umem.c | 221 +---
> > net/xdp/xdp_umem.h | 6 -
> > net/xdp/xsk.c | 213 ++--
> > net/xdp/xsk.h | 3 +
> > net/xdp/xsk_buff_pool.c | 314 +++++-
> > net/xdp/xsk_diag.c | 14 +-
> > net/xdp/xsk_queue.h | 12 +-
> > samples/bpf/Makefile | 3 +
> > samples/bpf/xsk_fwd.c | 1075 ++++++++++++++++++++
> > tools/lib/bpf/libbpf.map | 1 +
> > tools/lib/bpf/xsk.c | 376 ++++---
> > tools/lib/bpf/xsk.h | 9 +
> > 49 files changed, 2327 insertions(+), 883 deletions(-)
> > create mode 100644 samples/bpf/xsk_fwd.c
> >
> > --
> > 2.7.4
> >
>
Powered by blists - more mailing lists