Message-ID: <20181109085249.462d8ce7@redhat.com>
Date: Fri, 9 Nov 2018 08:52:49 +0100
From: Jesper Dangaard Brouer <brouer@...hat.com>
To: Saeed Mahameed <saeedm@...lanox.com>
Cc: "dsahern@...il.com" <dsahern@...il.com>,
"pstaszewski@...are.pl" <pstaszewski@...are.pl>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"yoel@...knet.dk" <yoel@...knet.dk>, brouer@...hat.com,
John Fastabend <john.fastabend@...il.com>,
Tariq Toukan <tariqt@...lanox.com>,
Toke Høiland-Jørgensen <toke@...e.dk>
Subject: Re: Kernel 4.19 network performance - forwarding/routing normal
users traffic
On Fri, 9 Nov 2018 04:52:01 +0000
Saeed Mahameed <saeedm@...lanox.com> wrote:
> On Thu, 2018-11-08 at 17:42 -0700, David Ahern wrote:
> > On 11/8/18 5:40 PM, Paweł Staszewski wrote:
> > >
> > > W dniu 08.11.2018 o 17:32, David Ahern pisze:
> > > > On 11/8/18 9:27 AM, Paweł Staszewski wrote:
> > > > > > > What hardware is this?
> > > > > > >
> > > > > mellanox connectx 4
> > > > > ethtool -i enp175s0f0
> > > > > driver: mlx5_core
> > > > > version: 5.0-0
> > > > > firmware-version: 12.21.1000 (SM_2001000001033)
> > > > > expansion-rom-version:
> > > > > bus-info: 0000:af:00.0
> > > > > supports-statistics: yes
> > > > > supports-test: yes
> > > > > supports-eeprom-access: no
> > > > > supports-register-dump: no
> > > > > supports-priv-flags: yes
> > > > >
> > > > > ethtool -i enp175s0f1
> > > > > driver: mlx5_core
> > > > > version: 5.0-0
> > > > > firmware-version: 12.21.1000 (SM_2001000001033)
> > > > > expansion-rom-version:
> > > > > bus-info: 0000:af:00.1
> > > > > supports-statistics: yes
> > > > > supports-test: yes
> > > > > supports-eeprom-access: no
> > > > > supports-register-dump: no
> > > > > supports-priv-flags: yes
> > > > >
> > > > > > > Start with:
> > > > > > >
> > > > > > > echo 1 > /sys/kernel/debug/tracing/events/xdp/enable
> > > > > > > cat /sys/kernel/debug/tracing/trace_pipe
> > > > > > cat /sys/kernel/debug/tracing/trace_pipe
> > > > > > <idle>-0 [045] ..s. 68469.467752:
> > > > > > xdp_devmap_xmit:
> > > > > > ndo_xdp_xmit map_id=32 map_index=5 action=REDIRECT sent=0
> > > > > > drops=1
> > > > > > from_ifindex=4 to_ifindex=5 err=-6
> > > > FIB lookup is good, the redirect is happening, but the mlx5
> > > > driver does not like it.
> > > >
> > > > I think the -6 is coming from the mlx5 driver and the packet
> > > > is getting dropped. Perhaps this check in mlx5e_xdp_xmit:
> > > >
> > > > if (unlikely(sq_num >= priv->channels.num))
> > > > return -ENXIO;
> > > I removed that part and recompiled - but after running xdp_fwd
> > > now I have a kernel panic :)
> >
>
> heh, no, please don't do such things :)
>
> It must be because the tx netdev has fewer tx queues than the rx
> netdev, or the rx netdev rings are bound to high cpu indexes.
>
> anyway, best practice is to open #cores RX/TX queues on both netdevs
>
> ethtool -L enp175s0f0 combined $(nproc)
> ethtool -L enp175s0f1 combined $(nproc)
>
> > Jesper or one of the Mellanox folks needs to respond about the config
> > needed to run XDP with this NIC. I don't have a 40G or 100G card to
> > play with.
Saeed already answered with a solution: you need to increase the
number of RX/TX queues to be equal to the number of CPUs.
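For illustration, a quick sanity check before loading XDP (the
interface name is from this thread; the numbers are made-up example
values, not read from a live NIC):

```shell
# Sketch: does the configured channel count cover all CPUs?
# On a real system, read these from 'nproc' and from the
# "Combined:" line under "Current hardware settings" in
# 'ethtool -l enp175s0f0'.
cpus=56          # example value, normally: $(nproc)
combined=16      # example value, normally from: ethtool -l
if [ "$combined" -lt "$cpus" ]; then
    echo "only $combined TX queues for $cpus CPUs"
    echo "fix: ethtool -L enp175s0f0 combined $cpus"
fi
```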
IMHO this again shows that the resource allocations around ndo_xdp_xmit
need a better API. The implicit requirement is that once ndo_xdp_xmit
is enabled, the driver MUST allocate a dedicated TX queue for XDP on
each CPU. It seems that for mlx5 this is a manual process, and as Pawel
discovered, failures are hard to troubleshoot and only visible via
tracepoints.
I think we need to do better in this area, both regarding usability and
more graceful handling when the HW doesn't have the resources. The
original requirement of one XDP-TX queue per CPU was necessary because
ndo_xdp_xmit was only sending 1 packet at a time. After my recent
changes, ndo_xdp_xmit can now send in bulks. Thus, performance-wise it
is feasible to use an (array of) locks if e.g. the HW cannot allocate
more TX-HW queues, or e.g. to allow the sysadmin to set the mode of
operation (if the system as a whole has issues allocating TX completion
IRQs for all these queues).
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer