Message-ID: <CADa=ObraXt4uEckHAGuhpvBa3ReUgcQkFMQweSBrGU9zpoOwnA@mail.gmail.com>
Date: Thu, 27 Jun 2024 12:07:32 +0300
From: Николай Рыбалов <dairinin@...il.com>
To: netdev@...r.kernel.org
Subject: mlnx5_core xdp redirect errors
Hello,
I have a setup with 32 CPUs and two mlx5 NICs, both running XDP
programs, one of which redirects via devmap to the other. This works
fine until the following happens:
1. Limit the number of queues on both NICs to 4 (fewer than the number of CPUs)
2. Place the incoming interrupt on a CPU >4 via irq_affinity
3. See redirect errors in the trace:
<idle>-0 [001] ..s1. 2010.232028: xdp_redirect: prog_id=58 action=REDIRECT ifindex=5 to_ifindex=4 err=0 map_id=44 map_index=0
<idle>-0 [001] ..s1. 2010.232033: xdp_devmap_xmit: ndo_xdp_xmit from_ifindex=5 to_ifindex=4 action=REDIRECT sent=1 drops=0 err=0
<idle>-0 [005] ..s1. 2010.232253: xdp_redirect: prog_id=56 action=REDIRECT ifindex=4 to_ifindex=5 err=0 map_id=44 map_index=1
<idle>-0 [005] ..s1. 2010.232257: xdp_devmap_xmit: ndo_xdp_xmit from_ifindex=4 to_ifindex=5 action=REDIRECT sent=0 drops=1 err=-6
This narrows down to the code in mlx5e_xdp_xmit, which selects the
output queue by SMP CPU id: the transmit fails on CPU 5 with err=-6
(-ENXIO) and succeeds on CPU 1.
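To illustrate the behaviour described above, here is a minimal userspace
model in C. It is only a sketch under the assumption that the driver uses
the current CPU id directly as the XDP send queue index and rejects ids
at or beyond the configured queue count; select_xdp_sq() is a
hypothetical name, not a function from the driver.

```c
#include <errno.h>

/*
 * Userspace model of per-CPU send queue selection (hypothetical helper,
 * not kernel code). Assumption: the queue index is the CPU id itself,
 * so any CPU id >= num_channels has no queue to transmit on.
 */
static int select_xdp_sq(int cpu, int num_channels)
{
    if (cpu < 0 || cpu >= num_channels)
        return -ENXIO;  /* ENXIO is 6 on Linux, matching err=-6 in the trace */
    return cpu;         /* queue index equals the CPU id */
}
```

With 4 queues, this model gives queue 1 for CPU 1 and -ENXIO for CPU 5,
which matches the sent=1 and drops=1 lines in the trace.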
This scenario does not seem very exotic to me: there is a real need to
keep NIC interrupts off some of the CPUs in the system, without being
bound to the first N of them.
Can this issue be solved in the driver, or should I start looking for
a workaround on the userland side?
Best regards