Message-ID: <d862a131-5e31-bd26-84f7-fd8764ca9d48@redhat.com>
Date: Tue, 23 May 2023 17:55:07 +0200
From: Jesper Dangaard Brouer <jbrouer@...hat.com>
To: Dragos Tatulea <dtatulea@...dia.com>, Saeed Mahameed <saeed@...nel.org>,
Saeed Mahameed <saeedm@...dia.com>, Tariq Toukan <tariqt@...dia.com>,
Tariq Toukan <ttoukan.linux@...il.com>, Netdev <netdev@...r.kernel.org>,
Yunsheng Lin <linyunsheng@...wei.com>
Cc: brouer@...hat.com, atzin@...hat.com, mkabat@...hat.com, kheib@...hat.com,
Jiri Benc <jbenc@...hat.com>, bpf <bpf@...r.kernel.org>,
Felix Maurer <fmaurer@...hat.com>,
Alexander Duyck <alexander.duyck@...il.com>,
Ilias Apalodimas <ilias.apalodimas@...aro.org>,
Lorenzo Bianconi <lorenzo@...nel.org>,
Maxim Mikityanskiy <maxtram95@...il.com>
Subject: mlx5 XDP redirect leaking memory on kernel 6.3
When the mlx5 driver runs an XDP program that does XDP_REDIRECT, memory
is leaked. Other XDP actions, like XDP_DROP, XDP_PASS and XDP_TX,
work correctly. I tested both redirecting back out the same mlx5 device
and cpumap redirect (with XDP_PASS), and both leak.
After removing the XDP prog, which also causes the page_pool to be
released by mlx5, the leaks are visible via the page_pool periodic
inflight reports. I have this bpftrace[1] tool that I also use to detect
the problem faster (without waiting 60 sec for a report).
[1]
https://github.com/xdp-project/xdp-project/blob/master/areas/mem/bpftrace/page_pool_track_shutdown01.bt
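For readers unfamiliar with the inflight reports, here is a minimal
Python sketch of the accounting idea. It assumes the pool keeps two
32-bit counters (pages handed out, pages returned) and reports their
wrap-safe signed difference. The function name is made up for
illustration; this is a toy model, not the kernel code.

```python
U32_MASK = 0xFFFFFFFF


def inflight(hold_cnt: int, release_cnt: int) -> int:
    """Signed distance between two u32 counters (hypothetical helper).

    A result above zero after the pool has been shut down means pages
    were handed out but never returned.
    """
    diff = (hold_cnt - release_cnt) & U32_MASK  # emulate u32 subtraction
    # Reinterpret the 32-bit result as a signed value.
    return diff - (1 << 32) if diff & 0x80000000 else diff
```

A value that stays above zero across repeated reports after shutdown is
the signature of the leak described here.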
I've been debugging and reading through the code for a couple of days,
but I've not found the root cause yet. I would appreciate new ideas
on where to look and fresh eyes on the issue.
To Lin: it looks like mlx5 uses PP_FLAG_PAGE_FRAG, and my current
suspicion is that the mlx5 driver doesn't fully release the bias count
(hint: see MLX5E_PAGECNT_BIAS_MAX).
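To make the suspicion concrete, here is a toy Python model of biased
fragment counting. It assumes a page is pre-charged with a fixed number
of references, fragments handed out each consume one of them, and the
driver must drop whatever bias it did not hand out when it lets go of
the page. BIAS_MAX and the function names are hypothetical stand-ins;
the real value of MLX5E_PAGECNT_BIAS_MAX and the actual mlx5 logic are
not shown here.

```python
BIAS_MAX = 8  # hypothetical; a small stand-in for the driver's bias constant


def leftover_refs(frags_handed_out: int, frags_freed: int,
                  bias_dropped: int) -> int:
    """References still held on a page once the driver lets go of it."""
    if not 0 <= frags_freed <= frags_handed_out <= BIAS_MAX:
        raise ValueError("inconsistent fragment counts")
    if not 0 <= bias_dropped <= BIAS_MAX - frags_handed_out:
        raise ValueError("cannot drop more bias than is unused")
    # The page starts with BIAS_MAX references; each freed fragment and
    # each dropped unit of unused bias removes one.
    return BIAS_MAX - frags_freed - bias_dropped


def page_returns_to_pool(frags_handed_out: int, frags_freed: int,
                         bias_dropped: int) -> bool:
    """The page is only recycled when no references remain."""
    return leftover_refs(frags_handed_out, frags_freed, bias_dropped) == 0
```

In this model, handing out 3 fragments, freeing all 3 and dropping the
remaining 5 units of bias returns the page. Dropping only 4, or never
freeing one of the fragments (for example on a redirect path), leaves
one reference behind and the page stays inflight.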
--Jesper
Extra info about my device. I'm providing these because the mlx5 driver
can have different allocation modes depending on HW and device
priv-flags setup.
$ ethtool --show-priv-flags mlx5p1
Private flags for mlx5p1:
rx_cqe_moder : on
tx_cqe_moder : off
rx_cqe_compress : off
rx_striding_rq : on
rx_no_csum_complete: off
xdp_tx_mpwqe : on
skb_tx_mpwqe : on
tx_port_ts : off
$ ethtool -i mlx5p1
driver: mlx5_core
version: 6.4.0-rc2-net-next-vm-lock-dbg+
firmware-version: 16.23.1020 (MT_0000000009)
expansion-rom-version:
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
$ lspci -v | grep 03:00.0
03:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]