Message-ID: <bd85c535-e823-4597-b064-c34b3346b131@hetzner-cloud.de>
Date: Wed, 23 Oct 2024 18:47:34 +0200
From: Tobias Böhm <tobias.boehm@...zner-cloud.de>
To: Michael Chan <michael.chan@...adcom.com>
Cc: netdev@...r.kernel.org
Subject: bnxt_en: XDP_REDIRECT DMA unmap warnings (bnxt_tx_int_xdp)
Hi,
When doing XDP redirects between a tap device and a bnxt_en interface
(BCM57414), the kernel logs DMA unmap warnings (see below).
When the issue is present (host NOT booted with iommu=pt), throughput
drops sharply: ~200 Mbps instead of ~5 Gbps in a test where traffic is
forwarded by XDP_REDIRECT in both directions.
The warnings look very similar to the warning quoted in the commit
message of 8baeef7616d5 ("bnxt_en: Fix double DMA unmapping for
XDP_REDIRECT"). However, instead of "bnxt_rx_xdp" they occur in
"bnxt_tx_int_xdp".
If the system is booted with "iommu=pt" the issue is not present. This
was also the case with the issue mentioned in the commit above.
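
Since the behavior depends on whether the host was booted with "iommu=pt", it can help to confirm the boot parameters on the machine under test. The following is a minimal userspace sketch, assuming a Linux host that exposes /proc/cmdline; the helper names cmdline_has_param and booted_with_iommu_pt are hypothetical and not part of any kernel or driver API.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Return true if `param` appears as a whole space-separated token in
 * `cmdline`. A substring match inside a longer token (for example
 * "amd_iommu=pt2") does not count. */
bool cmdline_has_param(const char *cmdline, const char *param)
{
    size_t plen = strlen(param);
    const char *p = cmdline;

    if (plen == 0)
        return false;

    while ((p = strstr(p, param)) != NULL) {
        bool starts_token = (p == cmdline) || p[-1] == ' ';
        char end = p[plen];
        bool ends_token = end == '\0' || end == ' ' || end == '\n';

        if (starts_token && ends_token)
            return true;
        p += 1; /* keep searching after this partial match */
    }
    return false;
}

/* Read /proc/cmdline and report whether "iommu=pt" was passed at boot.
 * Returns 1 if present, 0 if absent, -1 if the file cannot be read. */
int booted_with_iommu_pt(void)
{
    char buf[4096];
    size_t n;
    FILE *f = fopen("/proc/cmdline", "r");

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';

    return cmdline_has_param(buf, "iommu=pt") ? 1 : 0;
}
```

Calling booted_with_iommu_pt() on the test host returns 0 for the configuration in which the warnings appear and 1 for the configuration in which they do not.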
Unfortunately, I don't have a simple reproducer for this issue yet. The
important parts are:
- Both interfaces do XDP_REDIRECTs to each other.
- It doesn't need to be much traffic, but there must be traffic
  redirected in both directions simultaneously (1 Mbps was enough to
  trigger the warnings immediately).
- There is traffic passing to/from the normal network stack on the
  bnxt_en interface as well.
I can't tell yet whether the last point is necessary to trigger the
issue. Whenever the warning occurred, there was mixed traffic
(XDP_REDIRECT, XDP_PASS, and traffic sent directly from the system) on
the bnxt_en device.
So, my uneducated guess is that I am seeing the same issue that was
fixed for the receive path by the commit mentioned above, but for the
transmit path. What do you think? Is this possibly a similar DMA double
unmap issue? Please let me know if you need any further details.
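
To make the suspected failure mode concrete, here is a toy userspace model of a double unmap. It is only an illustration under stated assumptions: it is not the bnxt_en code and not the kernel's IOMMU implementation, and every name in it (toy_iommu, toy_map, toy_unmap) is hypothetical. The model keeps a table of live mappings and counts a warning whenever an address is unmapped that is not currently mapped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_MAPPINGS 16

/* Toy stand-in for a mapping table; zero-initialize before use. */
struct toy_iommu {
    uint64_t live[TOY_MAX_MAPPINGS]; /* addresses currently mapped */
    bool used[TOY_MAX_MAPPINGS];     /* which slots hold a live mapping */
    unsigned int warnings;           /* unmaps that found no live mapping */
};

/* Record `addr` as mapped. Returns false if the table is full. */
bool toy_map(struct toy_iommu *io, uint64_t addr)
{
    for (size_t i = 0; i < TOY_MAX_MAPPINGS; i++) {
        if (!io->used[i]) {
            io->used[i] = true;
            io->live[i] = addr;
            return true;
        }
    }
    return false;
}

/* Remove the mapping for `addr`. If no live mapping exists (for example
 * because it was already unmapped), count a warning and return false. */
bool toy_unmap(struct toy_iommu *io, uint64_t addr)
{
    for (size_t i = 0; i < TOY_MAX_MAPPINGS; i++) {
        if (io->used[i] && io->live[i] == addr) {
            io->used[i] = false;
            return true;
        }
    }
    io->warnings++;
    return false;
}
```

In this model, mapping a buffer once and unmapping it twice leaves the warnings counter at 1: the first toy_unmap succeeds and the second finds nothing to remove. The question above is whether something analogous happens on the transmit completion path.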
Regards, Tobias
NIC: BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)
firmware-version: 230.0.157.0/pkg 230.1.116.0
WARNING: CPU: 34 PID: 13927 at drivers/iommu/dma-iommu.c:840 __iommu_dma_unmap+0x15f/0x170
Modules linked in: tcp_diag udp_diag inet_diag vhost_net vhost
vhost_iotlb tap nvme_fabrics cdc_ether usbnet mii dummy overlay
nf_conntrack_netlink vxlan ip6_udp_tunnel udp_tunnel nfnetlink_cttimeout
openvswitch nsh nf_conncount psample ip6table_raw ip6table_filter
ip6table_mangle ip6_tables xt_CT iptable_raw xt_comment xt_LOG
nf_log_syslog xt_limit xt_tcpudp xt_state xt_conntrack xt_set
iptable_filter iptable_mangle algif_hash af_alg veth binfmt_misc
nls_iso8859_1 ip_set_list_set amd_atl intel_rapl_msr intel_rapl_common
amd64_edac edac_mce_amd kvm_amd kvm rapl ip_set_hash_net ip_set
nfnetlink ipmi_ssif spd5118 k10temp ipmi_si sp5100_tco ccp ipmi_devintf
ipmi_msghandler mac_hid sch_fq_codel br_netfilter bridge dm_multipath
nf_nat_ftp nf_conntrack_ftp scsi_dh_rdac nf_nat_sip scsi_dh_emc
nf_conntrack_sip scsi_dh_alua nf_nat_pptp nf_conntrack_pptp nf_nat
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tcp_htcp 8021q garp mrp stp
llc bonding tls efi_pstore ip_tables x_tables autofs4 btrfs
blake2b_generic raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx xor
raid6_pq raid0 dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio
libcrc32c raid1 raid10 dax_hmem cxl_acpi cxl_core crct10dif_pclmul
crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel
sha256_ssse3 sha1_ssse3 einj nvme ast nvme_core i2c_algo_bit i2c_piix4
nvme_auth bnxt_en ahci i2c_smbus xhci_pci libahci xhci_pci_renesas
aesni_intel crypto_simd cryptd
CPU: 34 UID: 998 PID: 13927 Comm: vhost-13917 Kdump: loaded Tainted: G W 6.11.0-061100-generic #202409151536
Tainted: [W]=WARN
Hardware name: Lenovo ThinkSystem SR635 V3/SB27B09916, BIOS KAE120J-4.20 06/14/2024
RIP: 0010:__iommu_dma_unmap+0x15f/0x170
Code: a8 00 00 00 00 48 c7 45 b0 00 00 00 00 48 c7 45 c8 00 00 00 00 48
c7 45 a0 ff ff ff ff 4c 89 45 b8 4c 89 45 c0 e9 77 ff ff ff <0f> 0b e9
60 ff ff ff e8 65 d6 72 00 0f 1f 44 00 00 90 90 90 90 90
RSP: 0018:ff493d55c1120c20 EFLAGS: 00010206
RAX: 0000000000002000 RBX: 00000000fe1f4000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ff493d55c1120c88 R08: ff493d55c1120c40 R09: 0000000000000000
R10: 0000000000000000 R11: ffffffff9aa060c0 R12: 0000000000001000
R13: ff432e7ef65e8010 R14: ff493d55c1120c28 R15: ff432e7ec09d9c00
FS: 000072f92e1093c0(0000) GS:ff432f3b08700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c000a04000 CR3: 000800015b182005 CR4: 0000000000f71ef0
PKRU: 55555554
Call Trace:
<IRQ>
? srso_alias_return_thunk+0x5/0xfbef5
? show_trace_log_lvl+0x273/0x310
? show_trace_log_lvl+0x273/0x310
? iommu_dma_unmap_page+0x4b/0xe0
? show_regs.part.0+0x22/0x30
? show_regs.cold+0x8/0x10
? __iommu_dma_unmap+0x15f/0x170
? __warn.cold+0xa7/0x101
? __iommu_dma_unmap+0x15f/0x170
? report_bug+0x114/0x160
? handle_bug+0x51/0xa0
? exc_invalid_op+0x18/0x80
? asm_exc_invalid_op+0x1b/0x20
? __iommu_dma_unmap+0x15f/0x170
iommu_dma_unmap_page+0x4b/0xe0
dma_unmap_page_attrs+0x52/0x210
? srso_alias_return_thunk+0x5/0xfbef5
? xdp_return_frame+0x2e/0xd0
bnxt_tx_int_xdp+0x1a8/0x2d0 [bnxt_en]
__bnxt_poll_work_done+0x81/0x1a0 [bnxt_en]
bnxt_poll+0xcd/0x1e0 [bnxt_en]
? srso_alias_return_thunk+0x5/0xfbef5
__napi_poll+0x30/0x1a0
net_rx_action+0x20e/0x400
handle_softirqs+0xe7/0x340
__irq_exit_rcu+0xce/0xf0
irq_exit_rcu+0xe/0x20
common_interrupt+0xb6/0xe0
</IRQ>
<TASK>
asm_common_interrupt+0x27/0x40
RIP: 0010:_raw_spin_unlock_irqrestore+0x21/0x60
Code: 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 55 49 89 f0 48 89 e5 c6
07 00 0f 1f 00 41 f7 c0 00 02 00 00 74 06 fb 0f 1f 44 00 00 <65> ff 0d
d0 54 78 66 74 13 5d 31 c0 31 d2 31 c9 31 f6 31 ff 45 31
RSP: 0018:ff493d55f8893cc0 EFLAGS: 00000206
RAX: 0000000000000000 RBX: ff432e7ec4a97610 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000246 RDI: ff432e7ec4a97608
RBP: ff493d55f8893cc0 R08: 0000000000000246 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ff432e7f175eaf40
R13: 0000000000000000 R14: 0000000000000246 R15: ff432e7ec4a97608
eventfd_signal_mask+0x89/0xc0
vhost_add_used_and_signal_n+0x4c/0x60 [vhost]
vhost_tx_batch.isra.0+0xd2/0x250 [vhost_net]
handle_tx_copy+0x1bb/0x3c0 [vhost_net]
handle_tx+0xbc/0xc0 [vhost_net]
handle_tx_kick+0x15/0x20 [vhost_net]
vhost_run_work_list+0x45/0x80 [vhost]
vhost_task_fn+0x50/0xf0
? srso_alias_return_thunk+0x5/0xfbef5
? finish_task_switch.isra.0+0x24f/0x310
? srso_alias_return_thunk+0x5/0xfbef5
? calculate_sigpending+0x33/0x40
? __pfx_vhost_task_fn+0x10/0x10
ret_from_fork+0x44/0x70
? __pfx_vhost_task_fn+0x10/0x10
ret_from_fork_asm+0x1a/0x30
RIP: 0033:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 002b:0000000000000000 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: 0000000000000000 RBX: 00007ffe9f6d3a50 RCX: 000072f92e71a94f
RDX: 0000000000000000 RSI: 000000000000af01 RDI: 000000000000000a
RBP: 0000000000000009 R08: 000000000000000a R09: 000062e66bba9770
R10: 000072f92e7beac0 R11: 0000000000000246 R12: 00007ffe9f6d39f0
R13: 000062e66bba9770 R14: 000062e66be33f00 R15: 0000000000000000
</TASK>