Message-ID: <20240419124632.60294-1-oxana@cloudflare.com>
Date: Fri, 19 Apr 2024 13:45:47 +0100
From: Oxana Kharitonova <oxana@...udflare.com>
To: netdev@...r.kernel.org
Cc: saeedm@...dia.com,
	leon@...nel.org,
	davem@...emloft.net,
	edumazet@...gle.com,
	kuba@...nel.org,
	pabeni@...hat.com,
	rrameshbabu@...dia.com,
	oxana@...udflare.com,
	kernel-team@...udflare.com
Subject: mlx5 driver fails to detect NIC in 6.6.28

Hello,

The NIC stopped being detected in Linux 6.6.28. We observed the problem on
two servers; after reverting the kernel to 6.6.25 (our current stable
version), everything returned to normal.

We suspect the commit "net/mlx5e: Do not produce metadata freelist entries
in Tx port ts WQE xmit", but we haven't bisected yet.

The ifconfig and lspci output and the kernel log are below.

root@...alhost:~# ifconfig
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
       inet 127.0.0.1  netmask 255.0.0.0
       inet6 ::1  prefixlen 128  scopeid 0x10<host>
       loop  txqueuelen 1000  (Local Loopback)
       RX packets 80  bytes 6480 (6.3 KiB)
       RX errors 0  dropped 0  overruns 0  frame 0
       TX packets 80  bytes 6480 (6.3 KiB)
       TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

root@...alhost:~# lspci | grep Eth
c1:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]
c1:00.1 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx]

[   23.519113] RIP: 0010:esw_port_metadata_get (drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:4095 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:2442) mlx5_core
[   23.524293] usb 2-1: new SuperSpeed USB device number 2 using xhci_hcd
[ 23.528602] Code: eb 8e 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 53 48 89 d3 e8 f2 5d ea c9 48 8b 80 b0 09 00 00 <8b> 80 18 11 00 00 88 03 31 c0 80 23 01 5b e9 38 1f f3 c9 0f 1f 84
All code
========
   0:	eb 8e                	jmp    0xffffffffffffff90
   2:	0f 1f 00             	nopl   (%rax)
   5:	90                   	nop
   6:	90                   	nop
   7:	90                   	nop
   8:	90                   	nop
   9:	90                   	nop
   a:	90                   	nop
   b:	90                   	nop
   c:	90                   	nop
   d:	90                   	nop
   e:	90                   	nop
   f:	90                   	nop
  10:	90                   	nop
  11:	90                   	nop
  12:	90                   	nop
  13:	90                   	nop
  14:	90                   	nop
  15:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  1a:	53                   	push   %rbx
  1b:	48 89 d3             	mov    %rdx,%rbx
  1e:	e8 f2 5d ea c9       	call   0xffffffffc9ea5e15
  23:	48 8b 80 b0 09 00 00 	mov    0x9b0(%rax),%rax
  2a:*	8b 80 18 11 00 00    	mov    0x1118(%rax),%eax		<-- trapping instruction
  30:	88 03                	mov    %al,(%rbx)
  32:	31 c0                	xor    %eax,%eax
  34:	80 23 01             	andb   $0x1,(%rbx)
  37:	5b                   	pop    %rbx
  38:	e9 38 1f f3 c9       	jmp    0xffffffffc9f31f75
  3d:	0f                   	.byte 0xf
  3e:	1f                   	(bad)
  3f:	84                   	.byte 0x84

Code starting with the faulting instruction
===========================================
   0:	8b 80 18 11 00 00    	mov    0x1118(%rax),%eax
   6:	88 03                	mov    %al,(%rbx)
   8:	31 c0                	xor    %eax,%eax
   a:	80 23 01             	andb   $0x1,(%rbx)
   d:	5b                   	pop    %rbx
   e:	e9 38 1f f3 c9       	jmp    0xffffffffc9f31f4b
  13:	0f                   	.byte 0xf
  14:	1f                   	(bad)
  15:	84                   	.byte 0x84
[   23.528604] RSP: 0018:ffffc9000dbbfba8 EFLAGS: 00010282
[   23.530802] hub 1-1:1.0: 4 ports detected
[   23.537574] RAX: 0000000000000000 RBX: ffffc9000dbbfbfc RCX: 0000000000000028
[   23.537576] RDX: ffffc9000dbbfbfc RSI: 0000000000000013 RDI: ffff88811ec38000
[   23.537578] RBP: ffffffffc23fa560 R08: 0000000000000000 R09: 0000000000000000
[   23.537580] R10: 0000000000036ea0 R11: 0000000000000dc0 R12: ffff889850104f00
[   23.547568] usb 2-1: New USB device found, idVendor=05e3, idProduct=0620, bcdDevice=93.03
[   23.564222] R13: ffff8881075ca840 R14: ffff88811ec38000 R15: 0000000000000000
[   23.564224] FS:  0000000000000000(0000) GS:ffff88843fa00000(0000) knlGS:0000000000000000
[   23.564226] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   23.570143] usb 2-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[   23.574839] CR2: 0000000000001118 CR3: 0000000c7af5c000 CR4: 0000000000350ef0
[   23.574841] Call Trace:
[   23.574844]  <TASK>
[   23.574846] ? __die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434) 
[   23.590485] ? page_fault_oops (arch/x86/mm/fault.c:707) 
[   23.590490] ? get_page_from_freelist (mm/page_alloc.c:1553 mm/page_alloc.c:3177) 
[   23.606129] ? exc_page_fault (arch/x86/include/asm/irqflags.h:37 arch/x86/include/asm/irqflags.h:72 arch/x86/mm/fault.c:1504 arch/x86/mm/fault.c:1552) 
[   23.655856] ? asm_exc_page_fault (arch/x86/include/asm/idtentry.h:570) 
[   23.671501] ? esw_port_metadata_get (drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:4095 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:2442) mlx5_core
[   23.790812] ? esw_port_metadata_get (drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:4095 drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:2442) mlx5_core
[   23.955180] devlink_nl_param_fill.constprop.0 (net/devlink/param.c:268) 
[   23.961276] ? __alloc_skb (net/core/skbuff.c:651 (discriminator 1)) 
[   23.974490] ? srso_return_thunk (arch/x86/lib/retpoline.S:217) 
[   23.987013] ? __kmalloc_node_track_caller (mm/slab_common.c:1025 mm/slab_common.c:1046) 
[   23.987018] ? srso_return_thunk (arch/x86/lib/retpoline.S:217) 
[   23.997812] ? kmalloc_reserve (net/core/skbuff.c:584) 
[   23.997816] ? srso_return_thunk (arch/x86/lib/retpoline.S:217) 
[   24.007217] ? __alloc_skb (net/core/skbuff.c:666) 
[   24.007222] devlink_param_notify.constprop.0 (net/devlink/param.c:354 net/devlink/param.c:330) 
[   24.055512] devl_params_register (net/devlink/param.c:686 (discriminator 1)) 
[   24.055516] esw_offloads_init (drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c:2482) mlx5_core
[   24.065009] mlx5_eswitch_init (drivers/net/ethernet/mellanox/mlx5/core/eswitch.c:1872) mlx5_core
[   24.165229] mlx5_init_one_devl_locked (drivers/net/ethernet/mellanox/mlx5/core/main.c:1022 drivers/net/ethernet/mellanox/mlx5/core/main.c:1447) mlx5_core
[   24.177007] probe_one (drivers/net/ethernet/mellanox/mlx5/core/main.c:1507 drivers/net/ethernet/mellanox/mlx5/core/main.c:1947) mlx5_core
[   24.187296] local_pci_probe (drivers/pci/pci-driver.c:325) 
[   24.196698] work_for_cpu_fn (kernel/workqueue.c:5618 (discriminator 1)) 
[   24.205988] process_one_work (kernel/workqueue.c:2632) 
[   24.215612] worker_thread (kernel/workqueue.c:2694 (discriminator 2) kernel/workqueue.c:2781 (discriminator 2)) 
[   24.224999] ? __pfx_worker_thread (kernel/workqueue.c:2727) 
[   24.234912] kthread (kernel/kthread.c:388) 
[   24.243647] ? __pfx_kthread (kernel/kthread.c:341) 
[   24.252988] ret_from_fork (arch/x86/kernel/process.c:153) 
[   24.262103] ? __pfx_kthread (kernel/kthread.c:341) 
[   24.271347] ret_from_fork_asm (arch/x86/entry/entry_64.S:314) 
[   24.280754]  </TASK>
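
For anyone reading the oops: the trapping instruction, RAX and CR2 agree with each other, which points at a NULL pointer dereference rather than a wild pointer. The sketch below is an illustration added for that purpose. It assumes Python, and the helper `decode_mov_disp32` is hypothetical; the report itself contains no code. It decodes only the single instruction encoding seen at the fault (`8b 80 18 11 00 00`, a 32-bit load with a 32-bit displacement and no prefix or SIB byte), then adds the displacement to the RAX value from the register dump. It is not a general x86-64 decoder.

```python
# Minimal sketch: decodes only the "mov disp32(%base),%r32" form (opcode 0x8b,
# ModRM mod=0b10, no REX prefix, no SIB byte) used by the trapping instruction.
# It is not a general x86-64 decoder.
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_disp32(code):
    """Return (dest_reg, base_reg, displacement) for an 8b /r load with mod=10."""
    if code[0] != 0x8B:
        raise ValueError("not a mov r32, r/m32 instruction")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 0b10 or rm == 0b100:  # rm=100 would be followed by a SIB byte
        raise ValueError("only [base + disp32] without SIB is handled")
    disp = int.from_bytes(code[2:6], "little", signed=True)
    return REGS32[reg], REGS64[rm], disp


# Instruction bytes marked <8b> in the "Code:" line, plus registers from the oops.
trapping = bytes.fromhex("8b 80 18 11 00 00")
regs = {"rax": 0x0, "rbx": 0xFFFFC9000DBBFBFC}

dest, base, disp = decode_mov_disp32(trapping)
fault_addr = regs[base] + disp  # effective address the load tried to read
print(f"mov 0x{disp:x}(%{base}),%{dest} reads 0x{fault_addr:016x}")
```

Running it prints `mov 0x1118(%rax),%eax reads 0x0000000000001118`, which matches CR2. The preceding instruction, `mov 0x9b0(%rax),%rax` (it carries a REX.W prefix, which this sketch deliberately rejects), is what loaded the zero into RAX, so the pointer stored at offset 0x9b0 of the structure returned by the earlier call was NULL.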