Date:   Mon, 18 May 2020 14:06:37 +0000
From:   Nick Price <nick@...n.io>
To:     linux-kernel <linux-kernel@...r.kernel.org>
Subject: ixgbe: firmware spam on X520-T2 NIC (82599EB)

In ixgbe_main.c around line 7882, the call to ixgbe_check_fw_error
causes spammy messages on certain adapters because the fwsm register
returns 0, triggering the !(fwsm & IXGBE_FWSM_FW_VAL_BIT) condition.

As a result, one error message per interface is emitted every two
seconds:

[79062.730890] ixgbe 0000:2a:00.0: Warning firmware error detected FWSM: 0x00000000
[79062.890877] ixgbe 0000:2a:00.1: Warning firmware error detected FWSM: 0x00000000
[79064.746743] ixgbe 0000:2a:00.0: Warning firmware error detected FWSM: 0x00000000
[79064.906728] ixgbe 0000:2a:00.1: Warning firmware error detected FWSM: 0x00000000

Per the Intel 82599 datasheet, bit 15 of this register is supposed to be
set to 1 upon card initialization. These particular cards do not behave
per their documentation, and there are no firmware updates available
from Intel or Dell that resolve this issue (firmware updates for other
models have resolved this problem).

Would it make sense to skip the error message if the entire fwsm
register is zero?  Or maybe only emit it once?

Or do we just continue to spam because technically this *is* a firmware
error, even though it does not impact functionality and there is
seemingly no resolution on the vendor side?
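
To make the two options concrete, here is a rough user-space sketch, not
a patch against ixgbe_main.c. It models only the
!(fwsm & IXGBE_FWSM_FW_VAL_BIT) condition quoted above. The names
FWSM_FW_VAL_BIT, struct fw_warn_state and the fw_error_*() helpers are
made up for illustration, and the bit position is assumed to be bit 15
as described for the datasheet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the "firmware valid" flag is bit 15 of FWSM. */
#define FWSM_FW_VAL_BIT (UINT32_C(1) << 15)

/* Hypothetical per-interface state; not a real ixgbe structure. */
struct fw_warn_state {
    bool warned; /* set once a warning has been reported */
};

/* Behaviour as described: warn whenever the valid bit is clear. */
static bool fw_error_current(uint32_t fwsm)
{
    return !(fwsm & FWSM_FW_VAL_BIT);
}

/* Option 1: treat an all-zero register as "nothing to report". */
static bool fw_error_skip_zero(uint32_t fwsm)
{
    return fwsm != 0 && !(fwsm & FWSM_FW_VAL_BIT);
}

/* Option 2: report the condition only the first time it is seen. */
static bool fw_error_warn_once(uint32_t fwsm, struct fw_warn_state *st)
{
    assert(st != NULL);
    if (!fw_error_current(fwsm) || st->warned)
        return false;
    st->warned = true;
    return true;
}
```

With fwsm == 0, fw_error_current() asks for a warning on every poll,
fw_error_skip_zero() never does, and fw_error_warn_once() does so only
on the first call per interface. Option 1 would also hide a register
that reads zero for some other reason, which is the trade-off behind the
question above.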

Anyone have any thoughts? Some references below.

Thanks!
Nick


For reference:
The commit that added this message is at 
https://github.com/torvalds/linux/commit/59dd45d550c518a2c297b2888f194633cb8e5700

More threads on the subject - it seems people are either patching the
kernel to eliminate the check completely or switching to Intel's
driver:
https://bugs.centos.org/view.php?id=16495
https://patchwork.criu.org/patch/11882/
https://forum.proxmox.com/threads/pve-6-0-7-ixgbe-firmware-errors.58592/



