Date:	Tue, 29 Sep 2015 08:44:49 -0700
From:	Stephen Hemminger <stephen@...workplumber.org>
To:	netdev@...r.kernel.org
Subject: Fw: [Bug 105221] New: system panics under load on mlx4_en interfaces



Begin forwarded message:

Date: Tue, 29 Sep 2015 07:19:32 +0000
From: "bugzilla-daemon@...zilla.kernel.org" <bugzilla-daemon@...zilla.kernel.org>
To: "shemminger@...ux-foundation.org" <shemminger@...ux-foundation.org>
Subject: [Bug 105221] New: system panics under load on mlx4_en interfaces


https://bugzilla.kernel.org/show_bug.cgi?id=105221

            Bug ID: 105221
           Summary: system panics under load on mlx4_en interfaces
           Product: Networking
           Version: 2.5
    Kernel Version: 4.3.0-rc3-vanilla
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Other
          Assignee: shemminger@...ux-foundation.org
          Reporter: thomas@...wermann.org
        Regression: No

We are using an HP ProLiant DL320e Gen8 with a dual-port Mellanox ConnectX-2 EN
network card (P/N: MNPH29D_A2-A5) installed. BIOS, iLO, microcode and NIC
firmware are up to date. We have already tried changing the interrupt settings.
All offloading features are currently disabled:
Features for eth2:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: on
rx-vlan-stag-filter: on [fixed]
l2-fwd-offload: off [fixed]
busy-poll: on [fixed]
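To see at a glance which of the features listed above are switched on and could still be toggled (that is, are not marked `[fixed]` by the driver), the listing can be parsed. The Python sketch below assumes only the `name: on|off [fixed]` line format shown above, as printed by `ethtool -k`; the function names are hypothetical helpers, not part of any tool.

```python
import re
from typing import Dict, List, Tuple

# One feature per line, e.g. "rx-checksumming: on" or
# "        tx-checksum-ip-generic: off [fixed]" (sub-features are indented).
FEATURE_RE = re.compile(r"^\s*([\w-]+):\s+(on|off)(\s+\[fixed\])?\s*$")


def parse_ethtool_features(text: str) -> Dict[str, Tuple[bool, bool]]:
    """Map each feature name to (enabled, fixed) from `ethtool -k`-style text."""
    features: Dict[str, Tuple[bool, bool]] = {}
    for line in text.splitlines():
        match = FEATURE_RE.match(line)
        if match:  # header lines such as "Features for eth2:" do not match
            name, state, fixed = match.groups()
            features[name] = (state == "on", fixed is not None)
    return features


def enabled_and_changeable(features: Dict[str, Tuple[bool, bool]]) -> List[str]:
    """Names of features that are on and not marked [fixed] by the driver."""
    return sorted(name for name, (on, fixed) in features.items() if on and not fixed)


# A few lines copied from the eth2 listing above.
sample = """Features for eth2:
rx-checksumming: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off [fixed]
generic-receive-offload: on
highdma: on [fixed]
"""
features = parse_ethtool_features(sample)
print(enabled_and_changeable(features))
# ['generic-receive-offload', 'rx-checksumming', 'tcp-segmentation-offload']
```

Run against the full eth2 listing, the same call would report, among others, tcp-segmentation-offload, generic-segmentation-offload and generic-receive-offload as enabled.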

When we put load on these NICs, the kernel panics. The issue can be reproduced
at any time, and the kernel version makes no difference.

[  176.892495] ------------[ cut here ]------------
[  176.892513] kernel BUG at net/core/skbuff.c:2097!
[  176.892525] invalid opcode: 0000 [#1] SMP
[  176.892538] Modules linked in: cpufreq_stats cpufreq_userspace
cpufreq_powersave iptable_filter cpufreq_conservative xt_CT nf_conntrack
iptable_raw ip_tables x_tables nfsd auth_rpcgss oid_registry nfs_acl nfs lockd
grace fscache sunrpc ip_gre ip_tunnel gre intel_rapl iosf_mbi
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel sha256_ssse3 sha256_generic hmac drbg
ansi_cprng aesni_intel mgag200 aes_x86_64 lrw ttm drm_kms_helper gf128mul
glue_helper drm ablk_helper iTCO_wdt cryptd iTCO_vendor_support joydev evdev
psmouse ie31200_edac serio_raw hpilo i2c_algo_bit edac_core lpc_ich hpwdt
snd_pcm snd_timer snd 8250_fintek soundcore pcspkr mfd_core ipmi_si
ipmi_msghandler shpchp button pcc_cpufreq acpi_cpufreq processor
acpi_power_meter 8021q
[  176.892778]  garp mrp stp llc dummy autofs4 ext4 crc16 mbcache jbd2 dm_mod
mlx4_en vxlan ip6_udp_tunnel udp_tunnel sg sd_mod uas usb_storage scsi_mod
hid_generic usbhid hid crc32c_intel mlx4_core ehci_pci uhci_hcd tg3 ehci_hcd
ptp pps_core libphy usbcore usb_common thermal
[  176.892868] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.3.0-rc3-vanillaice #1
[  176.892885] Hardware name: HP ProLiant DL320e Gen8, BIOS J05 11/09/2013
[  176.892902] task: ffffffff81814540 ti: ffffffff81800000 task.ti: ffffffff81800000
[  176.892919] RIP: 0010:[<ffffffff8144d1a6>]  [<ffffffff8144d1a6>] __skb_checksum+0x2d6/0x2f0
[  176.892942] RSP: 0018:ffff8802474038f8  EFLAGS: 00010286
[  176.892955] RAX: 00000000ffff12f3 RBX: 00000000ffff12f3 RCX: 00000000ffff0ec6
[  176.892972] RDX: ffff88022ce1d980 RSI: 00000000ffff12f3 RDI: ffff8800afed4400
[  176.892988] RBP: 0000000000000000 R08: ffff880247403978 R09: 00000000ffff12f3
[  176.893005] R10: ffff88022ce1d300 R11: 0000000000000002 R12: 0000000000000000
[  176.893021] R13: 0000000000000000 R14: 00000000ffff12f3 R15: 0000000000000000
[  176.893038] FS:  0000000000000000(0000) GS:ffff880247400000(0000) knlGS:0000000000000000
[  176.893056] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  176.893070] CR2: 00007f42a19c0000 CR3: 000000000180d000 CR4: 00000000001406f0
[  176.893086] Stack:
[  176.893092]  00000000b0ddb200 ffff880247403978 ffffffffffff12f3 ffffffff81814540
[  176.893113]  ffffffff81814540 ffffffff81814540 0000000000000000 ffff880000000000
[  176.893134]  0000000000000246 ffff8800afed4400 0000000000000000 ffff88022ce1d300
[  176.893155] Call Trace:
[  176.893162]  <IRQ>
[  176.893169]  [<ffffffff8144d1e2>] ? skb_checksum+0x22/0x30
[  176.893185]  [<ffffffff8144a940>] ? skb_push+0x40/0x40
[  176.893198]  [<ffffffff8144a5e0>] ? reqsk_fastopen_remove+0x150/0x150
[  176.893214]  [<ffffffff81535ed4>] ? udp6_ufo_fragment+0xb4/0x2e0
[  176.893230]  [<ffffffff8149ad74>] ? ip_finish_output2+0x134/0x350
[  176.893245]  [<ffffffff815358f2>] ? ipv6_gso_segment+0x112/0x2a0
[  176.893260]  [<ffffffff8144ac1e>] ? __kmalloc_reserve.isra.31+0x2e/0x80
[  176.893276]  [<ffffffff8145fe5e>] ? skb_mac_gso_segment+0x8e/0xe0
[  176.893292]  [<ffffffff814ded67>] ? gre_gso_segment+0x177/0x450
[  176.893307]  [<ffffffff814cf7d9>] ? inet_gso_segment+0x1d9/0x370
[  176.893322]  [<ffffffff81460600>] ? dev_hard_start_xmit+0x210/0x380
[  176.893337]  [<ffffffff8145fe5e>] ? skb_mac_gso_segment+0x8e/0xe0
[  176.893352]  [<ffffffff81460278>] ? validate_xmit_skb.isra.98.part.99+0x128/0x2a0
[  176.893370]  [<ffffffff814607a6>] ? validate_xmit_skb_list+0x36/0x50
[  176.893953]  [<ffffffff81481da2>] ? sch_direct_xmit+0x102/0x1e0
[  176.894534]  [<ffffffff81481f0e>] ? __qdisc_run+0x8e/0x1b0
[  176.895115]  [<ffffffff81460b4f>] ? __dev_queue_xmit+0x2bf/0x540
[  176.895691]  [<ffffffff8149ae9a>] ? ip_finish_output2+0x25a/0x350
[  176.896264]  [<ffffffff8149d0c8>] ? ip_output+0x68/0xd0
[  176.896834]  [<ffffffff81490e82>] ? nf_hook_slow+0x62/0xb0
[  176.897389]  [<ffffffff81499131>] ? ip_forward+0x391/0x480
[  176.897927]  [<ffffffff81498d10>] ? ip_frag_mem+0x40/0x40
[  176.898446]  [<ffffffff814978c7>] ? ip_rcv+0x277/0x3a0
[  176.898948]  [<ffffffff81496f90>] ? inet_del_offload+0x40/0x40
[  176.899434]  [<ffffffff8145e883>] ? __netif_receive_skb_core+0x843/0x9a0
[  176.899909]  [<ffffffff814dea33>] ? gre_gro_receive+0x1c3/0x380
[  176.900383]  [<ffffffff81535ac2>] ? tcp6_gro_complete+0x42/0x70
[  176.900825]  [<ffffffff8145ea5f>] ? netif_receive_skb_internal+0x1f/0x80
[  176.901302]  [<ffffffff8145f223>] ? dev_gro_receive+0x213/0x340
[  176.901723]  [<ffffffff8145f527>] ? napi_gro_receive+0x27/0xc0
[  176.902140]  [<ffffffffa051eaf0>] ? gro_cell_poll+0x50/0x90 [ip_tunnel]
[  176.902552]  [<ffffffff8145eefa>] ? net_rx_action+0x20a/0x320
[  176.902957]  [<ffffffff810739d7>] ? __do_softirq+0x107/0x270
[  176.903354]  [<ffffffff81073c76>] ? irq_exit+0x86/0x90
[  176.903744]  [<ffffffff8155198f>] ? do_IRQ+0x4f/0xd0
[  176.904132]  [<ffffffff8154f642>] ? common_interrupt+0x82/0x82
[  176.904516]  <EOI>
[  176.904524]  [<ffffffff81429788>] ? cpuidle_enter_state+0xe8/0x220
[  176.905287]  [<ffffffff81429763>] ? cpuidle_enter_state+0xc3/0x220
[  176.905670]  [<ffffffff810ab064>] ? cpu_startup_entry+0x284/0x340
[  176.906048]  [<ffffffff8192ff37>] ? start_kernel+0x472/0x47a
[  176.906422]  [<ffffffff8192f120>] ? early_idt_handler_array+0x120/0x120
[  176.906793]  [<ffffffff8192f600>] ? x86_64_start_kernel+0x145/0x154
[  176.907157] Code: 14 37 39 c2 7d 92 be 20 08 00 00 48 c7 c7 91 35 78 81 89 44 24 38 e8 da 23 c2 ff 8b 44 24 38 e9 74 ff ff ff 31 ed e9 9a fd ff ff <0f> 0b 89 4c 24 10 e9 50 ff ff ff 66 66 66 66 66 66 2e 0f 1f 84
[  176.907990] RIP  [<ffffffff8144d1a6>] __skb_checksum+0x2d6/0x2f0
[  176.908412]  RSP <ffff8802474038f8>
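The oops reports `kernel BUG at net/core/skbuff.c:2097` with RIP in `__skb_checksum()`, called from `udp6_ufo_fragment()` after the forwarded packet went through GRE and IPv6 GSO (`gre_gso_segment`, `ipv6_gso_segment`). Assuming that line 2097 in this tree is the `BUG_ON(len)` at the end of `__skb_checksum()` (our reading of the 4.3-era source, worth confirming against the exact tree), the panic means the caller asked to checksum more bytes than the skb holds in its linear head, page frags and frag_list combined. The Python model below is a sketch of that walk, not kernel code: `FakeSkb`, `skb_checksum_model` and `SkbBug` are hypothetical names, and the checksum is the plain RFC 1071 16-bit ones'-complement sum rather than the kernel's 32-bit partial `__wsum`.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


class SkbBug(Exception):
    """Raised where (by assumption) the kernel would hit BUG_ON(len) and panic."""


@dataclass
class FakeSkb:
    """Hypothetical stand-in for struct sk_buff: linear head, page frags, frag_list."""
    head: bytes
    frags: List[bytes] = field(default_factory=list)
    frag_list: List["FakeSkb"] = field(default_factory=list)


def ones_complement_sum(data: bytes) -> int:
    """RFC 1071 16-bit ones'-complement sum (not inverted); odd tail is zero-padded."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back in (end-around carry)
        total = (total & 0xFFFF) + (total >> 16)
    return total


def _add(a: int, b: int) -> int:
    """Ones'-complement addition of two 16-bit values."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def _segments(skb: FakeSkb) -> Iterator[bytes]:
    """Yield data in the order modelled here: head, page frags, then frag_list skbs."""
    yield skb.head
    yield from skb.frags
    for child in skb.frag_list:
        yield from _segments(child)


def skb_checksum_model(skb: FakeSkb, offset: int, length: int) -> int:
    """Sum `length` bytes starting at `offset`, across all segments of `skb`."""
    total, summed, start = 0, 0, 0
    for seg in _segments(skb):
        end = start + len(seg)
        if length > 0 and offset < end:
            local = offset - start
            take = min(length, len(seg) - local)
            part = ones_complement_sum(seg[local:local + take])
            if summed & 1:  # segment begins on an odd byte: swap its sum
                part = ((part & 0xFF) << 8) | (part >> 8)
            total = _add(total, part)
            summed += take
            offset += take
            length -= take
        start = end
    if length:  # bytes were requested that the skb does not contain
        raise SkbBug(f"{length} requested bytes are not in the skb")
    return total


data = bytes(range(1, 40))  # 39 bytes of payload
skb = FakeSkb(head=data[:7], frags=[data[7:20]],
              frag_list=[FakeSkb(head=data[20:33], frags=[data[33:]])])

# Splitting across head, frag and frag_list does not change the result.
csum = skb_checksum_model(skb, 3, 30)
assert csum == ones_complement_sum(data[3:33])

try:
    skb_checksum_model(skb, 30, 20)  # only 9 bytes exist past offset 30
except SkbBug as err:
    print("would BUG:", err)  # would BUG: 11 requested bytes are not in the skb
```

The byte swap for segments that start at an odd position plays the same role as the rotation the kernel applies when combining partial checksums at odd offsets (`csum_block_add()`), so the result does not depend on how the data is split.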

