Message-ID: <CANSNSoXcagawnvGSDBjwbdSPQF9Ee+oBmfPYsDonrsYE+EMpUg@mail.gmail.com>
Date:   Wed, 20 Feb 2019 10:48:59 -0600
From:   Jesse Hathaway <jesse@...ki-mvuki.org>
To:     netdev@...r.kernel.org
Cc:     "David S. Miller" <davem@...emloft.net>
Subject: PROBLEM: dev_hard_start_xmit general protection fault on 4.19.18

After an uptime of 2-4 days, our routers hit a general protection fault in
dev_hard_start_xmit. We are going to try the latest 4.19.24 release to see
whether the bug has been resolved, but I didn't see any obviously related
commits in the logs. We are also going to test a much older 4.9.159 kernel as a
starting point for finding when this problem was introduced. Please let me know
if there is any additional information I can provide or any test patches you
would like me to try.

Thanks, Jesse Hathaway

Decoded stacktrace:

decode_stacktrace.sh was unable to decode the RIP line, but gdb was able to. If
someone knows why the script failed, I would love to know.

(gdb) l *dev_hard_start_xmit+0x38
0xffffffff815e7488 is in dev_hard_start_xmit (net/core/dev.c:3256).
3251    {
3252            struct sk_buff *skb = first;
3253            int rc = NETDEV_TX_OK;
3254
3255            while (skb) {
3256                    struct sk_buff *next = skb->next;
3257
3258                    skb->next = NULL;
3259                    rc = xmit_one(skb, dev, txq, next != NULL);
3260                    if (unlikely(!dev_xmit_complete(rc))) {
(gdb)
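
Line 3256 is the load of skb->next, and in the oops below the trapping
instruction is `mov (%rbx),%r14` with RBX = 2e5903fe657c2d03. On x86-64 with
48-bit virtual addresses, an address is canonical only when bits 63 through 47
are all equal, and touching a non-canonical address raises a general protection
fault instead of a page fault. The sketch below checks that property for values
from the register dump. The helper name `is_canonical` and the 48-bit default
are assumptions made for illustration (CR4 in the dump has bit 12, LA57, clear,
so 4-level paging looks like the right guess).

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if addr is a canonical x86-64 virtual address.

    Canonical means bits 63..(va_bits - 1) are all copies of bit va_bits - 1.
    va_bits = 48 is an assumption (4-level paging).
    """
    top = addr >> (va_bits - 1)               # bits 63..47 when va_bits == 48
    all_ones = (1 << (64 - va_bits + 1)) - 1  # 17 one-bits when va_bits == 48
    return top == 0 or top == all_ones


rbx = 0x2E5903FE657C2D03  # RBX from the oops: the skb pointer being read
rbp = 0xFFFF96F4A802A000  # RBP from the same dump, for comparison

print(is_canonical(rbx))  # False
print(is_canonical(rbp))  # True
```

RBX fails the check while the other kernel pointers in the dump pass it, which
would fit a corrupted skb pointer rather than a NULL dereference. R14 holds the
same value as RBX, so it may have been read as the next pointer of an earlier
skb in the list, but the dump alone does not show where the bad value came
from.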

[423866.182835] general protection fault: 0000 [#1] SMP PTI
[423866.188774] CPU: 4 PID: 0 Comm: swapper/4 Tainted: G            E    4.19.18-bt7u1-amd64 #1
[423866.198308] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 2.8.0 005/17/2018
[423866.206874] RIP: 0010:dev_hard_start_xmit (??:?)
[423866.212522] Code: 53 48 83 ec 28 48 85 ff 48 89 54 24 08 48 89 4c 24 18 0f 84 b9 01 00 00 48 8d 86 90 00 00 00 48 89 f5 48 89 fb 48 89 44 24 10 <4c> 8b 33 48 c7 03 00 00 00 00 48 8b 05 77 46 b4 00 4d 85 f6 0f 95
All code
========
   0: 53                    push   %rbx
   1: 48 83 ec 28          sub    $0x28,%rsp
   5: 48 85 ff              test   %rdi,%rdi
   8: 48 89 54 24 08        mov    %rdx,0x8(%rsp)
   d: 48 89 4c 24 18        mov    %rcx,0x18(%rsp)
  12: 0f 84 b9 01 00 00    je     0x1d1
  18: 48 8d 86 90 00 00 00 lea    0x90(%rsi),%rax
  1f: 48 89 f5              mov    %rsi,%rbp
  22: 48 89 fb              mov    %rdi,%rbx
  25: 48 89 44 24 10        mov    %rax,0x10(%rsp)
  2a:* 4c 8b 33              mov    (%rbx),%r14 <-- trapping instruction
  2d: 48 c7 03 00 00 00 00 movq   $0x0,(%rbx)
  34: 48 8b 05 77 46 b4 00 mov    0xb44677(%rip),%rax        # 0xb446b2
  3b: 4d 85 f6              test   %r14,%r14
  3e: 0f                    .byte 0xf
  3f: 95                    xchg   %eax,%ebp

Code starting with the faulting instruction
===========================================
   0: 4c 8b 33              mov    (%rbx),%r14
   3: 48 c7 03 00 00 00 00 movq   $0x0,(%rbx)
   a: 48 8b 05 77 46 b4 00 mov    0xb44677(%rip),%rax        # 0xb44688
  11: 4d 85 f6              test   %r14,%r14
  14: 0f                    .byte 0xf
  15: 95                    xchg   %eax,%ebp
[423866.233612] RSP: 0018:ffff96f4af483b18 EFLAGS: 00010202
[423866.239550] RAX: ffff96f3f72b6600 RBX: 2e5903fe657c2d03 RCX: 0000000000000003
[423866.247627] RDX: ffffcc02bf687600 RSI: 00000000fffffe01 RDI: ffffffffb69e864d
[423866.255703] RBP: ffff96f4a802a000 R08: 0000000000000001 R09: 00000000000003e8
[423866.263779] R10: 00000000000002f5 R11: ffff96f4a86ff940 R12: ffff96f4a802a000
[423866.271854] R13: 0000000000000032 R14: 2e5903fe657c2d03 R15: 0000000000000000
[423866.279931] FS:  0000000000000000(0000) GS:ffff96f4af480000(0000) knlGS:0000000000000000
[423866.289075] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[423866.295596] CR2: 00007fb3c351f000 CR3: 000000074080a001 CR4: 00000000001606e0
[423866.303671] Call Trace:
[423866.306501]  <IRQ>
[423866.308847] __dev_queue_xmit (/source/linux-4.19.18/net/core/dev.c:3830)
[423866.313427] ip_finish_output2
(/source/linux-4.19.18/./include/net/neighbour.h:501
/source/linux-4.19.18/net/ipv4/ip_output.c:229)
[423866.318105] ip_output (/source/linux-4.19.18/net/ipv4/ip_output.c:409)
[423866.321810] ? ip_fragment.constprop.49
(/source/linux-4.19.18/net/ipv4/ip_output.c:293)
[423866.327166] ip_forward (/source/linux-4.19.18/net/ipv4/ip_forward.c:150)
[423866.331161] ? ip_check_defrag
(/source/linux-4.19.18/net/ipv4/ip_forward.c:66)
[423866.335837] ip_rcv (/source/linux-4.19.18/net/ipv4/ip_input.c:527)
[423866.339250] ? ip_rcv_core.isra.15
(/source/linux-4.19.18/net/ipv4/ip_input.c:403)
[423866.344314] __netif_receive_skb_one_core
(/source/linux-4.19.18/net/core/dev.c:4920)
[423866.349866] netif_receive_skb_internal
(/source/linux-4.19.18/net/core/dev.c:5134)
[423866.355222] napi_gro_receive
(/source/linux-4.19.18/net/core/dev.c:5591
/source/linux-4.19.18/net/core/dev.c:5622)
[423866.359615] ixgbe_poll
(/source/linux-4.19.18/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:2404
/source/linux-4.19.18/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:3186)
ixgbe
[423866.364488] ? load_balance (/source/linux-4.19.18/kernel/sched/fair.c:8578)
[423866.368875] net_rx_action
(/source/linux-4.19.18/net/core/dev.c:6262
/source/linux-4.19.18/net/core/dev.c:6328)
[423866.373164] __do_softirq
(/source/linux-4.19.18/kernel/softirq.c:292
/source/linux-4.19.18/./include/linux/jump_label.h:142
/source/linux-4.19.18/./include/trace/events/irq.h:142
/source/linux-4.19.18/kernel/softirq.c:293)
[423866.377260] irq_exit (/source/linux-4.19.18/kernel/softirq.c:372
/source/linux-4.19.18/kernel/softirq.c:412)
[423866.380867] do_IRQ
(/source/linux-4.19.18/./arch/x86/include/asm/irq_regs.h:19
/source/linux-4.19.18/./arch/x86/include/asm/irq_regs.h:26
/source/linux-4.19.18/arch/x86/kernel/irq.c:260)
[423866.384378] common_interrupt
(/source/linux-4.19.18/arch/x86/entry/entry_64.S:646)
[423866.388567]  </IRQ>
[423866.391587] RIP: 0010:mwait_idle (??:?)
[423866.396925] Code: 01 00 0f ae 38 0f ae f0 31 d2 65 48 8b 04 25 40 5c 01 00 48 89 d1 0f 01 c8 48 8b 00 a8 08 0f 85 30 01 00 00 31 c0 fb 0f 01 c9 <65> 8b 2d a3 13 53 49 0f 1f 44 00 00 eb 07 fb 66 0f 1f 44 00 00 65
All code
========
   0: 01 00                add    %eax,(%rax)
   2: 0f ae 38              clflush (%rax)
   5: 0f ae f0              mfence
   8: 31 d2                xor    %edx,%edx
   a: 65 48 8b 04 25 40 5c mov    %gs:0x15c40,%rax
  11: 01 00
  13: 48 89 d1              mov    %rdx,%rcx
  16: 0f 01 c8              monitor %rax,%rcx,%rdx
  19: 48 8b 00              mov    (%rax),%rax
  1c: a8 08                test   $0x8,%al
  1e: 0f 85 30 01 00 00    jne    0x154
  24: 31 c0                xor    %eax,%eax
  26: fb                    sti
  27: 0f 01 c9              mwait  %rax,%rcx
  2a: 65 8b 2d a3 13 53 49 mov    %gs:*0x495313a3(%rip),%ebp        # 0x495313d4 <-- trapping instruction
  31: 0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
  36: eb 07                jmp    0x3f
  38: fb                    sti
  39: 66 0f 1f 44 00 00    nopw   0x0(%rax,%rax,1)
  3f: 65                    gs

Code starting with the faulting instruction
===========================================
   0: 65 8b 2d a3 13 53 49 mov    %gs:0x495313a3(%rip),%ebp        # 0x495313aa
   7: 0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
   c: eb 07                jmp    0x15
   e: fb                    sti
   f: 66 0f 1f 44 00 00    nopw   0x0(%rax,%rax,1)
  15: 65                    gs
[423866.419176] RSP: 0018:ffffac06c01ebe98 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffdb
[423866.428304] RAX: 0000000000000000 RBX: 0000000000000004 RCX: 0000000000000000
[423866.436944] RDX: 0000000000000000 RSI: ffff96f4af49a760 RDI: 0000000000000004
[423866.445569] RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
[423866.454198] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[423866.462825] R13: ffff96f4ad1aac40 R14: ffff96f4ad1aac40 R15: ffff96f4ad1aac40
[423866.471446] do_idle (/source/linux-4.19.18/kernel/sched/idle.c:153
/source/linux-4.19.18/kernel/sched/idle.c:262)
[423866.475681] cpu_startup_entry
(/source/linux-4.19.18/kernel/sched/idle.c:368 (discriminator 1))
[423866.480690] start_secondary
(/source/linux-4.19.18/arch/x86/kernel/smpboot.c:272)
[423866.485693] secondary_startup_64
(/source/linux-4.19.18/arch/x86/kernel/head_64.S:243)
[423866.490976] Modules linked in: drbg(E) ansi_cprng(E) echainiv(E)
esp4(E) xfrm4_mode_transport(E) tcp_diag(E) inet_diag(E)
nf_conntrack_netlink(E) xt_nat(E) xt_policy(E) nfnetlink_log(E)
xt_NFLOG(E) xt_limit(E) ipt_REJECT(E) nf_)
[423866.575127]  serpent_avx2(E) serpent_avx_x86_64(E)
serpent_sse2_x86_64(E) serpent_generic(E) glue_helper(E)
blowfish_generic(E) blowfish_x86_64(E) blowfish_common(E)
cast5_avx_x86_64(E) cast5_generic(E) cast_common(E) crypto_si)
[423866.658693]  crc32c_generic(E) crc32c_intel(E) ext4(E) crc16(E)
mbcache(E) jbd2(E) fscrypto(E) sg(E) sd_mod(E) ehci_pci(E) ahci(E)
ehci_hcd(E) libahci(E) ixgbe(E) libata(E) megaraid_sas(E) dca(E)
usbcore(E) mdio(E) i40e(E) scsi)
[423866.683437] ---[ end trace e0abd70b6f85b1fd ]---
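
The first Code: line in the oops above marks the byte at the faulting RIP with
angle brackets (<4c>), and the "All code" listing flags the same instruction at
offset 2a. As a rough cross-check, the sketch below counts the bytes that
precede the marker. The helper name `fault_offset` is hypothetical and the
parsing is deliberately minimal. It is not how scripts/decodecode is
implemented.

```python
def fault_offset(code_line: str) -> int:
    """Return the offset of the <xx>-marked byte within a Code: byte dump."""
    tokens = code_line.split()
    for index, token in enumerate(tokens):
        # The kernel wraps the byte at the faulting RIP in angle brackets.
        if token.startswith("<") and token.endswith(">"):
            return index
    raise ValueError("no <..> marker found")


# Byte dump copied from the first Code: line of the oops.
code = (
    "53 48 83 ec 28 48 85 ff 48 89 54 24 08 48 89 4c "
    "24 18 0f 84 b9 01 00 00 48 8d 86 90 00 00 00 48 89 f5 48 89 fb 48 89 "
    "44 24 10 <4c> 8b 33 48 c7 03 00 00 00 00 48 8b 05 77 46 b4 00 4d 85 f6 "
    "0f 95"
)

print(hex(fault_offset(code)))  # 0x2a
```

The result is an offset into the dumped window, not into the function. The
window starts partway into dev_hard_start_xmit, which is presumably why it
differs from the +0x38 offset used with gdb.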

# awk -f scripts/ver_linux

Linux rtr1 4.19.18-bt7u1-amd64 #1 SMP Mon Feb 11 20:09:59 UTC 2019 x86_64 GNU/Linux

GNU C                   4.7
GNU Make                3.81
Binutils                2.22
Util-linux              2.20.1
Mount                   2.20.1
E2fsprogs               1.42.5
Linux C Library         2.13
Dynamic linker (ldd)    2.13
Linux C++ Library       6.0.17
Procps                  3.3.3
Net-tools               1.60
Sh-utils                8.13
Udev                    175
Modules Loaded          8021q acpi_power_meter aesni_intel aes_x86_64
af_key ahci ansi_cprng authenc blowfish_common blowfish_generic
blowfish_x86_64 bonding button camellia_aesni_avx2
camellia_aesni_avx_x86_64 camellia_generic camellia_x86_64
cast5_avx_x86_64 cast5_generic cast_common cbc cmac crc16
crc32c_generic crc32c_intel cryptd crypto_simd ctr dca dcdbas
des_generic drbg drm drm_kms_helper dummy echainiv ehci_hcd ehci_pci
esp4 evdev ext4 fscrypto garp glue_helper gre i2c_algo_bit i2c_dev
i2c_i801 i40e inet_diag ioatdma ip_gre ipmi_devintf ipmi_msghandler
ipmi_si ip_set ip_set_hash_net ip_set_hash_netiface
ip_set_hash_netport iptable_filter iptable_mangle iptable_nat
iptable_raw ip_tables ipt_REJECT ip_tunnel iTCO_vendor_support
iTCO_wdt ixgbe jbd2 libahci libata libcrc32c llc loop lpc_ich mbcache
mdio megaraid_sas mei mei_me mgag200 mrp mxm_wmi nf_conntrack
nf_conntrack_netlink nf_conntrack_proto_gre nf_defrag_ipv4 nf_nat
nf_nat_ipv4 nfnetlink nfnetlink_log nf_reject_ipv4 pcbc pcrypt pcspkr
rmd160 scsi_mod sd_mod serpent_avx2 serpent_avx_x86_64 serpent_generic
serpent_sse2_x86_64 sg sha512_generic sha512_ssse3 snd snd_pcm
snd_timer soundcore stp tcp_diag ttm twofish_avx_x86_64 twofish_common
twofish_generic twofish_x86_64 twofish_x86_64_3way usbcore wmi xcbc
xfrm4_mode_transport xfrm_algo x_tables xt_addrtype xt_connmark
xt_conntrack xt_CT xt_limit xt_mark xt_nat xt_NFLOG xt_policy xt_set
xt_tcpudp
