Message-ID: <CAKgT0Ud6z=EqFFG03VuskAk+SvzVEBBSHu9duRvw09tRV+4e9Q@mail.gmail.com>
Date:   Mon, 20 Nov 2017 08:36:36 -0800
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Sarah Newman <sarah.newman@...puter.org>
Cc:     e1000-devel@...ts.sf.net, Netdev <netdev@...r.kernel.org>
Subject: Re: [E1000-devel] Questions about crashes and GRO

Hi Sarah,

I am adding the netdev mailing list as I am not certain this is an
i350-specific issue. The traces themselves don't match any existing
issue I recognize. From what I can tell you are running Xen, so would
I be correct in assuming you are bridging between VMs? If so, are you
using any sort of tunnels on your network, and what type? This
information would be useful as we may be looking at a bug in a tunnel
offload for GRO.

On Fri, Nov 17, 2017 at 3:28 PM, Sarah Newman <sarah.newman@...puter.org> wrote:
> Hi,
>
> I have an X10 supermicro with two I350's that has crashed twice now under v4.9.39 within the last 3 weeks, with no crashes before v4.9.39:

What was the last kernel you tested before v4.9.39? Just wondering as
it will help to rule out certain patches as possibly being the issue.

> $ /sbin/lspci | grep -i ethernet
> 02:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
> 02:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
> 04:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
> 04:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
>
> And some X9 supermicro's that have not crashed, with a single I350 I believe:
> $ /sbin/lspci | grep -i ethernet
> 06:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
> 06:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
> 06:00.2 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
> 06:00.3 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
>
> I see in the release notes https://downloadmirror.intel.com/22919/eng/README.txt "Do Not Use LRO When Routing Packets."
>
> We are bridging traffic, not routing, and the crashes are in the GRO code.
>
> Is it possible there are problems with GRO for bridging in the igb driver now? If I disable GRO can I have some confidence it will fix the issue?

As far as LRO not being used when routing, just so you know, LRO and
GRO are two very different things. One of the issues with LRO is that
it wasn't reversible in some cases, so packets could end up being
changed if they were rerouted. With GRO that shouldn't be the case, as
we should be able to get back the original packets that were merged
into a frame. So there shouldn't be any issues using GRO with bridging
or routing.
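
To make the "reversible" point concrete, here is a toy Python model. It
is only a sketch under simplifying assumptions, not how the kernel
implements GRO: the names Segment, Aggregate, gro_merge and gso_split
are made up for illustration, and headers are reduced to a sequence
number. The idea it shows is that segments are merged only when they
are contiguous and equally sized (the last one may be shorter), and the
per-segment size is kept so the merge can be undone.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    seq: int        # sequence number of the first payload byte
    payload: bytes


@dataclass
class Aggregate:
    seq: int
    payload: bytes
    gso_size: int   # size of each original segment, kept so the merge can be undone


def gro_merge(segments: List[Segment]) -> Aggregate:
    """Coalesce contiguous, equally sized segments (the last may be shorter)."""
    if not segments or not segments[0].payload:
        raise ValueError("need at least one non-empty segment")
    size = len(segments[0].payload)
    expected_seq = segments[0].seq
    for i, seg in enumerate(segments):
        n = len(seg.payload)
        is_last = i == len(segments) - 1
        if seg.seq != expected_seq:
            raise ValueError("segments are not contiguous")
        if n == 0 or n > size or (n < size and not is_last):
            raise ValueError("segment sizes differ, so they cannot be merged reversibly")
        expected_seq += n
    return Aggregate(segments[0].seq, b"".join(s.payload for s in segments), size)


def gso_split(agg: Aggregate) -> List[Segment]:
    """Cut the aggregate back into segments of gso_size bytes."""
    return [
        Segment(agg.seq + off, agg.payload[off:off + agg.gso_size])
        for off in range(0, len(agg.payload), agg.gso_size)
    ]
```

Splitting the result of gro_merge() with gso_split() gives back the
same list of segments that went in. An aggregation scheme that does not
enforce these rules or keep the segment size cannot promise that.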

GRO isn't in the driver. It is in the network stack of the kernel
itself. The only responsibility of igb is to provide the frames in the
correct format so that they can be assembled by GRO if it is enabled.
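
Since GRO is a generic stack feature, it shows up as
"generic-receive-offload" in a feature listing like the one quoted
below, and it can normally be toggled per interface with something like
"ethtool -K eth0 gro off". Whether that avoids the crashes is not
something a script can tell you. If you want to compare settings
between the X9 and X10 boxes, a small Python sketch along these lines
could help. parse_features is a hypothetical helper name, and it
assumes the input has the "name: on|off [fixed]" layout shown in your
listing.

```python
def parse_features(text: str) -> dict:
    """Parse an 'ethtool -k' style listing into {name: {"enabled", "fixed"}}."""
    features = {}
    for raw in text.splitlines():
        # Drop email quote markers and indentation before parsing.
        line = raw.lstrip("> ").strip()
        name, sep, value = line.partition(":")
        if not sep:
            continue
        state, _, flag = value.strip().partition(" ")
        if state not in ("on", "off"):
            continue  # skips the "Features for eth0:" header and anything unexpected
        features[name.strip()] = {
            "enabled": state == "on",
            "fixed": flag.strip() == "[fixed]",
        }
    return features
```

Looking up "generic-receive-offload" and "large-receive-offload" in the
returned dictionary on each host is then enough to confirm that GRO is
on and LRO is off and fixed everywhere.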

> Here are my offload settings:
> Features for eth0:
> rx-checksumming: on
> tx-checksumming: on
>         tx-checksum-ipv4: off [fixed]
>         tx-checksum-ip-generic: on
>         tx-checksum-ipv6: off [fixed]
>         tx-checksum-fcoe-crc: off [fixed]
>         tx-checksum-sctp: on
> scatter-gather: on
>         tx-scatter-gather: on
>         tx-scatter-gather-fraglist: off [fixed]
> tcp-segmentation-offload: on
>         tx-tcp-segmentation: on
>         tx-tcp-ecn-segmentation: off [fixed]
>         tx-tcp-mangleid-segmentation: off
>         tx-tcp6-segmentation: on
> udp-fragmentation-offload: off [fixed]
> generic-segmentation-offload: on
> generic-receive-offload: on
> large-receive-offload: off [fixed]
> rx-vlan-offload: on
> tx-vlan-offload: on
> ntuple-filters: off
> receive-hashing: on
> highdma: on [fixed]
> rx-vlan-filter: on [fixed]
> vlan-challenged: off [fixed]
> tx-lockless: off [fixed]
> netns-local: off [fixed]
> tx-gso-robust: off [fixed]
> tx-fcoe-segmentation: off [fixed]
> tx-gre-segmentation: on
> tx-gre-csum-segmentation: on
> tx-ipxip4-segmentation: on
> tx-ipxip6-segmentation: on
> tx-udp_tnl-segmentation: on
> tx-udp_tnl-csum-segmentation: on
> tx-gso-partial: on
> tx-sctp-segmentation: off [fixed]
> fcoe-mtu: off [fixed]
> tx-nocache-copy: off
> loopback: off [fixed]
> rx-fcs: off [fixed]
> rx-all: off
> tx-vlan-stag-hw-insert: off [fixed]
> rx-vlan-stag-hw-parse: off [fixed]
> rx-vlan-stag-filter: off [fixed]
> l2-fwd-offload: off [fixed]
> busy-poll: off [fixed]
> hw-tc-offload: off [fixed]
>
> First crash:
>
> [4083386.299221] ------------[ cut here ]------------
> [4083386.299358] WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:1473 inet_gro_complete+0xbb/0xd0
> [4083386.299520] Modules linked in: sb_edac edac_core 8021q mrp garp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_physdev ip6table_filter ip6_tables xen_pciback blktap xen_netback xen_gntdev xen_gntalloc xenfs xen_privcmd xen_evtchn xen_blkback tun sch_htb fuse ext2 ebt_mark ebt_ip ebt_arp ebtable_filter ebtables drbd lru_cache cls_fw br_netfilter bridge stp llc iTCO_wdt iTCO_vendor_support pcspkr raid456 async_raid6_recov async_pq async_xor xor async_memcpy async_tx raid10 raid6_pq libcrc32c joydev shpchp i2c_i801 i2c_smbus mei_me mei lpc_ich fjes ipmi_si ipmi_msghandler acpi_power_meter ioatdma igb dca raid1 mlx4_en mlx4_ib ib_core ptp pps_core mlx4_core mpt3sas scsi_transport_sas raid_class wmi ast ttm
> [4083386.300888] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.9.39 #1
> [4083386.301002] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 2.0a 09/16/2016
> [4083386.301109]  ffff880306603d90 ffffffff813f5935 0000000000000000 0000000000000000
> [4083386.301221]  ffff880306603dd0 ffffffff810a7e01 000005c18174578a ffff8802f94a9a00
> [4083386.301333]  ffff8802f0824450 0000000000000000 0000000000000040 0000000000000040
> [4083386.301445] Call Trace:
> [4083386.301483]  <IRQ> [4083386.301519]   dump_stack+0x63/0x8e
> [4083386.301596]   __warn+0xd1/0xf0
> [4083386.301665]   warn_slowpath_null+0x1d/0x20
> [4083386.301747]   inet_gro_complete+0xbb/0xd0
> [4083386.301830]   napi_gro_complete+0x73/0xa0
> [4083386.301911]   napi_gro_flush+0x5f/0x80
> [4083386.301988]   napi_complete_done+0x6a/0xb0
> [4083386.302075]   igb_poll+0x38d/0x720 [igb]
> [4083386.302156]   ? igb_msix_ring+0x2e/0x40 [igb]
> [4083386.302255]   ? __handle_irq_event_percpu+0x4b/0x1a0
> [4083386.302349]   net_rx_action+0x158/0x360
> [4083386.302430]   __do_softirq+0xd1/0x283
> [4083386.302507]   irq_exit+0xe9/0x100
> [4083386.302580]   xen_evtchn_do_upcall+0x35/0x50
> [4083386.302665]   xen_do_hypervisor_callback+0x1e/0x40
> [4083386.302754]  <EOI> [4083386.302787]   ? xen_hypercall_sched_op+0xa/0x20
> [4083386.302876]   ? xen_hypercall_sched_op+0xa/0x20
> [4083386.302965]   ? xen_safe_halt+0x10/0x20
> [4083386.303043]   ? default_idle+0x1e/0xd0
> [4083386.303122]   ? arch_cpu_idle+0xf/0x20
> [4083386.303200]   ? default_idle_call+0x2c/0x40
> [4083386.303284]   ? cpu_startup_entry+0x1ac/0x240
> [4083386.303370]   ? rest_init+0x77/0x80
> [4083386.303462]   ? start_kernel+0x4a7/0x4b4
> [4083386.303568]   ? set_init_arg+0x55/0x55
> [4083386.303670]   ? x86_64_start_reservations+0x24/0x26
> [4083386.303776]   ? xen_start_kernel+0x555/0x561
> [4083386.303873] ---[ end trace 8294f59ced689507 ]---
> [4083386.303958] general protection fault: 0000 [#1] SMP
> [4083386.304041] Modules linked in: sb_edac edac_core 8021q mrp garp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_physdev ip6table_filter ip6_tables xen_pciback blktap xen_netback xen_gntdev xen_gntalloc xenfs xen_privcmd xen_evtchn xen_blkback tun sch_htb fuse ext2 ebt_mark ebt_ip ebt_arp ebtable_filter ebtables drbd lru_cache cls_fw br_netfilter bridge stp llc iTCO_wdt iTCO_vendor_support pcspkr raid456 async_raid6_recov async_pq async_xor xor async_memcpy async_tx raid10 raid6_pq libcrc32c joydev shpchp i2c_i801 i2c_smbus mei_me mei lpc_ich fjes ipmi_si ipmi_msghandler acpi_power_meter ioatdma igb dca raid1 mlx4_en mlx4_ib ib_core ptp pps_core mlx4_core mpt3sas scsi_transport_sas raid_class wmi ast ttm
> [4083386.305179] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W       4.9.39 #1
> [4083386.305307] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 2.0a 09/16/2016
> [4083386.305414] task: ffffffff81e0e540 task.stack: ffffffff81e00000
> [4083386.305498] RIP: e030:   skb_release_data+0x73/0xf0
> [4083386.305617] RSP: e02b:ffff880306603d90  EFLAGS: 00010206
> [4083386.305692] RAX: 0000000000000030 RBX: f5b36db76bd162c7 RCX: ffffffff81e60048
> [4083386.305790] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8802f94a9a00
> [4083386.305887] RBP: ffff880306603db0 R08: 0000000000004277 R09: 0000000000000000
> [4083386.305985] R10: 0000000000000005 R11: 0000000000000002 R12: 0000000000000000
> [4083386.306083] R13: ffff8802f94a9a00 R14: ffff88032f527740 R15: 0000000000000040
> [4083386.306186] FS:  0000000000000000(0000) GS:ffff880306600000(0000) knlGS:0000000000000000
> [4083386.306296] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [4083386.306407] CR2: 0000000001692ed8 CR3: 000000022b3c9000 CR4: 0000000000042660
> [4083386.306505] Stack:
> [4083386.306537]  ffff8802f94a9a00 ffff8802f94a9a00 ffffffff8175ac3e 0000000000000040
> [4083386.306649]  ffff880306603dc8 ffffffff81745764 ffff8802f94a9a00 ffff880306603df0
> [4083386.306762]  ffffffff817457c2 ffff8802f94a9a00 ffff8802f0824450 0000000000000000
> [4083386.306874] Call Trace:
> [4083386.306911]  <IRQ> [4083386.306944]   ? napi_gro_complete+0x5e/0xa0
> [4083386.307038]   skb_release_all+0x24/0x30
> [4083386.307133]   kfree_skb+0x32/0x90
> [4083386.307206]   napi_gro_complete+0x5e/0xa0
> [4083386.307287]   napi_gro_flush+0x5f/0x80
> [4083386.307365]   napi_complete_done+0x6a/0xb0
> [4083386.307449]   igb_poll+0x38d/0x720 [igb]
> [4083386.307530]   ? igb_msix_ring+0x2e/0x40 [igb]
> [4083386.307617]   ? __handle_irq_event_percpu+0x4b/0x1a0
> [4083386.307720]   net_rx_action+0x158/0x360
> [4083386.307800]   __do_softirq+0xd1/0x283
> [4083386.307877]   irq_exit+0xe9/0x100
> [4083386.307949]   xen_evtchn_do_upcall+0x35/0x50
> [4083386.308034]   xen_do_hypervisor_callback+0x1e/0x40
> [4083386.308124]  <EOI> [4083386.308156]   ? xen_hypercall_sched_op+0xa/0x20
> [4083386.308246]   ? xen_hypercall_sched_op+0xa/0x20
> [4083386.308334]   ? xen_safe_halt+0x10/0x20
> [4083386.308413]   ? default_idle+0x1e/0xd0
> [4083386.308491]   ? arch_cpu_idle+0xf/0x20
> [4083386.308568]   ? default_idle_call+0x2c/0x40
> [4083386.308651]   ? cpu_startup_entry+0x1ac/0x240
> [4083386.308737]   ? rest_init+0x77/0x80
> [4083386.308811]   ? start_kernel+0x4a7/0x4b4
> [4083386.308890]   ? set_init_arg+0x55/0x55
> [4083386.308968]   ? x86_64_start_reservations+0x24/0x26
> [4083386.309060]   ? xen_start_kernel+0x555/0x561
> [4083386.309144] Code: f0 41 0f c1 46 20 39 c2 74 09 5b 41 5c 41 5d 41 5e 5d c3 45 31 e4 41 80 3e 00 74 39 49 63 c4 48 83 c0 03 48 c1 e0 04 49 8b 1c 06 <48> 8b 43 20 a8 01 75 6f f0 ff 4b 1c 74 55 48 8b 03 48 c1 e8 33
> [4083386.309571] RIP   skb_release_data+0x73/0xf0
> [4083386.309658]  RSP <ffff880306603d90>
> [4083386.313000] ---[ end trace 8294f59ced689508 ]---
> [4083386.389667] Kernel panic - not syncing: Fatal exception in interrupt
> [4083386.389791] Kernel Offset: disabled
> (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
>
> Second crash:
>
> [1838269.012349] general protection fault: 0000 [#1] SMP
> [1838269.012452] Modules linked in: ebtable_nat sb_edac edac_core 8021q mrp garp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_physdev ip6table_filter ip6_tables xen_pciback blktap xen_netback xen_gntdev xen_gntalloc xenfs xen_privcmd xen_evtchn xen_blkback tun sch_htb fuse ext2 ebt_mark ebt_ip ebt_arp ebtable_filter ebtables drbd lru_cache cls_fw br_netfilter bridge stp llc iTCO_wdt iTCO_vendor_support pcspkr raid456 async_raid6_recov async_pq async_xor xor async_memcpy async_tx raid10 raid6_pq libcrc32c joydev i2c_i801 i2c_smbus lpc_ich shpchp mei_me mei fjes ipmi_si ipmi_msghandler acpi_power_meter ioatdma igb dca raid1 mlx4_en mlx4_ib ib_core ptp pps_core mlx4_core mpt3sas scsi_transport_sas raid_class wmi ast ttm
> [1838269.013521] CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.9.39 #1
> [1838269.013637] Hardware name: Supermicro Super Server/X10DRi-LN4+, BIOS 2.0a 09/16/2016
> [1838269.013743] task: ffff88030008c4c0 task.stack: ffffc90041978000
> [1838269.013826] RIP: e030:   memcpy_erms+0x6/0x10
> [1838269.013952] RSP: e02b:ffffc9004197bac0  EFLAGS: 00010202
> [1838269.014026] RAX: ffff88032fcafe16 RBX: 0000000000000004 RCX: 0000000000000004
> [1838269.014124] RDX: 0000000000000004 RSI: 62a16ddedc6dbcb3 RDI: ffff88032fcafe16
> [1838269.014222] RBP: ffffc9004197bb20 R08: 0000000000000004 R09: 0000000000000004
> [1838269.014320] R10: ffff88026ae89500 R11: 0000000044639632 R12: 0000000000000048
> [1838269.014417] R13: 0000000000000000 R14: 0000000044639632 R15: 0000000000000048
> [1838269.014519] FS:  0000000000000000(0000) GS:ffff880306640000(0000) knlGS:ffff880306640000
> [1838269.014629] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [1838269.014709] CR2: ffffffffff600400 CR3: 0000000051939000 CR4: 0000000000042660
> [1838269.014808] Stack:
> [1838269.014840]  ffffffff81744c17 ffff88026ae89500 0000000044639632 ffff88030008c4c0
> [1838269.014952]  ffffffff00000004 0000000000000004 ffff88032fcafe16 ffff88026ae89500
> [1838269.015064]  0000000000000004 0000000000000004 000000000000004c 0000000000000028
> [1838269.015176] Call Trace:
> [1838269.015217]   ? skb_copy_bits+0x137/0x2c0
> [1838269.015299]   __pskb_pull_tail+0x7f/0x3b0
> [1838269.015382]   tcp_gro_receive+0x2c5/0x300
> [1838269.015465]   tcp6_gro_receive+0x13a/0x1a0
> [1838269.015547]   ipv6_gro_receive+0x1c6/0x380
> [1838269.015630]   dev_gro_receive+0x269/0x3b0
> [1838269.015712]   napi_gro_receive+0x38/0xf0
> [1838269.015796]   igb_clean_rx_irq+0x38e/0x690 [igb]
> [1838269.015886]   igb_poll+0x362/0x720 [igb]
> [1838269.015968]   ? dequeue_entity+0x26e/0xa90
> [1838269.016051]   ? xen_mc_flush+0x17b/0x1b0
> [1838269.016131]   net_rx_action+0x158/0x360
> [1838269.016212]   __do_softirq+0xd1/0x283
> [1838269.016290]   ? sort_range+0x30/0x30
> [1838269.016366]   run_ksoftirqd+0x29/0x50
> [1838269.016443]   smpboot_thread_fn+0x110/0x160
> [1838269.016525]   kthread+0xd7/0xf0
> [1838269.016595]   ? kthread_park+0x60/0x60
> [1838269.016673]   ret_from_fork+0x25/0x30
> [1838269.016758] Code: ff 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
> [1838269.017183] RIP   memcpy_erms+0x6/0x10
> [1838269.017264]  RSP <ffffc9004197bac0>
> [1838269.020618] ---[ end trace 3506ce1d7200529a ]---
> [1838269.079891] Kernel panic - not syncing: Fatal exception in interrupt
> [1838269.080014] Kernel Offset: disabled
> (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.
>
> Thanks, Sarah
