lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20081208205250.GH29279@cacr.caltech.edu>
Date:	Mon, 8 Dec 2008 12:52:51 -0800
From:	Jan Lindheim <lindheim@...r.caltech.edu>
To:	linux-kernel@...r.kernel.org
Subject: stability problems with ixgb

We have a few large linux file servers, where we have tried to use
the Intel 10 GigE (PCI-X version) without much success.  We can easily stress
the network in a way that makes the adapter die in less than an hour.
Here's the kernel trace from when the adapter dies, using kernel 2.6.27.8:

Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950053] WARNING: at net/sched/sch_generic.c:219 dev_watchdog+0x1f8/0x210()
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950055] NETDEV WATCHDOG: eth2 (ixgb): transmit timed out
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950057] Modules linked in: nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs ppdev video output pci_slot ac sbs sbshc battery ipv6 xfs sbp2 lp cfi_cmdset_0002 cfi_util jedec_probe cfi_probe i2c_nforce2 gen_probe i2c_core psmouse ixgb parport_pc pcspkr k8temp parport ck804xrom shpchp pci_hotplug mtd chipreg map_funcs container button evdev ext3 jbd mbcache sg ide_cd_mod sr_mod cdrom sd_mod sata_nv pata_amd pata_acpi floppy arcmsr ata_generic tg3 libphy libata amd74xx scsi_mod ide_core ohci1394 dock ieee1394 ehci_hcd ohci_hcd usbcore dm_mirror dm_log dm_snapshot dm_mod thermal processor fan fuse
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950106] Pid: 0, comm: swapper Not tainted 2.6.27.8 #2
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950108] 
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950108] Call Trace:
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950111]  <IRQ>  [<ffffffff8023beec>] warn_slowpath+0xbc/0xf0
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950120]  [<ffffffff8022d397>] ? source_load+0x37/0x70
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950124]  [<ffffffff80367a69>] ? __next_cpu+0x19/0x30
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950128]  [<ffffffff80230e1a>] ? find_busiest_group+0x19a/0x850
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950131]  [<ffffffff8036d4df>] ? strlcpy+0x4f/0x70
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950135]  [<ffffffff80427670>] ? dev_watchdog+0x0/0x210
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950138]  [<ffffffff80427868>] dev_watchdog+0x1f8/0x210
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950142]  [<ffffffff802467a8>] run_timer_softirq+0x1a8/0x220
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950146]  [<ffffffff8021db39>] ? lapic_next_event+0x9/0x20
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950149]  [<ffffffff80241ce4>] __do_softirq+0x84/0x110
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950153]  [<ffffffff8020d76c>] call_softirq+0x1c/0x30
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950156]  [<ffffffff8020f2ba>] do_softirq+0x6a/0xb0
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950158]  [<ffffffff80241c48>] irq_exit+0x98/0xb0
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950161]  [<ffffffff8021dfa7>] smp_apic_timer_interrupt+0x87/0xc0
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950164]  [<ffffffff8020d1bb>] apic_timer_interrupt+0x6b/0x70
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950166]  <EOI>  [<ffffffff80213a45>] ? default_idle+0x35/0x50
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950173]  [<ffffffff80213a43>] ? default_idle+0x33/0x50
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950175]  [<ffffffff8020aeaf>] ? cpu_idle+0x5f/0xe0
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950180]  [<ffffffff8049a885>] ? start_secondary+0x145/0x1a0
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950182] 
Dec  8 11:48:54 shc-storage02 kernel: [ 3106.950183] ---[ end trace 6be5d7707d0de713 ]---

Once the adapter stops to respond, the only way to bring it back,
is to reboot.  Unloading and restarting the kernel module and/or network
does not help.

One of the most reliable ways to bring it down, is to run a bonnie++ test from
6 clients to the server, over NFS.  Each client has gigabit network.

This is not a new problem with the latest kernel.  We have also tested with
2.6.25.6, which fails the same way.

Any insight and suggestions to the problem is much appreciated.

Regards,
Jan Lindheim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ