Message-Id: <1156885172.1399.955.camel@farstar>
Date:	Tue, 29 Aug 2006 13:59:32 -0700
From:	Alex Izvorski <aizvorski@...il.com>
To:	linux-kernel@...r.kernel.org, linux-ide@...r.kernel.org
Cc:	nsankar@...adcom.com, jgarzik@...ox.com, support@...ermicro.com
Subject: PROBLEM: HT1000 drops network packets during disk writes

Any ideas?  I would be happy to help test any fixes and/or provide a
test system, if anyone wants to tackle this.

Regards,
-- Alex Izvorski


[1.] One line summary of the problem:
HT1000 drops network packets during disk writes

[2.] Full description of the problem:
The onboard gigabit controller (tg3) drops a large number of network
packets whenever _any_ disk writes go through the onboard SATA
controller (sata_svw) at the same time, on an HT1000-based system.  Even
minimal disk writes (a few kb/s) during moderate network I/O (100 Mbit/s
or less) cause 0.2-1% packet loss.  The drops appear to be triggered by
disk writes only, and to affect network reads only.

[3.] Keywords:
HT1000 HT-1000 sata_svw BC5785 sata tg3 Serverworks Broadcom

[4.] Kernel version:
All tested kernel versions (2.6.9 to 2.6.18-rc4) are affected.
Tested: 
2.6.18-rc4
2.6.17
RHEL4 2.6.9-34.ELsmp rpm

[5.] Most recent kernel version which did not have the bug:
all tested versions have this problem

[7.] A small shell script or example program which triggers the
     problem

on test box, run:
dd if=/dev/zero of=testfile bs=1M count=1000 &
iperf -s -u

on another box connected via gigabit ethernet, run:
iperf -c $ip_address_of_test_box -u -b 300m -l 1316
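
The description says that even minimal disk writes are enough to trigger
drops, but the report does not say how those low-rate writes were
generated.  The following is only a sketch of one way to do it, assuming
a POSIX shell with dd, sync and sleep available.  The helper name
slow_write, its arguments, and the interpretation of the rate as KiB per
second are all assumptions made for illustration.

```shell
# slow_write FILE KB_PER_SEC SECONDS
# Append KB_PER_SEC KiB of zeros to FILE once per second for SECONDS
# seconds, flushing after each chunk.
# Hypothetical helper for illustration; it is not taken from the report.
slow_write() {
    file=$1 rate=$2 secs=$3
    i=0
    while [ "$i" -lt "$secs" ]; do
        dd if=/dev/zero bs=1024 count="$rate" 2>/dev/null >> "$file"
        sync    # push the chunk out to disk instead of leaving it cached
        sleep 1
        i=$((i + 1))
    done
}

# Example use on the test box (commented out so that sourcing this
# snippet writes nothing by itself):
# slow_write testfile 32 60 &
# iperf -s -u
```

The value 32 in the commented example is an arbitrary pick to stand in for
a low write rate; it is not a setting from the original test.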

Typical results:
Interval      Transfer   Bandwidth       Jitter   Lost/Total Datagrams
(with disk access)
0.0-10.0 sec  358 MBytes  300 Mbits/sec  0.002 ms  820/285716 (0.29%)
(without disk access)
0.0-10.0 sec  359 MBytes  301 Mbits/sec  0.003 ms    1/285716
(Note: the 1 "lost" datagram is really zero lost; iperf always reports
that.)
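
The percentage in the last column is lost datagrams divided by total
datagrams.  As a small sketch of that arithmetic, assuming awk is
available (loss_pct is a hypothetical helper name, not part of iperf):

```shell
# loss_pct LOST TOTAL
# Print datagram loss as a percentage with two decimals.
# Hypothetical helper for illustration; not part of iperf.
loss_pct() {
    awk -v lost="$1" -v total="$2" 'BEGIN {
        if (total <= 0) { print "n/a"; exit }
        printf "%.2f\n", 100 * lost / total
    }'
}

loss_pct 820 285716   # run with disk access
loss_pct 1 285716     # run without disk access
```

Feeding it the Lost/Total pair from each run makes it easy to compare
runs with different datagram counts.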


[8.] Environment
cpu: AMD Athlon(tm) 64 X2 Dual Core Processor 4600+
motherboard: Supermicro H8SSL-i
ram: 512MB ECC
disk: HDS728080PLA380
operating system: CentOS 4.3

[8.1.] Software (add the output of the ver_linux script here)

Linux localhost.localdomain 2.6.9-34.ELsmp #1 SMP Thu Mar 9 06:23:23 GMT
2006 x86_64 x86_64 x86_64 GNU/Linux

Gnu C                  3.4.5
Gnu make               3.80
binutils               2.15.92.0.2
util-linux             2.12a
mount                  2.12a
module-init-tools      3.1-pre5
e2fsprogs              1.35
reiserfsprogs          line
reiser4progs           line
pcmcia-cs              3.2.7
quota-tools            3.12.
PPP                    2.4.2
isdn4k-utils           3.3
nfs-utils              1.0.6
Linux C Library        2.3.4
Dynamic linker (ldd)   2.3.4
Procps                 3.2.3
Net-tools              1.60
Kbd                    1.12
Sh-utils               5.2.1
Modules Loaded         md5 ipv6 sunrpc tg3 ext3 jbd sata_svw libata
sd_mod scsi_mod


[8.2.] Processor information (from /proc/cpuinfo):

AMD Athlon(tm) 64 X2 Dual Core Processor 4600+

[8.3.] Module information (from /proc/modules):

md5 5697 1 - Live 0xffffffffa0181000
ipv6 282785 12 - Live 0xffffffffa0138000
sunrpc 174905 1 - Live 0xffffffffa00e9000
tg3 103621 0 - Live 0xffffffffa0077000
ext3 137809 1 - Live 0xffffffffa0054000
jbd 68977 1 ext3, Live 0xffffffffa0042000
sata_svw 9925 2 - Live 0xffffffffa003e000
libata 66057 1 sata_svw, Live 0xffffffffa002c000
sd_mod 19393 3 - Live 0xffffffffa0026000
scsi_mod 140561 2 libata,sd_mod, Live 0xffffffffa0002000

[8.4.] Loaded driver and hardware information
(/proc/ioports, /proc/iomem)

0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-006f : keyboard
0070-0077 : rtc
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
03c0-03df : vga+
0500-0503 : PM1a_EVT_BLK
0504-0505 : PM1a_CNT_BLK
0508-050b : PM_TMR
0514-051b : GPE0_BLK
0540-0543 : PM1b_EVT_BLK
0544-0545 : PM1b_CNT_BLK
0550-0557 : GPE1_BLK
0cf8-0cff : PCI conf1
5010-5015 : ACPI CPU throttle
a000-bfff : PCI Bus #01
  a800-a81f : 0000:01:0e.0
    a800-a81f : sata_svw
  a880-a883 : 0000:01:0e.0
    a880-a883 : sata_svw
  ac00-ac07 : 0000:01:0e.0
    ac00-ac07 : sata_svw
  b000-b003 : 0000:01:0e.0
    b000-b003 : sata_svw
  b080-b087 : 0000:01:0e.0
    b080-b087 : sata_svw
d400-d4ff : 0000:00:03.0
d800-d8ff : 0000:00:03.1
e000-e0ff : 0000:00:05.0
e800-e8ff : 0000:00:03.2
ffa0-ffaf : 0000:00:02.1
  ffa0-ffa7 : ide0

00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000ca000-000cb7ff : Adapter ROM
000cb800-000ccfff : Adapter ROM
000cd000-000cdfff : Adapter ROM
000f0000-000fffff : System ROM
00100000-1ffeffff : System RAM
  00100000-003090cf : Kernel code
  003090d0-00449123 : Kernel data
1fff0000-1fffdfff : ACPI Tables
1fffe000-1fffffff : ACPI Non-volatile Storage
fc900000-fcafffff : PCI Bus #01
  fc900000-fc9fffff : PCI Bus #02
    fc9c0000-fc9cffff : 0000:02:03.0
      fc9c0000-fc9cffff : tg3
    fc9d0000-fc9dffff : 0000:02:03.1
      fc9d0000-fc9dffff : tg3
  fcafe000-fcafffff : 0000:01:0e.0
    fcafe000-fcafffff : sata_svw
fd000000-fdffffff : 0000:00:05.0
febfb000-febfbfff : 0000:00:03.0
febfc000-febfcfff : 0000:00:03.1
febfd000-febfdfff : 0000:00:03.2
febff000-febfffff : 0000:00:05.0
fec00000-fec02fff : reserved
fee00000-fee00fff : reserved
ff500000-ff5fffff : PCI Bus #01
  ff500000-ff5fffff : PCI Bus #02
ff780000-ffffffff : reserved

[8.5.] PCI information ('lspci -vvv' as root)
(lspci -vvv output attached)

00:01.0 PCI bridge: Broadcom HT1000 PCI/PCI-X bridge
00:02.0 Host bridge: Broadcom HT1000 Legacy South Bridge
00:02.1 IDE interface: Broadcom HT1000 Legacy IDE controller
00:02.2 ISA bridge: Broadcom HT1000 LPC Bridge
00:03.0 USB Controller: Broadcom HT1000 USB Controller (rev 01)
00:03.1 USB Controller: Broadcom HT1000 USB Controller (rev 01)
00:03.2 USB Controller: Broadcom HT1000 USB Controller (rev 01)
00:05.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:0d.0 PCI bridge: Broadcom HT1000 PCI/PCI-X bridge (rev b2)
01:0e.0 RAID bus controller: Broadcom BCM5785 (HT1000) SATA Native SATA Mode
02:03.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)
02:03.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5704 Gigabit Ethernet (rev 10)

[8.7.] Other information that might be relevant to the problem

What the problem is probably not: 

- it does not require high I/O load: transferring 80-100Mbit/s over the
network while writing only 30-40kb/s to disk causes some dropped
packets.

- it is not caused by disk reading, only by writing (including swap).

- it is not caused by net writing, only by reading (*maybe* - need to
test this more).

- it is not a bug in iperf: exact same results in tcpdump and other
programs which do network I/O.

- it is not due to CPU load: cpu is mostly idle while doing the test;
also loading the cpu 100% with several copies of burnK7 does not cause
any dropped packets by itself, and adjusting the priority of iperf +/-15
has no effect.

- it is not due to anything filesystem-related: writing to a disk device
directly has the same effect.

- it is not affected by the mode the SATA controller is in: BIOS
choices are "MMIO" (native SATA) and "IDE" (legacy), results are
identical in either.

- it is not affected by any kernel settings tested: MSI on/off,
preemptible/non-preemptible/preempt big kernel lock, SMP/non-SMP,
flat/sparse/discontig memory, ...

- it is not affected by any ethernet settings tested with ethtool:
checksum offloading on/off, rx/tx queue size, coalescing... these cause
some changes to the number of dropped packets, but there's always a
considerable amount of drop.
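
To cross-check application-level loss against the kernel's own
per-interface receive counters, /proc/net/dev can be sampled before and
after a test run.  This is only a sketch: the helper name rx_counters
and the interface name eth0 are assumptions (the report does not name
the interface), and the report does not establish whether these
particular drops show up in these counters at all.

```shell
# rx_counters IFACE [FILE]
# Print "rx_packets rx_errs rx_drop rx_fifo" for IFACE, read from FILE
# (default /proc/net/dev).
# Hypothetical helper for illustration; not taken from the report.
rx_counters() {
    awk -v ifc="$1" '
        NR > 2 {                 # skip the two header lines
            sub(/:/, " ")        # some kernels print "eth0:1234", no space
            if ($1 == ifc) print $3, $4, $5, $6
        }' "${2:-/proc/net/dev}"
}

# Example use around a test run (eth0 is an assumed interface name):
# before=$(rx_counters eth0)
# ... run the iperf test here ...
# after=$(rx_counters eth0)
# echo "before: $before"; echo "after:  $after"
```

If the counters do not move while iperf reports loss, the packets are
being lost somewhere these counters do not cover.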

====


View attachment "ht1000-dmesg.txt" of type "text/plain" (16451 bytes)

View attachment "ht1000-config-2.6.9-34.ELsmp" of type "text/plain" (42695 bytes)

View attachment "ht1000-lspci-vvv.txt" of type "text/plain" (12304 bytes)
