Message-ID: <AANLkTi=HO3TMYKOzzoHLdiyPgOFRZzjDB0WnYnJjeR65@mail.gmail.com>
Date: Wed, 22 Dec 2010 10:59:24 -0500
From: Greg Freemyer <greg.freemyer@...il.com>
To: Rogier Wolff <R.E.Wolff@...wizard.nl>
Cc: Bruno Prémont <bonbons@...ux-vserver.org>,
linux-kernel@...r.kernel.org, linux-ide@...r.kernel.org
Subject: Re: Slow disks.
On Wed, Dec 22, 2010 at 5:43 AM, Rogier Wolff <R.E.Wolff@...wizard.nl> wrote:
>
> Unquoted text below is from either me or from my friend.
>
>
> Someone suggested we try an older kernel as if kernel 2.6.32 would not
> have this problem. We do NOT think it suddenly started with a certain
> kernel version. I was just hoping to have you kernel-guys help with
> prodding the kernel into revealing which component was screwing things
> up....
>
>
> On Mon, Dec 20, 2010 at 01:32:44PM -0500, Greg Freemyer wrote:
>> On Mon, Dec 20, 2010 at 1:06 PM, Bruno Prémont
>> <bonbons@...ux-vserver.org> wrote:
>> > Hi,
>> >
>> > [ccing linux-ide]
>> >
>> > Please provide the part of kernel log showing initialization of your
>> > disk controller(s) as well as detection of all the discs.
>
>
> sata_sil 0000:03:01.0: version 2.4
> sata_sil 0000:03:01.0: PCI INT A -> GSI 24 (level, low) -> IRQ 24
> sata_sil 0000:03:01.0: Applying R_ERR on DMA activate FIS errata fix
> scsi2 : sata_sil
> scsi3 : sata_sil
> scsi4 : sata_sil
> scsi5 : sata_sil
> ata3: SATA max UDMA/100 mmio m1024@...d200000 tf 0xed200080 irq 24
> ata4: SATA max UDMA/100 mmio m1024@...d200000 tf 0xed2000c0 irq 24
> ata5: SATA max UDMA/100 mmio m1024@...d200000 tf 0xed200280 irq 24
> ata6: SATA max UDMA/100 mmio m1024@...d200000 tf 0xed2002c0 irq 24
> ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ata3.00: ATA-8: WDC WD10EARS-00Y5B1, 80.00A80, max UDMA/133
> ata3.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 0/32)
> ata3.00: configured for UDMA/100
> scsi 2:0:0:0: Direct-Access ATA WDC WD10EARS-00Y 80.0 PQ: 0 ANSI: 5
> usb 2-2: new low speed USB device using uhci_hcd and address 2
> ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ata4.00: ATA-7: SAMSUNG HD103SI, 1AG01118, max UDMA7
> ata4.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 0/32)
> ata4.00: configured for UDMA/100
> scsi 3:0:0:0: Direct-Access ATA SAMSUNG HD103SI 1AG0 PQ: 0 ANSI: 5
> ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ata5.00: ATA-8: WDC WD10EARS-00Y5B1, 80.00A80, max UDMA/133
> ata5.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 0/32)
> ata5.00: configured for UDMA/100
> scsi 4:0:0:0: Direct-Access ATA WDC WD10EARS-00Y 80.0 PQ: 0 ANSI: 5
> ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ata6.00: ATA-8: WDC WD10EARS-00Y5B1, 80.00A80, max UDMA/133
> ata6.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 0/32)
> ata6.00: configured for UDMA/100
> scsi 5:0:0:0: Direct-Access ATA WDC WD10EARS-00Y 80.0 PQ: 0 ANSI: 5
> sd 2:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
> sd 2:0:0:0: [sda] Write Protect is off
> sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
> support DPO or FUA
> sd 3:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
> sd 3:0:0:0: [sdb] Write Protect is off
> sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
> sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
> support DPO or FUA
> sd 4:0:0:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
> sd 4:0:0:0: [sdc] Write Protect is off
> sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't
> support DPO or FUA
> sd 5:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
> sd 5:0:0:0: [sdd] Write Protect is off
> sd 5:0:0:0: [sdd] Mode Sense: 00 3a 00 00
> sd 5:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't
> support DPO or FUA
> sdb: sdb1 sdb2 sdb3 sdb4
> sd 3:0:0:0: [sdb] Attached SCSI disk
> sda: sda1 sda2 sda3 sda4
> sd 2:0:0:0: [sda] Attached SCSI disk
> sdc: sdc1 sdc2 sdc3 sdc4
> sd 4:0:0:0: [sdc] Attached SCSI disk
> sdd: sdd1 sdd2 sdd3 sdd4
> sd 5:0:0:0: [sdd] Attached SCSI disk
>
>
>
>> > Verbose lspci output for the disc controller and $(smartctl -i -A $disk)
>> > output might be useful as well.
>
>
> 03:01.0 Mass storage controller: Silicon Image, Inc. SiI 3114
> [SATALink/SATARaid] Serial ATA Controller (rev 02)
> Subsystem: Silicon Image, Inc. SiI 3114 SATALink Controller
> Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR+ FastB2B- DisINTx-
> Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 32, Cache Line Size: 32 bytes
> Interrupt: pin A routed to IRQ 24
> Region 0: I/O ports at 4020 [size=8]
> Region 1: I/O ports at 4014 [size=4]
> Region 2: I/O ports at 4018 [size=8]
> Region 3: I/O ports at 4010 [size=4]
> Region 4: I/O ports at 4000 [size=16]
> Region 5: Memory at ed200000 (32-bit, non-prefetchable) [size=1K]
> [virtual] Expansion ROM at e8000000 [disabled] [size=512K]
> Capabilities: [60] Power Management version 2
> Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA
> PME(D0-,D1-,D2-,D3hot-,D3cold-)
> Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=2 PME-
> Kernel driver in use: sata_sil
> Kernel modules: sata_sil
>
>
> We also tried the onboard controller:
>
> 00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE
> Controller (rev 01) (prog-if 8a [Master SecP PriP])
> Subsystem: Super Micro Computer Inc Device 7980
> Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx-
> Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
> Latency: 0
> Interrupt: pin A routed to IRQ 18
> Region 0: I/O ports at 01f0 [size=8]
> Region 1: I/O ports at 03f4 [size=1]
> Region 2: I/O ports at 0170 [size=8]
> Region 3: I/O ports at 0374 [size=1]
> Region 4: I/O ports at 30a0 [size=16]
> Kernel driver in use: ata_piix
> Kernel modules: ata_generic, pata_acpi, ata_piix, ide-pci-generic,
> piix
>
> smartctl output:
>
> smartctl 5.40 2010-10-16 r3189 [x86_64-unknown-linux-gnu] (local build)
> Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF INFORMATION SECTION ===
> Model Family: Western Digital Caviar Green (Adv. Format) family
> Device Model: WDC WD10EARS-00Y5B1
> Serial Number: WD-WCAV55759454
> Firmware Version: 80.00A80
> User Capacity: 1,000,204,886,016 bytes
> Device is: In smartctl database [for details use: -P show]
> ATA Version is: 8
> ATA Standard is: Exact ATA specification draft version not indicated
> Local Time is: Tue Dec 21 20:06:00 2010 CET
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE
> UPDATED WHEN_FAILED RAW_VALUE
> 1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail
> Always - 0
> 3 Spin_Up_Time 0x0027 132 119 021 Pre-fail
> Always - 6391
> 4 Start_Stop_Count 0x0032 100 100 000 Old_age
> Always - 56
> 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail
> Always - 0
> 7 Seek_Error_Rate 0x002e 200 200 000 Old_age
> Always - 0
> 9 Power_On_Hours 0x0032 091 091 000 Old_age
> Always - 7189
> 10 Spin_Retry_Count 0x0032 100 253 000 Old_age
> Always - 0
> 11 Calibration_Retry_Count 0x0032 100 253 000 Old_age
> Always - 0
> 12 Power_Cycle_Count 0x0032 100 100 000 Old_age
> Always - 54
> 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always
> - 39
> 193 Load_Cycle_Count 0x0032 164 164 000 Old_age Always
> - 109955
> 194 Temperature_Celsius 0x0022 109 107 000 Old_age Always
> - 38
> 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always
> - 0
> 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always
> - 0
> 198 Offline_Uncorrectable 0x0030 200 200 000 Old_age
> Offline - 0
> 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always
> - 0
> 200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age
> Offline - 0
>
>
>
> The others are very similar....
>
>
>> >
>> > Did you try the individual discs on a completely different system (e.g.
>> > plain desktop system) and what revision of SATA are both components
>> > supporting?
>
> Yes I did. The disks were installed in a MSI/Core2DUO based desktop
> system. No problems at all. Transfer rates up to 200MB/s.
>
>
> The SIL 3114 chip is 1.5Gbps SATA.
>
>
> Searching for information on the WD drives I stumbled across:
>
> http://community.wdc.com/t5/Other-Internal-Drives/1-TB-WD10EARS-desynch-issues-in-RAID/m-p/11559
>
> Where it seems that WD simply says not to use these drives in a RAID.
> I have experience with "Raid Edition" drives: They go bad at a MUCH
> too high rate. If we can't use the non-raid for a RAID application, then
> there is just ONE possible option: STAY AWAY FROM WESTERN DIGITAL:
>
> Western digital claims it has the right to mess things up if you put a
> non-raid drive in a raid configuration. Well fine. Then they can also
> mess things up in normal situations because when Linux does software
> raid there isn't any difference from RAID accesses.
>
> (if you click through and read their entry in the knowledge base,
> you'd notice that it should be more or less the other way
> around. Linux will drop the RAID-enabled drive from the RAID within
> seven seconds and report an error on a sector, whereas the desktop
> drive would remain operational until Linux times out (30 seconds?))
>
>
>
> More hardware info:
>
> System: Supermicro PDSMi, 4xDDR2 1GB, disks and controllers as above.
> Current kernel version: 2.6.36.2
> Problem was also present in kernel 2.6.33 (sorry cannot downgrade again.
> This is a production system...)
>
> uname -a:
> Linux jcz.nl 2.6.36-ARCH #1 SMP PREEMPT Fri Dec 10 20:32:37 CET 2010
> x86_64 Intel(R) Pentium(R) D CPU 3.20GHz GenuineIntel GNU/Linux
>
> Disklayout:
>
> major minor #blocks name
>
> 8 0 976762584 sda
> 8 1 240943 sda1
> 8 2 19535040 sda2
> 8 3 1951897 sda3
> 8 4 955032120 sda4
> 8 16 976762584 sdb
> 8 17 240943 sdb1
> 8 18 19535040 sdb2
> 8 19 1951897 sdb3
> 8 20 955032120 sdb4
> 8 32 976762584 sdc
> 8 33 240943 sdc1
> 8 34 19535040 sdc2
> 8 35 1951897 sdc3
> 8 36 955032120 sdc4
> 8 48 976762584 sdd
> 8 49 240943 sdd1
> 8 50 19535040 sdd2
> 8 51 1951897 sdd3
> 8 52 955032120 sdd4
> 9 127 240832 md127
> 9 1 39067648 md1
> 9 126 1910063104 md126
> 9 125 3903488 md125
>
> MDstat:
>
> Personalities : [raid1] [raid6] [raid5] [raid4]
> md125 : active raid5 sdd3[5](S) sdb3[4] sda3[0] sdc3[3]
> 3903488 blocks super 1.1 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
>
> md126 : active raid5 sda4[0] sdd4[3] sdc4[5](S) sdb4[4]
> 1910063104 blocks super 1.1 level 5, 512k chunk, algorithm 2
> [3/3] [UUU]
>
> md1 : active raid5 sda2[0] sdd2[3](S) sdb2[1] sdc2[4]
> 39067648 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3]
> [UUU]
>
> md127 : active raid1 sdd1[3](S) sda1[0] sdb1[1] sdc1[2]
> 240832 blocks [3/3] [UUU]
>
> unused devices: <none>
> rootfs / rootfs rw 0 0
> proc /proc proc rw,relatime 0 0
> sys /sys sysfs rw,relatime 0 0
> udev /dev devtmpfs
> rw,nosuid,relatime,size=10240k,nr_inodes=506317,mode=755 0 0
> /dev/disk/by-label/rootfs / ext4
> rw,relatime,barrier=1,stripe=256,data=ordered 0 0
> devpts /dev/pts devpts rw,relatime,mode=600,ptmxmode=000 0 0
> shm /dev/shm tmpfs rw,nosuid,nodev,relatime 0 0
> /dev/md127 /boot ext3
> rw,relatime,errors=continue,barrier=0,data=writeback 0 0
> /dev/md126 /data ext4 rw,relatime,barrier=1,data=ordered 0 0
>
>
> Because of the severity of the problems (which remain after trying
> another sata card), I have already bought a new Supermicro server. Let's
> hope that helps.
The Load_Cycle_Count values are very high, which means your drive heads
are parking all the time, possibly multiple times a minute.
I don't know if it's your problem, but I'd say something is wrong, and
I've seen excessive head parking cause disk write failures in Windows.
In Linux I think it just wears out your drive prematurely. And
of course any I/Os are delayed if the heads are parked when the
commands hit the drive.
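
As a rough cross-check, the two raw values in the quoted smartctl output
(Load_Cycle_Count 109955 and Power_On_Hours 7189) give a lifetime average
parking rate. The sketch below is illustrative only: the function name is
made up for this example, and a lifetime average hides bursts, so the rate
while the heads are actually cycling could be much higher.

```python
def average_load_cycles_per_hour(load_cycle_count: int, power_on_hours: int) -> float:
    """Lifetime average of head load/unload cycles per power-on hour."""
    if power_on_hours <= 0:
        raise ValueError("power_on_hours must be positive")
    return load_cycle_count / power_on_hours


# Raw values taken from the quoted SMART attribute table for the WD10EARS.
rate = average_load_cycles_per_hour(109955, 7189)
print(f"average: {rate:.1f} load cycles per power-on hour")
print(f"roughly one cycle every {3600 / rate:.0f} seconds on average")
```

The average only says how fast the counter has grown over the drive's life;
it cannot show whether the parking comes in bursts.
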
There is a Linux package specifically targeting drives that have this
issue: storage-fixup. Hopefully it can at least keep your heads from
parking continuously.
1) Be sure you have the userspace package storage-fixup installed.
2) Look in /etc/storage-fixup.conf and see if your drives are in the list.
If not, try to work with the storage-fixup maintainer (Tejun Heo?) to
get your drives added.
And while testing, watch Load_Cycle_Count and ensure it is not
increasing too fast, i.e. several times an hour is fine; several
times per minute is too much.
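
One way to watch the counter is to sample `smartctl -A` periodically and
look at the difference between readings. This is a minimal sketch, not a
tested tool: the helper names and the 10-minute interval are assumptions
for illustration, it expects smartctl's normal one-row-per-attribute
output (not the mail-wrapped form quoted above), and it usually needs
root to query the drive.

```python
import subprocess
import time


def parse_load_cycle_count(smartctl_output: str) -> int:
    """Return the raw Load_Cycle_Count value from `smartctl -A` text."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Row layout: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) >= 10 and fields[1] == "Load_Cycle_Count":
            return int(fields[9])
    raise ValueError("Load_Cycle_Count not found in smartctl output")


def cycles_per_hour(first: int, second: int, seconds_apart: float) -> float:
    """Convert two counter readings into a per-hour rate."""
    return (second - first) * 3600.0 / seconds_apart


def watch(device: str, interval_s: float = 600.0) -> None:
    """Print the parking rate for `device` once per interval, forever."""
    previous = None
    while True:
        # smartctl can exit non-zero while still printing attributes,
        # so the exit status is deliberately not checked here.
        result = subprocess.run(
            ["smartctl", "-A", device], capture_output=True, text=True
        )
        current = parse_load_cycle_count(result.stdout)
        if previous is not None:
            rate = cycles_per_hour(previous, current, interval_s)
            print(f"{device}: {rate:.1f} load cycles per hour")
        previous = current
        time.sleep(interval_s)
```

Calling `watch("/dev/sda")` would then print a rate every ten minutes that
can be compared against the "several times an hour" versus "several times
per minute" rule of thumb.
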
Greg
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/