Date:	Wed, 22 Dec 2010 10:59:24 -0500
From:	Greg Freemyer <greg.freemyer@...il.com>
To:	Rogier Wolff <R.E.Wolff@...wizard.nl>
Cc:	Bruno Prémont <bonbons@...ux-vserver.org>,
	linux-kernel@...r.kernel.org, linux-ide@...r.kernel.org
Subject: Re: Slow disks.

On Wed, Dec 22, 2010 at 5:43 AM, Rogier Wolff <R.E.Wolff@...wizard.nl> wrote:
>
> Unquoted text below is from either me or from my friend.
>
>
> Someone suggested we try an older kernel as if kernel 2.6.32 would not
> have this problem. We do NOT think it suddenly started with a certain
> kernel version. I was just hoping to have you kernel-guys help with
> prodding the kernel into revealing which component was screwing things
> up....
>
>
> On Mon, Dec 20, 2010 at 01:32:44PM -0500, Greg Freemyer wrote:
>> On Mon, Dec 20, 2010 at 1:06 PM, Bruno Prémont
>> <bonbons@...ux-vserver.org> wrote:
>> > Hi,
>> >
>> > [ccing linux-ide]
>> >
>> > Please provide the part of kernel log showing initialization of your
>> > disk controller(s) as well as detection of all the discs.
>
>
> sata_sil 0000:03:01.0: version 2.4
> sata_sil 0000:03:01.0: PCI INT A -> GSI 24 (level, low) -> IRQ 24
> sata_sil 0000:03:01.0: Applying R_ERR on DMA activate FIS errata fix
> scsi2 : sata_sil
> scsi3 : sata_sil
> scsi4 : sata_sil
> scsi5 : sata_sil
> ata3: SATA max UDMA/100 mmio m1024@...d200000 tf 0xed200080 irq 24
> ata4: SATA max UDMA/100 mmio m1024@...d200000 tf 0xed2000c0 irq 24
> ata5: SATA max UDMA/100 mmio m1024@...d200000 tf 0xed200280 irq 24
> ata6: SATA max UDMA/100 mmio m1024@...d200000 tf 0xed2002c0 irq 24
> ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ata3.00: ATA-8: WDC WD10EARS-00Y5B1, 80.00A80, max UDMA/133
> ata3.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 0/32)
> ata3.00: configured for UDMA/100
> scsi 2:0:0:0: Direct-Access     ATA      WDC WD10EARS-00Y 80.0 PQ: 0 ANSI: 5
> usb 2-2: new low speed USB device using uhci_hcd and address 2
> ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ata4.00: ATA-7: SAMSUNG HD103SI, 1AG01118, max UDMA7
> ata4.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 0/32)
> ata4.00: configured for UDMA/100
> scsi 3:0:0:0: Direct-Access     ATA      SAMSUNG HD103SI  1AG0 PQ: 0 ANSI: 5
> ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ata5.00: ATA-8: WDC WD10EARS-00Y5B1, 80.00A80, max UDMA/133
> ata5.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 0/32)
> ata5.00: configured for UDMA/100
> scsi 4:0:0:0: Direct-Access     ATA      WDC WD10EARS-00Y 80.0 PQ: 0 ANSI: 5
> ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ata6.00: ATA-8: WDC WD10EARS-00Y5B1, 80.00A80, max UDMA/133
> ata6.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 0/32)
> ata6.00: configured for UDMA/100
> scsi 5:0:0:0: Direct-Access     ATA      WDC WD10EARS-00Y 80.0 PQ: 0 ANSI: 5
> sd 2:0:0:0: [sda] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
> sd 2:0:0:0: [sda] Write Protect is off
> sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
> sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't
> support DPO or FUA
> sd 3:0:0:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
> sd 3:0:0:0: [sdb] Write Protect is off
> sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
> sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
> support DPO or FUA
> sd 4:0:0:0: [sdc] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
> sd 4:0:0:0: [sdc] Write Protect is off
> sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
> sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't
> support DPO or FUA
> sd 5:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
> sd 5:0:0:0: [sdd] Write Protect is off
> sd 5:0:0:0: [sdd] Mode Sense: 00 3a 00 00
> sd 5:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't
> support DPO or FUA
>  sdb: sdb1 sdb2 sdb3 sdb4
> sd 3:0:0:0: [sdb] Attached SCSI disk
>  sda: sda1 sda2 sda3 sda4
> sd 2:0:0:0: [sda] Attached SCSI disk
>  sdc: sdc1 sdc2 sdc3 sdc4
> sd 4:0:0:0: [sdc] Attached SCSI disk
>  sdd: sdd1 sdd2 sdd3 sdd4
> sd 5:0:0:0: [sdd] Attached SCSI disk
>
>
>
>> > Verbose lspci output for the disc controller and $(smartctl -i -A $disk)
>> > output might be useful as well.
>
>
> 03:01.0 Mass storage controller: Silicon Image, Inc. SiI 3114
> [SATALink/SATARaid] Serial ATA Controller (rev 02)
>        Subsystem: Silicon Image, Inc. SiI 3114 SATALink Controller
>        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR+ FastB2B- DisINTx-
>        Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
>        Latency: 32, Cache Line Size: 32 bytes
>        Interrupt: pin A routed to IRQ 24
>        Region 0: I/O ports at 4020 [size=8]
>        Region 1: I/O ports at 4014 [size=4]
>        Region 2: I/O ports at 4018 [size=8]
>        Region 3: I/O ports at 4010 [size=4]
>        Region 4: I/O ports at 4000 [size=16]
>        Region 5: Memory at ed200000 (32-bit, non-prefetchable) [size=1K]
>        [virtual] Expansion ROM at e8000000 [disabled] [size=512K]
>        Capabilities: [60] Power Management version 2
>                Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA
>                PME(D0-,D1-,D2-,D3hot-,D3cold-)
>                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=2 PME-
>        Kernel driver in use: sata_sil
>        Kernel modules: sata_sil
>
>
> But also tried onboard card:
>
> 00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE
> Controller (rev 01) (prog-if 8a [Master SecP PriP])
>        Subsystem: Super Micro Computer Inc Device 7980
>        Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
> Stepping- SERR- FastB2B- DisINTx-
>        Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
> <TAbort- <MAbort- >SERR- <PERR- INTx-
>        Latency: 0
>        Interrupt: pin A routed to IRQ 18
>        Region 0: I/O ports at 01f0 [size=8]
>        Region 1: I/O ports at 03f4 [size=1]
>        Region 2: I/O ports at 0170 [size=8]
>        Region 3: I/O ports at 0374 [size=1]
>        Region 4: I/O ports at 30a0 [size=16]
>        Kernel driver in use: ata_piix
>        Kernel modules: ata_generic, pata_acpi, ata_piix, ide-pci-generic,
>        piix
>
> smartctl output:
>
> smartctl 5.40 2010-10-16 r3189 [x86_64-unknown-linux-gnu] (local build)
> Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
>
> === START OF INFORMATION SECTION ===
> Model Family:     Western Digital Caviar Green (Adv. Format) family
> Device Model:     WDC WD10EARS-00Y5B1
> Serial Number:    WD-WCAV55759454
> Firmware Version: 80.00A80
> User Capacity:    1,000,204,886,016 bytes
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   8
> ATA Standard is:  Exact ATA specification draft version not indicated
> Local Time is:    Tue Dec 21 20:06:00 2010 CET
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> === START OF READ SMART DATA SECTION ===
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
>   3 Spin_Up_Time            0x0027   132   119   021    Pre-fail  Always       -       6391
>   4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       56
>   5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
>   7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
>   9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       7189
>  10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
>  11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       54
> 192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       39
> 193 Load_Cycle_Count        0x0032   164   164   000    Old_age   Always       -       109955
> 194 Temperature_Celsius     0x0022   109   107   000    Old_age   Always       -       38
> 196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
> 197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
> 198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
> 200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
>
>
> The others are very similar....
>
>
>> >
>> > Did you try the individual discs on a completely different system (e.g.
>> > plain desktop system) and what revision of SATA are both components
>> > supporting?
>
> Yes I did. The disks were installed in an MSI/Core2DUO based desktop
> system. No problems at all. Transfer rates up to 200MB/s.
>
>
> The SIL 3114 chip is 1.5Gbps SATA.
>
>
> Searching for information on the WD drives I stumbled across:
>
> http://community.wdc.com/t5/Other-Internal-Drives/1-TB-WD10EARS-desynch-issues-in-RAID/m-p/11559
>
> Where it seems that WD simply says not to use these drives in a RAID.
> I have experience with "Raid Edition" drives: They go bad at a MUCH
> too high rate. If we can't use the non-raid for a RAID application, then
> there is just ONE possible option: STAY AWAY FROM WESTERN DIGITAL:
>
> Western digital claims it has the right to mess things up if you put a
> non-raid drive in a raid configuration. Well fine. Then they can also
> mess things up in normal situations because when Linux does software
> raid there isn't any difference from RAID accesses.
>
> (if you click through and read their entry in the knowledge base,
> you'd notice that it should be more or less the other way
> around. Linux will drop the RAID-enabled drive from the RAID within
> seven seconds, once it reports an error on a sector, whereas the desktop
> drive would remain operational until Linux times out (30 seconds?))
>
>
>
> More hardware info:
>
> System: Supermicro PDSMi, 4xDDR2 1GB, disks and controllers as above.
> Current kernel version: 2.6.36.2
> Problem was also present in kernel 2.6.33 (sorry cannot downgrade again.
> This is a production system...)
>
> uname -a:
> Linux jcz.nl 2.6.36-ARCH #1 SMP PREEMPT Fri Dec 10 20:32:37 CET 2010
> x86_64 Intel(R) Pentium(R) D CPU 3.20GHz GenuineIntel GNU/Linux
>
> Disklayout:
>
> major minor  #blocks  name
>
>   8        0  976762584 sda
>   8        1     240943 sda1
>   8        2   19535040 sda2
>   8        3    1951897 sda3
>   8        4  955032120 sda4
>   8       16  976762584 sdb
>   8       17     240943 sdb1
>   8       18   19535040 sdb2
>   8       19    1951897 sdb3
>   8       20  955032120 sdb4
>   8       32  976762584 sdc
>   8       33     240943 sdc1
>   8       34   19535040 sdc2
>   8       35    1951897 sdc3
>   8       36  955032120 sdc4
>   8       48  976762584 sdd
>   8       49     240943 sdd1
>   8       50   19535040 sdd2
>   8       51    1951897 sdd3
>   8       52  955032120 sdd4
>   9      127     240832 md127
>   9        1   39067648 md1
>   9      126 1910063104 md126
>   9      125    3903488 md125
>
> MDstat:
>
> Personalities : [raid1] [raid6] [raid5] [raid4]
> md125 : active raid5 sdd3[5](S) sdb3[4] sda3[0] sdc3[3]
>      3903488 blocks super 1.1 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
>
> md126 : active raid5 sda4[0] sdd4[3] sdc4[5](S) sdb4[4]
>      1910063104 blocks super 1.1 level 5, 512k chunk, algorithm 2
> [3/3] [UUU]
>
> md1 : active raid5 sda2[0] sdd2[3](S) sdb2[1] sdc2[4]
>      39067648 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3]
> [UUU]
>
> md127 : active raid1 sdd1[3](S) sda1[0] sdb1[1] sdc1[2]
>      240832 blocks [3/3] [UUU]
>
> unused devices: <none>
> rootfs / rootfs rw 0 0
> proc /proc proc rw,relatime 0 0
> sys /sys sysfs rw,relatime 0 0
> udev /dev devtmpfs
> rw,nosuid,relatime,size=10240k,nr_inodes=506317,mode=755 0 0
> /dev/disk/by-label/rootfs / ext4
> rw,relatime,barrier=1,stripe=256,data=ordered 0 0
> devpts /dev/pts devpts rw,relatime,mode=600,ptmxmode=000 0 0
> shm /dev/shm tmpfs rw,nosuid,nodev,relatime 0 0
> /dev/md127 /boot ext3
> rw,relatime,errors=continue,barrier=0,data=writeback 0 0
> /dev/md126 /data ext4 rw,relatime,barrier=1,data=ordered 0 0
>
>
> Because of the severity of the problems (which remain after trying
> another sata card), I have already bought a new Supermicro server. Let's
> hope that helps.


The Load_Cycle_Count values are very high, which means your drive heads
are parking all the time.  Possibly multiple times a minute.
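
To put a number on that, here is a rough sketch in Python that divides
the raw Load_Cycle_Count by the raw Power_On_Hours from the smartctl
output quoted above.  The function name is made up for illustration,
and the result is only a lifetime average: it says nothing about how
the parking is spread between busy and idle periods.

```python
# Raw values taken from the quoted smartctl output for the WD10EARS drive.
LOAD_CYCLE_COUNT = 109955  # attribute 193, raw value
POWER_ON_HOURS = 7189      # attribute 9, raw value


def average_cycle_rate(load_cycles, power_on_hours):
    """Return (cycles per hour, seconds between cycles) as a lifetime average."""
    per_hour = load_cycles / power_on_hours
    return per_hour, 3600 / per_hour


per_hour, interval = average_cycle_rate(LOAD_CYCLE_COUNT, POWER_ON_HOURS)
print(f"{per_hour:.1f} load cycles per hour, one every {interval:.0f} s on average")
# prints "15.3 load cycles per hour, one every 235 s on average"
```

So averaged over the whole life of the drive that is a park roughly
every four minutes; the rate during the periods the box is actually in
use could well be higher.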

I don't know if it's your problem, but I'd say something is wrong, and
I've seen excessive head parking cause disk write failures in Windows.
In Linux I think it just wears out your drive prematurely.  And of
course any I/Os are delayed if the heads are parked when the commands
hit the drive.

There is a Linux package specifically targeting drives that have this
issue: storage-fixup.  Hopefully it can at least keep your heads from
parking continuously.

1) Be sure you have the userspace package storage-fixup installed.

2) Look in /etc/storage-fixup.conf and see if your drives are in the list.

If not, try to work with the storage-fixup maintainer (Tejun Heo?) to
get your drives added.

And while testing, watch Load_Cycle_Count and ensure it is not
increasing too fast, i.e. several times an hour is fine; several
times per minute is too much.
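
One way to watch it, sketched in Python: read Load_Cycle_Count from
`smartctl -A` twice, some minutes apart, and turn the difference into
a rate.  The helper names, the /dev/sda device and the cut-off of 60
cycles per hour are all assumptions for illustration; the cut-off is
just an arbitrary point between "several times an hour" and "several
times per minute".

```python
import subprocess
import time


def load_cycle_count(smartctl_output):
    """Return the raw Load_Cycle_Count from `smartctl -A` text, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have ten columns; the raw value is the last one.
        if len(fields) >= 10 and fields[1] == "Load_Cycle_Count":
            return int(fields[9])
    return None


def cycles_per_hour(first, second, seconds_between):
    """Convert two counter readings into a rate per hour."""
    return (second - first) * 3600 / seconds_between


def verdict(rate_per_hour, limit_per_hour=60.0):
    """Classify a rate; the default limit is an assumed cut-off, not a spec."""
    return "too fast" if rate_per_hour > limit_per_hour else "ok"


def watch(device="/dev/sda", interval=600):
    """Sample the drive twice, `interval` seconds apart (needs root)."""
    def sample():
        # smartctl's exit status is a bit mask, so do not treat non-zero as fatal.
        result = subprocess.run(["smartctl", "-A", device],
                                capture_output=True, text=True, check=False)
        return load_cycle_count(result.stdout)

    first = sample()
    time.sleep(interval)
    second = sample()
    if first is None or second is None:
        return None  # attribute not reported by this drive
    rate = cycles_per_hour(first, second, interval)
    return rate, verdict(rate)
```

Calling watch("/dev/sda") as root returns the measured rate and a
verdict after ten minutes, or None if the drive does not report the
attribute.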

Greg