Message-Id: <20110701163719.27afc580.taeuber@bbaw.de>
Date:	Fri, 1 Jul 2011 16:37:19 +0200
From:	Lars Täuber <taeuber@...w.de>
To:	linux-kernel@...r.kernel.org
Subject: Re: [PROBLEM] reproduceable storage errors on high IO load

Same with a new 2TB Seagate Constellation ES ST2000NM0011 connected to the
Areca ARC1300 (mvsas).
A simple »dd if=/dev/zero of=/dev/sd_« is enough to provoke the problem.
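
For completeness, the test amounts to nothing more than something like the
following; sdX is a placeholder for the disk under test, and bs=1M is only an
example block size, nothing about it should be special. Afterwards I look at
dmesg:

# dd if=/dev/zero of=/dev/sdX bs=1M
# dmesg | tail -n 50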

Connected to the onboard AMD AHCI controller (3.0 Gbps), both disks can be
formatted, and the dd command doesn't do any harm either.

But there are some messages in dmesg when I do the following with the disks still connected to the AHCI controller:

# mdadm -C /dev/md3 -l5 -n3 /dev/sd[cd] missing
# mke2fs -Fj /dev/md3

in dmesg:

[ 1515.340662] md: bind<sdc>
[ 1515.378861] md: bind<sdd>
[ 1515.470912] md/raid:md3: device sdd operational as raid disk 1
[ 1515.470919] md/raid:md3: device sdc operational as raid disk 0
[ 1515.471728] md/raid:md3: allocated 3230kB
[ 1515.471798] md/raid:md3: raid level 5 active with 2 out of 3 devices, algorithm 2
[ 1515.471933] RAID conf printout:
[ 1515.471938]  --- level:5 rd:3 wd:2
[ 1515.471944]  disk 0, o:1, dev:sdc
[ 1515.471949]  disk 1, o:1, dev:sdd
[ 1515.472008] md3: detected capacity change from 0 to 4000797687808
[ 1515.472765]  md3: unknown partition table
[ 1918.040121] ata6.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
[ 1918.040259] ata6.00: failed command: WRITE FPDMA QUEUED
[ 1918.040367] ata6.00: cmd 61/00:00:00:00:b4/04:00:cc:00:00/40 tag 0 ncq 524288 out
[ 1918.040371]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1918.040625] ata6.00: status: { DRDY }
[ 1918.040718] ata6.00: failed command: WRITE FPDMA QUEUED
[ 1918.040822] ata6.00: cmd 61/00:08:00:04:b4/04:00:cc:00:00/40 tag 1 ncq 524288 out
[ 1918.040825]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1918.041078] ata6.00: status: { DRDY }
[ 1918.041173] ata6: hard resetting link
[ 1918.041202] ata5.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
[ 1918.041315] ata5.00: failed command: WRITE FPDMA QUEUED
[ 1918.041422] ata5.00: cmd 61/00:00:00:00:b4/04:00:cc:00:00/40 tag 0 ncq 524288 out
[ 1918.041426]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1918.041681] ata5.00: status: { DRDY }
[ 1918.041772] ata5.00: failed command: WRITE FPDMA QUEUED
[ 1918.041877] ata5.00: cmd 61/00:08:00:04:b4/04:00:cc:00:00/40 tag 1 ncq 524288 out
[ 1918.041880]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1918.042133] ata5.00: status: { DRDY }
[ 1918.042227] ata5: hard resetting link
[ 1918.590112] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1918.590155] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1918.592281] ata5.00: configured for UDMA/133
[ 1918.592297] ata5.00: device reported invalid CHS sector 0
[ 1918.592307] ata5.00: device reported invalid CHS sector 0
[ 1918.592322] ata5: EH complete
[ 1918.592804] ata6.00: configured for UDMA/133
[ 1918.592818] ata6.00: device reported invalid CHS sector 0
[ 1918.592827] ata6.00: device reported invalid CHS sector 0
[ 1918.592841] ata6: EH complete
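
The failed commands on each port are NCQ writes (WRITE FPDMA QUEUED) that time
out while the drives still report DRDY. Purely as an experiment I may also
repeat the test with NCQ effectively switched off by limiting the queue depth;
sdX is again just a placeholder for the disk in question:

# echo 1 > /sys/block/sdX/device/queue_depth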

But the format successfully completes.

Is there an important difference between an onboard controller and one connected via a PCIe slot?
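
To be sure which controller and driver each disk actually sits behind during a
test, I check something like this (sdX again being a placeholder):

# readlink -f /sys/block/sdX/device
# lspci -nnk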

I'll try some more SATA controllers on Monday.
In the meantime I'll check the RAM with memtest86+, as suggested by Lee Mathers.

Have a nice weekend.
Lars
