lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <46BB190E.2030102@oracle.com>
Date:	Thu, 09 Aug 2007 09:39:26 -0400
From:	Chuck Lever <chuck.lever@...cle.com>
To:	linux-kernel@...r.kernel.org
Subject: DMA CRC errors with SiS chipset and notebook drive

For various reasons I want a silent PC, so I'm using a notebook hard
drive in my desktop system.  I've recycled an ancient Foxconn Socket 754
mainboard in it with this IDE controller:

00:02.5 IDE interface: Silicon Integrated Systems [SiS] 5513 [IDE] (rev
01) (prog-if 80 [Master])
         Subsystem: Foxconn International, Inc. Unknown device 0c92
         Flags: bus master, medium devsel, latency 128
         I/O ports at 01f0 [size=8]
         I/O ports at 03f4 [size=1]
         I/O ports at 0170 [size=8]
         I/O ports at 0374 [size=1]
         I/O ports at 4000 [size=16]

On boot, the kernel reports:

SCSI subsystem initialized
libata version 2.21 loaded.
pata_sis 0000:00:02.5: version 0.5.1
scsi0 : pata_sis
scsi1 : pata_sis
ata1: PATA max UDMA/133 cmd 0x00000000000101f0 ctl 0x00000000000103f6 
bmdma 0x00
00000000014000 irq 14
ata2: PATA max UDMA/133 cmd 0x0000000000010170 ctl 0x0000000000010376 
bmdma 0x00
00000000014008 irq 15
ata1.00: ATA-6: ST94813A, 3.04, max UDMA/100
ata1.00: 78140160 sectors, multi 16: LBA48
ata1.01: ATAPI: TSSTcorpCD/DVDW SH-S182D, SB00, max UDMA/33
ata1.00: configured for UDMA/100
ata1.01: configured for UDMA/33
scsi 0:0:0:0: Direct-Access     ATA      ST94813A         3.04 PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 78140160 512-byte hardware sectors (40008 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't 
support DPO
  or FUA
sd 0:0:0:0: [sda] 78140160 512-byte hardware sectors (40008 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't 
support DPO
  or FUA
  sda: sda1 sda2 sda3
sd 0:0:0:0: [sda] Attached SCSI disk
scsi 0:0:1:0: CD-ROM            TSSTcorp CD/DVDW SH-S182D SB00 PQ: 0 ANSI: 5

Then later in the log, I see:

Aug  9 07:41:29 monet kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr
0x0 action 0x2
Aug  9 07:41:29 monet kernel: ata1.00: (BMDMA stat 0x64)
Aug  9 07:41:29 monet kernel: ata1.00: cmd
c8/00:10:73:15:00/00:00:00:00:00/e1 tag 0 cdb 0x12 data 8192 in
Aug  9 07:41:29 monet kernel:          res
51/84:00:00:00:00/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)
Aug  9 07:41:29 monet kernel: ata1: soft resetting port
Aug  9 07:41:29 monet kernel: ata1.00: configured for UDMA/100
Aug  9 07:41:29 monet kernel: ata1.01: configured for UDMA/33
Aug  9 07:41:29 monet kernel: ata1: EH complete

And eventually:

Aug  9 07:41:29 monet kernel: ata1.00: limiting speed to UDMA/66:PIO4
Aug  9 07:41:29 monet kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr
0x0 action 0x2
Aug  9 07:41:29 monet kernel: ata1.00: (BMDMA stat 0x64)
Aug  9 07:41:29 monet kernel: ata1.00: cmd
c8/00:08:fb:7c:a1/00:00:00:00:00/e0 tag 0 cdb 0x12 data 4096 in
Aug  9 07:41:29 monet kernel:          res
51/84:00:00:00:00/00:00:00:00:00/e0 Emask 0x10 (ATA bus error)
Aug  9 07:41:29 monet kernel: ata1: soft resetting port
Aug  9 07:41:29 monet kernel: ata1.00: configured for UDMA/66
Aug  9 07:41:29 monet kernel: ata1.01: configured for UDMA/33
Aug  9 07:41:29 monet kernel: ata1: EH complete

After that, no more DMA error messages (until the next system reboot).

I've tried this with a brand-new Seagate Momentus 5400.2 and a fairly 
new Western Digital WD800VE.  According to the disk specifications 
available on the vendor web sites, both drives are capable of UDMA/100 
speeds.

This error occurs whether the drive is on the primary IDE controller by
itself, or whether it's sharing the controller with an optical drive
(ata1.01 in the above listing).  I've changed the disk to the secondary
controller, I've swapped IDE cables, and changed out the 44-40
pin notebook drive adapter; no change.

The system is running 2.6.22.1-41.fc7 (x86-64, of course), but I've seen 
this on 2.6.21-based kernels as well.  I've noticed similar errors with 
a Dell Latitude D600 (x86, not SiS), also running recent Fedora 7 kernels.

Searching the web seems to indicate that either the drive or the
controller is bad, but since the error appears in so many different
hardware configurations, I'm beginning to think this is a problem with
libata, as that is an obvious common element.

This e-mail is mostly an experience report, but I'm also wondering if I
should be concerned about data on the drive, or have I missed the real
problem entirely?

Addendum: The internal SMART error log on the Seagate shows these errors 
repeatedly:

Error 103 occurred at disk power-on lifetime: 296 hours (12 days + 8 hours)
   When the command that caused the error occurred, the device was
active or idle.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   c8 00 08 fb 7c a1 e0 00      00:08:31.088  READ DMA
   c8 00 08 ab 7c a1 e0 00      00:08:31.080  READ DMA
   c8 00 08 f3 7c a1 e0 00      00:08:31.079  READ DMA
   c8 00 08 eb 7c a1 e0 00      00:08:31.077  READ DMA
   c8 00 08 5b f6 fc e0 00      00:08:31.077  READ DMA

Error 102 occurred at disk power-on lifetime: 296 hours (12 days + 8 hours)
   When the command that caused the error occurred, the device was
active or idle.

   After command completion occurred, registers were:
   ER ST SC SN CL CH DH
   -- -- -- -- -- -- --
   84 51 00 00 00 00 e0  Error: ICRC, ABRT at LBA = 0x00000000 = 0

   Commands leading to the command that caused the error were:
   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
   -- -- -- -- -- -- -- --  ----------------  --------------------
   c8 00 10 73 15 00 e1 00      00:08:28.555  READ DMA
   27 00 00 00 00 00 e0 00      00:08:28.241  READ NATIVE MAX ADDRESS EXT
   ec 00 00 00 00 00 a0 02      00:08:27.695  IDENTIFY DEVICE
   ef 03 45 00 00 00 a0 02      00:08:27.536  SET FEATURES [Set transfer
mode]
   27 00 00 00 00 00 e0 00      00:08:27.532  READ NATIVE MAX ADDRESS EXT

View attachment "chuck.lever.vcf" of type "text/x-vcard" (291 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ