Message-ID: <201007091750.05020.stephan.diestelhorst@amd.com>
Date:	Fri, 9 Jul 2010 17:50:04 +0200
From:	Stephan Diestelhorst <stephan.diestelhorst@....com>
To:	Tejun Heo <tj@...nel.org>, "Rafael J. Wysocki" <rjw@...k.pl>
CC:	<linux-kernel@...r.kernel.org>, <linux-ide@...r.kernel.org>,
	<linux-pm@...ts.osdl.org>, <stephan.diestelhorst@...il.com>
Subject: HDD not suspending properly / dead on resume

Hi,
  I have an issue with suspend to RAM under I/O load on a disk. The
symptom is that the disk does not respond to requests after resume and
produces only I/O errors. This happens on all tested kernels (the
newest being 2.6.35-rc4, Ubuntu mainline PPA build):

[ 1719.580169] sd 0:0:0:0: [sda] Unhandled error code
[ 1719.580174] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 1719.580178] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 0f 51 e7 88 00 00 b0 00
[ 1719.580186] end_request: I/O error, dev sda, sector 257025928
[ 1719.580798] Aborting journal on device dm-1-8.
[ 1719.580912] EXT4-fs error (device dm-1) in ext4_reserve_inode_write: Journal has aborted
[ 1719.580959] EXT4-fs (dm-1): Remounting filesystem read-only
[ 1719.581004] sd 0:0:0:0: [sda] Unhandled error code
[ 1719.581007] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 1719.581010] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 0f 51 a1 88 00 00 08 00
[ 1719.581016] end_request: I/O error, dev sda, sector 257008008
[ 1719.581026] Buffer I/O error on device dm-1, logical block 2129920
[ 1719.581027] lost page write due to I/O error on dm-1
[ 1719.581149] 
[ 1719.581214] sd 0:0:0:0: [sda] Unhandled error code
[ 1719.581217] sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 1719.581220] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 0e 4d a1 88 00 00 08 00
[ 1719.581227] end_request: I/O error, dev sda, sector 239968648
[ 1719.581254] JBD2: I/O error detected when updating journal superblock for dm-1-8.
[ 1719.581268] journal commit I/O error
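
As a cross-check on the log above, the failing commands can be decoded
by hand. The sketch below assumes the standard WRITE(10) layout (opcode
0x2a in byte 0, big-endian LBA in bytes 2-5, big-endian transfer length
in bytes 7-8); the function name decode_write10 is made up for this
illustration. It shows that the LBA in each CDB equals the sector that
end_request reports.

```python
def decode_write10(cdb_hex):
    """Return (lba, transfer_length_in_blocks) for a WRITE(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between byte pairs is ignored
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    length = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: number of blocks
    return lba, length


# The three CDBs from the log excerpt above.
for cdb in ("2a 00 0f 51 e7 88 00 00 b0 00",
            "2a 00 0f 51 a1 88 00 00 08 00",
            "2a 00 0e 4d a1 88 00 00 08 00"):
    print(decode_write10(cdb))
```

The decoded LBAs are 257025928, 257008008 and 239968648, the same
sectors as in the end_request lines, with transfer lengths of 176, 8
and 8 blocks.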

This can be triggered most reliably with multiple "direct" writes to
disk; I create the load with the attached script. When the issue is
triggered, suspend (through pm-suspend) takes very long, about 47
seconds in the log below.
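
The attached script is not reproduced in this archive. For readers who
want a comparable load, the following is only a rough sketch of the
idea, not the original script: several threads writing with O_DIRECT
(Linux-only) so that the writes bypass the page cache. The block size,
the file names and the helper names write_blocks and run_load are all
assumptions made for this illustration.

```python
import mmap
import os
import threading

BLOCK_SIZE = 1 << 20  # 1 MiB; assumed, a multiple of common logical sector sizes


def write_blocks(path, blocks, direct=True):
    """Write `blocks` blocks of BLOCK_SIZE bytes to `path`; return bytes written."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if direct:
        flags |= os.O_DIRECT  # Linux-only: bypass the page cache
    # An anonymous mmap is page-aligned, which O_DIRECT needs for the buffer.
    buf = mmap.mmap(-1, BLOCK_SIZE)
    buf.write(b"\xa5" * BLOCK_SIZE)
    fd = os.open(path, flags, 0o600)
    try:
        written = 0
        for _ in range(blocks):
            written += os.write(fd, buf)
    finally:
        os.close(fd)
        buf.close()
    return written


def run_load(paths, blocks, direct=True):
    """Run one writer thread per path and return the total bytes written."""
    results = [0] * len(paths)

    def worker(index, path):
        results[index] = write_blocks(path, blocks, direct)

    threads = [threading.Thread(target=worker, args=(i, p))
               for i, p in enumerate(paths)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

A call such as run_load(["w0.bin", "w1.bin", "w2.bin"], 1024) on the
affected filesystem would keep three direct writers busy while
pm-suspend is started from another terminal. Some filesystems reject
O_DIRECT; direct=False falls back to buffered writes.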

IMHO the interesting log output during suspend is:
[ 1668.150125] Suspending console(s) (use no_console_suspend to debug)
[ 1668.150460] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 1668.174958] sd 0:0:0:0: [sda] Stopping disk
[ 1668.198045] ACPI handle has no context!
[ 1668.199302] ohci_hcd 0000:00:14.5: PCI INT C disabled
[ 1668.199468] ohci_hcd 0000:00:13.1: PCI INT A disabled
[ 1668.199477] ohci_hcd 0000:00:13.0: PCI INT A disabled
[ 1668.199520] ehci_hcd 0000:00:12.2: PCI INT B disabled
[ 1668.199525] ohci_hcd 0000:00:12.1: PCI INT A disabled
[ 1668.199562] ohci_hcd 0000:00:12.0: PCI INT A disabled
[ 1668.210138] ehci_hcd 0000:00:13.2: PCI INT B disabled
[ 1668.300295] HDA Intel 0000:00:14.2: PCI INT A disabled
[ 1668.300301] HDA Intel 0000:01:00.1: PCI INT B disabled
[ 1668.300349] ACPI handle has no context!
[ 1669.700139] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1674.700125] ata1.00: qc timeout (cmd 0xec)
[ 1674.700136] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 1674.700139] ata1.00: revalidation failed (errno=-5)
[ 1675.230136] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1685.230125] ata1.00: qc timeout (cmd 0xec)
[ 1685.230137] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 1685.230140] ata1.00: revalidation failed (errno=-5)
[ 1685.230144] ata1: limiting SATA link speed to 1.5 Gbps
[ 1685.760137] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 1715.760126] ata1.00: qc timeout (cmd 0xec)
[ 1715.760137] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[ 1715.760139] ata1.00: revalidation failed (errno=-5)
[ 1715.760142] ata1.00: disabled
[ 1715.810216] ahci 0000:00:11.0: PCI INT A disabled
[ 1715.830154] PM: suspend of devices complete after 47679.847 msecs
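
To see where the 47 seconds go, the timestamps above can be compared
directly. The sketch below is a small helper written for this purpose
(the name timeout_waits is an assumption); it measures the time from
each "SATA link up" line to the following "qc timeout" line, using
lines copied from the excerpt.

```python
import re

# Lines copied from the suspend log above.
LOG = """\
[ 1669.700139] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1674.700125] ata1.00: qc timeout (cmd 0xec)
[ 1675.230136] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1685.230125] ata1.00: qc timeout (cmd 0xec)
[ 1685.760137] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[ 1715.760126] ata1.00: qc timeout (cmd 0xec)
"""

LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def timeout_waits(log):
    """Return seconds between each 'SATA link up' and the next 'qc timeout'."""
    waits = []
    link_up = None
    for line in log.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        stamp, msg = float(m.group(1)), m.group(2)
        if "SATA link up" in msg:
            link_up = stamp
        elif "qc timeout" in msg and link_up is not None:
            waits.append(stamp - link_up)
            link_up = None
    return waits


for wait in timeout_waits(LOG):
    print(f"{wait:.1f} s")
```

The three IDENTIFY attempts wait roughly 5, 10 and 30 seconds, so
about 45 of the 47.7 seconds reported by PM are spent waiting for a
drive that no longer answers.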

I've also attached the full dmesg, lspci -vv and smartctl -a
information.

Do you guys have any ideas here?

Many thanks,
  Stephan
-- 
Stephan Diestelhorst, AMD Operating System Research Center
stephan.diestelhorst@....com, Tel. +49 (0)351 448 356 719

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

View attachment "dmesg" of type "text/plain" (62689 bytes)

View attachment "lspci-vv" of type "text/plain" (28041 bytes)

View attachment "smartctl-a" of type "text/plain" (5103 bytes)
