linux-kernel - Re: exception Emask 0x0 SAct 0x1 / SErr 0x0 action 0x2 frozen


Message-ID: <alpine.DEB.1.10.0809301715280.14600@p34.internal.lan>
Date:	Tue, 30 Sep 2008 17:18:43 -0400 (EDT)
From:	Justin Piszcz <jpiszcz@...idpixels.com>
To:	Tom Mortensen <tmmlkml@...il.com>
cc:	Tejun Heo <tj@...nel.org>, Bill Davidsen <davidsen@....com>,
	Gwendal Grignou <gwendal@...gle.com>,
	Brian Rademacher <rad@...files.net>, linux-ide@...r.kernel.org,
	linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org,
	Bruce Allen <ballen@...vity.phys.uwm.edu>
Subject: Re: exception Emask 0x0 SAct 0x1 / SErr 0x0 action 0x2 frozen



On Tue, 30 Sep 2008, Tom Mortensen wrote:

> Don't know if this is the original poster's problem, but if the drive
> is spun down, then enabling SMART or trying to read SMART attributes
> causes the drive to spin up and the command is delayed until this has
> occurred.
>
> The fix is to increase the timeout given to scsi_execute() in
> drivers/ata/libata-scsi.c.
>
> i.e., the current code (2.6.26.5) is:
>
>        /* Good values for timeout and retries?  Values below
>           from scsi_ioctl_send_command() for default case... */
>        cmd_result = scsi_execute(scsidev, scsi_cmd, data_dir, argbuf, argsize,
>                                  sensebuf, (10*HZ), 5, 0);
>
> Should be changed to:
>
>        /* Good values for timeout and retries?  Values below
>           from scsi_ioctl_send_command() for default case... */
>        cmd_result = scsi_execute(scsidev, scsi_cmd, data_dir, argbuf, argsize,
>                                  sensebuf, (30*HZ), 5, 0);
>
> Using a 1TB Hitachi hard drive, this command times out because it
> takes this drive about 15 seconds to spin up.  Virtually all hard
> drives spin up in less than 30 sec, but perhaps make this higher in
> case there are slower drives out there?
>
> Cheers,
> Tom
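
To make the arithmetic in the quoted proposal concrete, here is a minimal
user-space sketch (not kernel code). The helper names and the HZ value of 250
used in the usage note are assumptions for illustration only; HZ depends on
the kernel configuration. It shows why a (10*HZ) timeout expires before a
drive that needs about 15 seconds to spin up can answer, while (30*HZ) does
not.

```c
#include <stdbool.h>

/* Hypothetical helper: convert a timeout in seconds to jiffies,
 * mirroring the (N*HZ) expression passed to scsi_execute(). */
unsigned long timeout_in_jiffies(unsigned long seconds, unsigned long hz)
{
    return seconds * hz;
}

/* Hypothetical helper: true if a command that must first wait
 * spinup_seconds for the drive to spin up can still complete
 * before the timeout (given in jiffies) expires. */
bool timeout_covers_spinup(unsigned long timeout_jiffies,
                           unsigned long hz,
                           unsigned long spinup_seconds)
{
    return spinup_seconds * hz < timeout_jiffies;
}
```

With an assumed HZ of 250, timeout_covers_spinup(timeout_in_jiffies(10, 250),
250, 15) is false and the same call with 30 seconds is true. The result does
not depend on the HZ value chosen, since both sides scale with it.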

Velociraptor 10k drive here (2.6.26.5):

Sep 30 15:55:06 p34 kernel: [420781.333179] ata6.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Sep 30 15:55:06 p34 kernel: [420781.333189] ata6.00: cmd b0/d8:00:00:4f:c2/00:00:00:00:00/00 tag 0
Sep 30 15:55:06 p34 kernel: [420781.333190]          res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 30 15:55:06 p34 kernel: [420781.333194] ata6.00: status: { DRDY }
Sep 30 15:55:06 p34 kernel: [420781.333200] ata6: hard resetting link
Sep 30 15:55:06 p34 kernel: [420781.638589] ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 30 15:55:06 p34 kernel: [420781.662166] ata6.00: configured for UDMA/133
Sep 30 15:55:06 p34 kernel: [420781.669416] sd 5:0:0:0: [sdf] Write Protect is off
Sep 30 15:55:06 p34 kernel: [420781.669416] sd 5:0:0:0: [sdf] Mode Sense: 00 3a 00 00
Sep 30 15:55:06 p34 kernel: [420781.669416] sd 5:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
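
For reading the "cmd b0/d8:00:00:4f:c2/..." line in the log above, here is a
small user-space sketch that splits the string into taskfile registers and
names the SMART subcommand. The struct and function names are hypothetical,
and the field order (command/feature:nsect:lbal:lbam:lbah, then the HOB
bytes, then device) is my understanding of how libata prints this line, so
treat it as an assumption. The SMART values (command 0xb0, signature 4f/c2
in LBA mid/high, feature 0xd8 for ENABLE OPERATIONS) come from the ATA
command set.

```c
#include <stdio.h>

/* Taskfile registers in the order assumed for libata's "cmd" line. */
struct ata_tf_regs {
    unsigned int command, feature, nsect, lbal, lbam, lbah;
    unsigned int hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah;
    unsigned int device;
};

/* Parse e.g. "b0/d8:00:00:4f:c2/00:00:00:00:00/00".
 * Returns 0 if all 12 fields were read, -1 otherwise. */
int parse_tf(const char *s, struct ata_tf_regs *tf)
{
    int n = sscanf(s,
                   "%2x/%2x:%2x:%2x:%2x:%2x/%2x:%2x:%2x:%2x:%2x/%2x",
                   &tf->command, &tf->feature, &tf->nsect,
                   &tf->lbal, &tf->lbam, &tf->lbah,
                   &tf->hob_feature, &tf->hob_nsect, &tf->hob_lbal,
                   &tf->hob_lbam, &tf->hob_lbah, &tf->device);
    return n == 12 ? 0 : -1;
}

/* Name the SMART subcommand, if the taskfile is a SMART command. */
const char *describe_tf(const struct ata_tf_regs *tf)
{
    if (tf->command != 0xb0)
        return "not a SMART command";
    if (tf->lbam != 0x4f || tf->lbah != 0xc2)
        return "SMART command without the 4f/c2 signature";
    switch (tf->feature) {
    case 0xd0: return "SMART READ DATA";
    case 0xd8: return "SMART ENABLE OPERATIONS";
    case 0xd9: return "SMART DISABLE OPERATIONS";
    case 0xda: return "SMART RETURN STATUS";
    default:   return "other SMART subcommand";
    }
}
```

Fed the string from the log, describe_tf() reports SMART ENABLE OPERATIONS,
so the command that timed out here is a SMART enable, the kind of command
Tom describes.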

Nothing is wrong with the disk, it just happens... :(  Linux/kernel bug?
It happens on multiple controllers (Intel, SiI, Marvell); the controller
does not seem to matter.

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      2761         -
# 2  Short offline       Completed without error       00%      2737         -
# 3  Extended offline    Completed without error       00%      2714         -
# 4  Short offline       Completed without error       00%      2689         -
# 5  Extended offline    Completed without error       00%      2514         -
# 6  Short offline       Completed without error       00%      2306         -
# 7  Short offline       Completed without error       00%      2282         -
# 8  Short offline       Completed without error       00%      2258         -
# 9  Short offline       Completed without error       00%      2234         -
#10  Extended offline    Completed without error       00%      2211         -
#11  Short offline       Completed without error       00%      2186         -
#12  Short offline       Completed without error       00%      2138         -
#13  Short offline       Completed without error       00%      2114         -
#14  Short offline       Completed without error       00%      2090         -
#15  Short offline       Completed without error       00%      2066         -
#16  Extended offline    Completed without error       00%      2043         -
#17  Short offline       Completed without error       00%      2018         -
#18  Short offline       Completed without error       00%      1970         -
#19  Short offline       Completed without error       00%      1947         -
#20  Short offline       Completed without error       00%      1923         -
#21  Short offline       Completed without error       00%      1899         -


Justin.
