linux-kernel - Re: MD/RAID time out writing superblock


Message-ID: <20090909120218.GB21829@arachsys.com>
Date:	Wed, 9 Sep 2009 13:02:18 +0100
From:	Chris Webb <chris@...chsys.com>
To:	linux-scsi@...r.kernel.org
Cc:	Tejun Heo <tj@...nel.org>, Ric Wheeler <rwheeler@...hat.com>,
	Andrei Tanas <andrei@...as.ca>, NeilBrown <neilb@...e.de>,
	linux-kernel@...r.kernel.org,
	IDE/ATA development list <linux-ide@...r.kernel.org>,
	Jeff Garzik <jgarzik@...hat.com>, Mark Lord <mlord@...ox.com>
Subject: Re: MD/RAID time out writing superblock

Chris Webb <chris@...chsys.com> writes:

> I've also noticed that during this recovery, I'm seeing lots of timeouts but
> they don't seem to interrupt the resync:
> 
>   05:47:39 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>   05:47:39 ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
>   05:47:39         res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
>   05:47:39 ata5.00: status: { DRDY }
>   05:47:39 ata5: hard resetting link
>   05:47:49 ata5: softreset failed (device not ready)
>   05:47:49 ata5: hard resetting link
>   05:47:49 ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>   05:47:49 ata5.00: configured for UDMA/133
>   05:47:49 ata5: EH complete
>   
>   08:17:39 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>   08:17:39 ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
>   08:17:39         res 40/00:00:35:83:f8/00:00:4d:00:00/40 Emask 0x4 (timeout)
>   08:17:39 ata5.00: status: { DRDY }
>   08:17:39 ata5: hard resetting link
>   08:17:49 ata5: softreset failed (device not ready)
>   08:17:49 ata5: hard resetting link
>   08:17:49 ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>   08:17:49 ata5.00: configured for UDMA/133
>   08:17:49 ata5: EH complete
>   
>   10:22:39 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>   10:22:39 ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
>   10:22:39         res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
>   10:22:39 ata5.00: status: { DRDY }
>   10:22:39 ata5: hard resetting link
>   10:22:49 ata5: softreset failed (device not ready)
>   10:22:49 ata5: hard resetting link
>   10:22:50 ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>   10:22:51 ata5.00: configured for UDMA/133
>   10:22:51 ata5: EH complete

... the difference being that the timeout which leads to the super_written
failure seems to return an I/O error, whereas the others don't:

  ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
  ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
          res 40/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
  ata5.00: status: { DRDY }
  ata5: hard resetting link
  ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  ata5.00: configured for UDMA/133
  ata5: EH complete
  end_request: I/O error, dev sde, sector 1465147272
  md: super_written gets error=-5, uptodate=0
  raid10: Disk failure on sde3, disabling device.
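
To compare the two cases mechanically, here is a minimal sketch that splits a
log into error-handling episodes and records whether an I/O error was reported
after each one. The function name summarize_eh and the rule of attributing an
end_request error to the most recent exception line are assumptions made for
illustration, not something libata or md guarantees.

```python
import re

# Start of a libata error-handling episode, e.g. "ata5.00: exception Emask ..."
EXC_RE = re.compile(r"(ata\d+)\.\d+: exception ")
# Block-layer failure, e.g. "end_request: I/O error, dev sde, sector 1465147272"
IOERR_RE = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def summarize_eh(lines):
    """Return one dict per exception seen, noting any I/O error that follows.

    Assumption: an I/O error belongs to the most recent exception line.
    """
    episodes = []
    for line in lines:
        m = EXC_RE.search(line)
        if m:
            episodes.append({"port": m.group(1), "io_error": None})
            continue
        m = IOERR_RE.search(line)
        if m and episodes:
            episodes[-1]["io_error"] = (m.group(1), int(m.group(2)))
    return episodes
```

Fed the logs above, the episodes from the resync would be expected to carry no
io_error, while the one preceding the super_written failure would carry the
device name and sector.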

I wonder what's different about these two kinds of timeout, such that one
causes an I/O error and the other just causes a retry after reset? Presumably
if the one that hits the superblock write were also just retried after the
reset, everything would be (closer to being) fine.
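
One visible difference between the two logs is the first byte of the cmd line:
ec in the timeouts that are retried and ea in the one that fails. In the ATA
command set 0xEC is IDENTIFY DEVICE and 0xEA is FLUSH CACHE EXT, which may be
relevant to how each is handled after the reset. A minimal sketch for pulling
that byte out of a log line follows; the helper name decode_cmd and the small
opcode table are illustrative assumptions and the table is far from complete.

```python
import re

# A few ATA command opcodes; deliberately not exhaustive.
ATA_OPCODES = {
    0xE7: "FLUSH CACHE",
    0xEA: "FLUSH CACHE EXT",
    0xEC: "IDENTIFY DEVICE",
}

# Matches libata lines such as "ata5.00: cmd ea/00:00:..." and captures the opcode.
CMD_RE = re.compile(r"ata\d+\.\d+: cmd ([0-9a-f]{2})/")


def decode_cmd(line):
    """Return a name for the ATA command in a libata 'cmd' line, or None."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    opcode = int(m.group(1), 16)
    return ATA_OPCODES.get(opcode, "unknown (0x%02x)" % opcode)
```

Running decode_cmd over every cmd line in a log gives a quick view of which
commands were in flight when each timeout hit.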

Cheers,

Chris.