linux-kernel - Re: SCSI or libata problem with an RDX removable disk

Message-ID: <48C575C9.7090900@rtr.ca>
Date:	Mon, 08 Sep 2008 14:58:17 -0400
From:	Mark Lord <lkml@....ca>
To:	Alan Cox <alan@...rguk.ukuu.org.uk>
Cc:	Pascal GREGIS <pgs@...erway.com>, linux-kernel@...r.kernel.org
Subject: Re: SCSI or libata problem with an RDX removable disk

Alan Cox wrote:
>> Sep  4 08:03:08 devsni1 kernel: ata4: port is slow to respond, please be patient (Status 0xd0)
>> Sep  4 08:03:31 devsni1 kernel: ata4: port failed to respond (30 secs, Status 0xd0)
>> Sep  4 08:03:31 devsni1 kernel: ata4: soft resetting port
>> Sep  4 08:03:32 devsni1 kernel: ATA: abnormal status 0xD0 on port 0x0001d807
>> Sep  4 08:03:32 devsni1 last message repeated 4 times
> 
> Your disk went offline and then refused to come back when the link was
> reset. The initial trigger appears to have been the drive, the fact it
> didn't come back could either be the drive or a controller problem. We've
> seen a few cases where devices or controllers fail to recover from one
> end being stuck expecting data.
> 
> Mark Lord did some patches to try and drain data in this case but I don't
> remember if they were merged yet.
..

That would be this patch, currently not merged, not maintained,
and probably needs rework for some chipsets.  But for the record:


Tejun Heo wrote:
> Jeff Garzik wrote:
>> Tejun Heo wrote:
>>> Alan Cox wrote:
>>>>> I think there have been enough cases where this draining was necessary.
>>>>>  IIRC, ata_piix was involved in those cases, right?  If so, can you
>>>>> please submit a patch which applies this only to affected controllers?
>>>>> I don't feel too confident about applying this to all SFF controllers.
>>>> Old IDE does it on all controllers bar a couple. So we have a very good
>>>> knowledge of what does/doesn't work. The one that needs care in old ide
>>>> is an ordering issue where a state machine reset done first causes the
>>>> drain of the I/O to hang.
>>> Hmmm... So, do we apply draining to all PATA?  Or is ata_piix SATA
>>> affected too?
>> I would think all SFF controllers, since a lot of first gen SATA are
>> really bridged solutions.  If they are flagging DRQ, I say oblige them :)
>
> Alright, then the posted patch should be good enough.  Mark, can you be
> bothered to regenerate the patch and post it one more time (again)?  It
> seems we all agree the update is needed.

I think this original patch still applies cleanly on at least 2.6.23-rc7.

Drain up to 512 words from host/bridge FIFO on stuck DRQ HSM violation,
rather than just getting stuck there forever.

Signed-off-by: Mark Lord <mlord@...ox.com>
---

--- old/drivers/ata/libata-sff.c	2007-09-28 09:29:22.000000000 -0400
+++ linux/drivers/ata/libata-sff.c	2007-09-28 09:39:44.000000000 -0400
@@ -420,6 +420,28 @@
 	ap->ops->irq_on(ap);
 }
 
+static void ata_drain_fifo(struct ata_port *ap, struct ata_queued_cmd *qc)
+{
+	u8 stat = ata_chk_status(ap);
+	/*
+	 * Try to clear stuck DRQ if necessary,
+	 * by reading/discarding up to two sectors worth of data.
+	 */
+	if ((stat & ATA_DRQ) && (!qc || qc->dma_dir != DMA_TO_DEVICE)) {
+		unsigned int i;
+		unsigned int limit = qc ? qc->sect_size : ATA_SECT_SIZE;
+
+		printk(KERN_WARNING "Draining up to %u words from data FIFO.\n",
+									limit);
+		for (i = 0; i < limit ; ++i) {
+			ioread16(ap->ioaddr.data_addr);
+			if (!(ata_chk_status(ap) & ATA_DRQ))
+				break;
+		}
+		printk(KERN_WARNING "Drained %u/%u words.\n", i, limit);
+	}
+}
+
 /**
  *	ata_bmdma_drive_eh - Perform EH with given methods for BMDMA controller
  *	@ap: port to handle error for
@@ -476,7 +498,7 @@
 	}
 
 	ata_altstatus(ap);
-	ata_chk_status(ap);
+	ata_drain_fifo(ap, qc);
 	ap->ops->irq_clear(ap);
 
 	spin_unlock_irqrestore(ap->lock, flags);
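
For anyone who wants to poke at the drain logic outside the kernel, here is a
minimal userspace sketch of the same bounded loop. Everything prefixed sim_ or
SIM_ is a hypothetical stand-in invented for illustration, not a libata
interface: the "port" is just a counter of 16-bit words still pending in the
FIFO, and 0x50 (DRDY|DSC) is assumed as the idle status. It is not a substitute
for testing on real hardware. Unlike the printk in the patch, it returns the
number of words actually read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_ATA_DRQ	0x08u	/* DRQ bit of the ATA status register */
#define SIM_IDLE_STATUS	0x50u	/* assumed idle status: DRDY | DSC */
#define SIM_SECT_SIZE	512u	/* same value as ATA_SECT_SIZE */

/* Hypothetical stand-in for a port: words still pending in the FIFO. */
struct sim_port {
	unsigned int words_pending;
};

/* Stand-in for ata_chk_status(): DRQ stays set while data is pending. */
static uint8_t sim_chk_status(const struct sim_port *p)
{
	return p->words_pending ? (SIM_IDLE_STATUS | SIM_ATA_DRQ)
				: SIM_IDLE_STATUS;
}

/* Stand-in for ioread16() on the data register: consumes one word. */
static uint16_t sim_read_data(struct sim_port *p)
{
	if (p->words_pending)
		p->words_pending--;
	return 0;
}

/*
 * Read and discard at most 'limit' words while DRQ is set.
 * Returns the number of words actually read.
 */
unsigned int sim_drain_fifo(struct sim_port *p, unsigned int limit)
{
	unsigned int nread = 0;

	assert(p != NULL);

	if (!(sim_chk_status(p) & SIM_ATA_DRQ))
		return 0;	/* nothing stuck, nothing to drain */

	while (nread < limit) {
		(void)sim_read_data(p);
		nread++;
		if (!(sim_chk_status(p) & SIM_ATA_DRQ))
			break;	/* DRQ cleared, stop early */
	}
	return nread;
}
```

Calling sim_drain_fifo(&p, SIM_SECT_SIZE) with 3 words pending reads 3 words
and leaves DRQ clear; with 1000 words pending it stops at the 512-word limit
and DRQ stays set, which is the case where a reset is still needed.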