Message-id: <477C2A99.9010208@shaw.ca>
Date: Wed, 02 Jan 2008 18:21:45 -0600
From: Robert Hancock <hancockr@...w.ca>
To: Allen Martin <AMartin@...dia.com>
Cc: Jeff Garzik <jeff@...zik.org>, Tejun Heo <htejun@...il.com>,
Gabor Gombas <gombasg@...aki.hu>, linux-kernel@...r.kernel.org,
linux-ide@...r.kernel.org, Kuan Luo <kluo@...dia.com>,
Peer Chen <pchen@...dia.com>
Subject: Re: sata_nv + ADMA + Samsung disk problem
Allen Martin wrote:
>> The software definitely provides that guarantee for all NCQ-capable
>> controllers.
>>
>
> Well if that's not it, it must be some problem entering ADMA legacy
> mode. Here's what the Windows driver does:
>
>
> ADMACtrl.aGO = 0
> ADMACtrl.aEIEN = 0
> poll {
> until ADMAStatus.aLGCY = 1 || timeout
> }

What we're doing to enter legacy mode is essentially:

- wait until ADMA status indicates IDLE bit set (max wait of 1 microsecond)
- clear GO bit in control register
- wait until status indicates LEGACY bit set (max wait of 1 microsecond)

and to enter ADMA mode:

- set GO bit in control register
- wait until status indicates LEGACY bit cleared and IDLE bit set (max
  wait of 1 microsecond)

The 1 microsecond timeout is pretty aggressive admittedly, but it
apparently isn't being broken (the only timeouts when switching modes
I've seen are during error handling after a command timeout has already
occurred). What timeout value is the Windows driver using?
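
For reference, the two mode-switch sequences described above can be sketched roughly as below. This is only an illustration against a toy register model, not the driver code: struct fake_port, read_status() and the bit positions are assumptions made for the sketch, and the 20 x 50 ns split of the 1 microsecond budget is likewise just one way to arrive at that bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions are assumptions for this sketch, not taken from a datasheet. */
#define CTL_GO       (1u << 7)
#define STAT_LEGACY  (1u << 9)
#define STAT_IDLE    (1u << 10)

#define POLL_TRIES 20   /* e.g. 20 polls x 50 ns = 1 microsecond budget */

/* Hypothetical stand-in for the memory-mapped per-port registers. */
struct fake_port {
    uint16_t ctl;
    uint16_t stat;
};

/* Toy hardware model: the status follows the GO bit immediately. */
static uint16_t read_status(struct fake_port *p)
{
    if (p->ctl & CTL_GO)
        p->stat = STAT_IDLE;
    else
        p->stat = STAT_IDLE | STAT_LEGACY;
    return p->stat;
}

/* Poll until (status & mask) == want, giving up after POLL_TRIES reads. */
static bool wait_status(struct fake_port *p, uint16_t mask, uint16_t want)
{
    assert(p != NULL);
    for (int i = 0; i < POLL_TRIES; i++) {
        if ((read_status(p) & mask) == want)
            return true;
        /* a real driver would delay roughly 50 ns here */
    }
    return false;
}

/* Wait for IDLE, clear GO, then wait for LEGACY to be set. */
static bool enter_legacy_mode(struct fake_port *p)
{
    bool idle = wait_status(p, STAT_IDLE, STAT_IDLE);
    p->ctl = (uint16_t)(p->ctl & ~CTL_GO);
    bool legacy = wait_status(p, STAT_LEGACY, STAT_LEGACY);
    return idle && legacy;
}

/* Set GO, then wait for LEGACY cleared and IDLE set. */
static bool enter_adma_mode(struct fake_port *p)
{
    p->ctl = (uint16_t)(p->ctl | CTL_GO);
    return wait_status(p, STAT_LEGACY | STAT_IDLE, STAT_IDLE);
}
```

Both functions return false when a poll runs out of tries, which corresponds to the mode-switch timeouts mentioned above.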
Also, I see you are clearing the aEIEN bit when entering legacy (register)
mode, while we're not. Is that important/necessary?
Aside from all this though, in the case of NCQ writes followed by a
cache flush, that sequence of commands won't put us into legacy mode at
all since the cache flush is a no-data command which we should be able
to handle in ADMA mode, from my understanding (correct me if I'm wrong).
So I don't imagine legacy/ADMA mode switch could be the cause of this
problem.
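
To make that reasoning concrete, here is a small sketch of the decision as I understand it. The enum and both helpers are hypothetical names for illustration; which protocols really require legacy mode is an assumption here (PIO is used as the example of one that does), not something confirmed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical command classes for this sketch. */
enum cmd_protocol { PROT_NODATA, PROT_PIO, PROT_DMA, PROT_NCQ };

/* Assumption: no-data, DMA and NCQ commands can be issued in ADMA mode;
 * anything else (e.g. PIO) has to go through legacy mode. */
static bool needs_legacy_mode(enum cmd_protocol prot)
{
    switch (prot) {
    case PROT_NODATA:
    case PROT_DMA:
    case PROT_NCQ:
        return false;
    default:
        return true;
    }
}

/* Count how many commands in a sequence would force legacy mode. */
static int count_legacy_commands(const enum cmd_protocol *cmds, size_t n)
{
    int count = 0;

    assert(cmds != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (needs_legacy_mode(cmds[i]))
            count++;
    return count;
}
```

Under these assumptions, a sequence of NCQ writes followed by a cache flush (a no-data command) yields a count of zero, i.e. no mode switch is involved.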
I also saw in my previous investigation that a flush immediately
followed by a write could cause the write to time out as well.
From some of the traces I took previously (posted on LKML as "sata_nv
ADMA controller lockup investigation" back in Feb 2007), what seems to
occur is that when the second command is issued very rapidly after the
previous command's completion (within roughly 20 microseconds, possibly
somewhat longer), the ADMA status changes from 0x500 (STOPPED and IDLE)
to 0x400 (just IDLE) as it typically does, but then it sticks there: no
interrupt is ever raised, and the CPB response flags remain at 0.
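
As a reading aid for the status values above, the following sketch decodes the two bits involved and expresses the stuck condition as a predicate. The bit values follow from 0x500 being STOPPED plus IDLE and 0x400 being IDLE alone; the function names and the irq_seen / cpb_resp_flags parameters are hypothetical, and the predicate only makes sense when evaluated after the command has already timed out, since 0x400 is also the normal status while a command is in progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Derived from the values above: 0x500 = STOPPED | IDLE, 0x400 = IDLE. */
#define STAT_STOPPED 0x100u
#define STAT_IDLE    0x400u

/* Format a status word with the names of the bits that are set. */
static const char *describe_status(uint16_t stat, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "0x%03x:%s%s", (unsigned)stat,
             (stat & STAT_STOPPED) ? " STOPPED" : "",
             (stat & STAT_IDLE) ? " IDLE" : "");
    return buf;
}

/* Hypothetical check, meant to be run once a command has timed out:
 * status is IDLE without STOPPED, no interrupt was seen, and the CPB
 * response flags are still 0. */
static bool looks_stuck(uint16_t stat, bool irq_seen, uint8_t cpb_resp_flags)
{
    return (stat & STAT_IDLE) && !(stat & STAT_STOPPED) &&
           !irq_seen && cpb_resp_flags == 0;
}
```

With this, describe_status() formats 0x500 as "0x500: STOPPED IDLE" and 0x400 as "0x400: IDLE".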
--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@...pamshaw.ca
Home Page: http://www.roberthancock.com/