Message-ID: <64bb37e0710070144m6bc2c844oc96ef715b53b9819@mail.gmail.com>
Date: Sun, 7 Oct 2007 10:44:25 +0200
From: "Torsten Kaiser" <just.for.lkml@...glemail.com>
To: "Tejun Heo" <htejun@...il.com>
Cc: "Jeff Garzik" <jeff@...zik.org>, linux-kernel@...r.kernel.org,
akpm@...ux-foundation.org
Subject: Re: sata_sil24 broken since 2.6.23-rc4-mm1
On 10/5/07, Torsten Kaiser <just.for.lkml@...glemail.com> wrote:
> So I will use the weekend to see if I can find out who issues this
> command and add more debug to that place...
I added some DPRINTK to sil24_qc_issue and sil24_fill_sg, but I only
found one suspicious thing.
My sil24_fill_sg now looks like this:
static inline void sil24_fill_sg(struct ata_queued_cmd *qc,
				 struct sil24_sge *sge)
{
	struct scatterlist *sg;

	ata_for_each_sg(sg, qc) {
		sge->addr = cpu_to_le64(sg_dma_address(sg));
		sge->cnt = cpu_to_le32(sg_dma_len(sg));
		if (ata_sg_is_last(sg, qc))
			sge->flags = cpu_to_le32(SGE_TRM);
		else
			sge->flags = 0;
		DPRINTK("flags,addr,cnt = 0x%x, 0x%X, 0x%X\n", sge->flags,
			sge->addr, sge->cnt);
		sge++;
	}
}
What is suspicious is that *all* output from this DPRINTK shows flags as
0x0, so the last sg is never terminated (SGE_TRM is 1<<31)?
But if that is the cause, how is this working at all? Or am I doing
something stupid?
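One way to separate "the terminator is never set" from "the debug print misreports it" is to run the same loop shape in user space and print every field with a conversion of matching width. The DPRINTK above passes the 64-bit sge->addr to %X, which expects an int; whether or not that matters here, matching widths removes one variable. The following is only a sketch: demo_sge, demo_seg and demo_fill_sg are hypothetical stand-ins, not the driver's types, and the values are kept host-endian instead of going through cpu_to_le32/cpu_to_le64.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SGE_TRM (UINT32_C(1) << 31)	/* terminator bit, 1<<31 as quoted above */

/* Hypothetical stand-in for struct sil24_sge (host-endian for simplicity). */
struct demo_sge {
	uint64_t addr;
	uint32_t cnt;
	uint32_t flags;
};

/* Hypothetical stand-in for one DMA-mapped scatterlist segment. */
struct demo_seg {
	uint64_t dma_addr;
	uint32_t dma_len;
};

/* Fill n entries and mark only the last one with SGE_TRM. */
static void demo_fill_sg(const struct demo_seg *seg, size_t n,
			 struct demo_sge *sge)
{
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++) {
		sge[i].addr = seg[i].dma_addr;
		sge[i].cnt = seg[i].dma_len;
		sge[i].flags = (i == n - 1) ? SGE_TRM : 0;
		/* Each conversion matches the width of its argument. */
		printf("flags,addr,cnt = 0x%" PRIx32 ", 0x%" PRIx64
		       ", 0x%" PRIx32 "\n",
		       sge[i].flags, sge[i].addr, sge[i].cnt);
	}
}
```

If a loop like this prints 0x80000000 for the last entry while the kernel version never does, the next thing to look at would be what ata_sg_is_last() returns for each sg, rather than the print itself.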
Timing and outputs from five boots:
good: bad:
more moreboot more
3->35 3->35 3->35 3->35 3->35
3->2a 2->35 2->35 3->2a 3->2a
3->setup 2->2a 2->2a 3->setup 3->setup
2->35 2->35 2->35 2->35 2->35
1->35 3->2a 3->2a 1->35 1->35
2->2a 3->setup 3->setup 2->2a 2->2a
1->2a 1->35 1->35 1->2a 1->2a
2->35 1->2a 1->2a 2->35
1->35 1->35 1->35 1->35
3->int 3->int 3->int 3->int 3->int
3->35 3->35 3->35 3->35 3->35
1->5DF/1439C 1->5DC/1439C 1->5DE/1439C
2->5E0/143BC 2->5DE/143BC 2->5DF/143BC
sg:170E sg:1AAB sg:1A60
XXX:
5DD 5DF 5DC 5DF 5DE
5E0 5E0 5DE 5E0 5DF
The first three columns were working attempts; in the last two, one drive failed.
column 1: ATA_DEBUG added, reboot
column 2: +my additions, reboot
column 3: +my additions, cold boot, wanted to make it fail, but worked
column 4: ATA_DEBUG added, cold boot
column 5: +my additions, cold boot
[x]->[y]: x is the ata-port, 1+2 on the sata_sil24, 3 on sata_nv with swncq
y:35 -> SYNCHRONIZE_CACHE commands that were sent to the drive
y:2a -> WRITE_10 commands that were sent to the drive
y:setup -> Debug from swncq: nv_swncq_dmafis: dma setup tag 0x0
y:int -> Debug from swncq: nv_swncq_host_interrupt: id 0x3 SWNCQ:
qc_active 0x1 ...
The lines before the XXX:
x->a/b: x is the ata-port, a the paddr from sil24_qc_issue, b the
activate from sil24_qc_issue
All outputs from sil24_qc_issue were identical within each boot sequence
and differed only from run to run.
sg:a: a is the sge->addr from sil24_fill_sg
The lines after the XXX:
These are the addresses that the XXX-printk from sil24_port_start prints.
I hope I explained well enough what the table above means.
This whole sequence (two syncs and one write to each drive) happens
between the output:
[ 40.300000] md1: bitmap initialized from disk: read 10/10 pages, set 87 bits
[ 40.320000] created bitmap (145 pages) for device md1
and the error on a bad boot:
[ 70.680000] ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[ 70.700000] ata2.00: cmd 61/08:00:09:d6:42/00:00:25:00:00/40 tag 0
cdb 0x0 data 4096 out
or, on a good boot:
[ 40.910000] md: considering sdb1 ...
(sdb1 is part of another raid)
(If someone wants the complete bootlogs, just ask)
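For reference, the "cmd" string in the exception above can be decoded mechanically. The sketch below assumes the field order libata uses in its error report (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and that, for an NCQ command such as 0x61, the sector count is carried in the feature registers. demo_tf and the demo_* helpers are hypothetical names used only for this illustration.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical container for the twelve hex fields of a "cmd" line. */
struct demo_tf {
	unsigned int command, feature, nsect, lbal, lbam, lbah;
	unsigned int hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah;
	unsigned int device;
};

/* Parse "61/08:00:...": returns 0 on success, -1 if a field is missing. */
static int demo_parse_tf(const char *s, struct demo_tf *tf)
{
	int n = sscanf(s, "%x/%x:%x:%x:%x:%x/%x:%x:%x:%x:%x/%x",
		       &tf->command, &tf->feature, &tf->nsect,
		       &tf->lbal, &tf->lbam, &tf->lbah,
		       &tf->hob_feature, &tf->hob_nsect,
		       &tf->hob_lbal, &tf->hob_lbam, &tf->hob_lbah,
		       &tf->device);

	return n == 12 ? 0 : -1;
}

/* Assemble the 48-bit LBA from the low and high-order-byte registers. */
static uint64_t demo_tf_lba48(const struct demo_tf *tf)
{
	return ((uint64_t)tf->hob_lbah << 40) | ((uint64_t)tf->hob_lbam << 32) |
	       ((uint64_t)tf->hob_lbal << 24) | ((uint64_t)tf->lbah << 16) |
	       ((uint64_t)tf->lbam << 8) | (uint64_t)tf->lbal;
}

/*
 * Sector count of an NCQ command, taken from the feature registers.
 * A raw count of 0 (which means 65536) is not handled in this sketch.
 */
static unsigned int demo_tf_ncq_sectors(const struct demo_tf *tf)
{
	return (tf->hob_feature << 8) | tf->feature;
}
```

Under those assumptions, the failing command on ata2.00 parses as command 0x61 with 8 sectors at LBA 0x2542d609; 8 sectors of 512 bytes agree with the "data 4096 out" in the log.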
So now I have two questions:
1) What happens in sil24_fill_sg with SGE_TRM?
2) If that is ok, should I try to add debug to sil24_error_intr and/or
sil24_host_intr?
Torsten