[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4968FFD7.6070302@ru.mvista.com>
Date: Sat, 10 Jan 2009 23:06:47 +0300
From: Sergei Shtylyov <sshtylyov@...mvista.com>
To: Alan Cox <alan@...rguk.ukuu.org.uk>
Cc: Ingo Molnar <mingo@...e.hu>, Jeff Garzik <jeff@...zik.org>,
Christian Borntraeger <borntraeger@...ibm.com>,
linux-ide@...r.kernel.org, lkml <linux-kernel@...r.kernel.org>
Subject: Re: [bisected] Re: todays git: WARNING: at drivers/ata/libata-sff.c:1017
ata_sff_hsm_move+0x45e/0x750()
Hello, I wrote:
>>> All the S/G counts printed out were divisible by 4 (36 for
>>> INQUIRY and 96 for REQUSET SENSE). It's the *actual* byte count for
>>> the REQUEST SENSE that's no divisible. The SCSI/ATAPI devices are
>>> free to sent less data than requested on non block transfer commands.
>
>> That is just fine - if the sg list is not corrupt or being mishandled
>> and
>> the atapi pio code is not buggy.
>
>> RTFS a bit and it becomes obvious that the core libata code has a bug:
>
> Oh, I have already... and saw where the issue could be. It just
> wasn't obvious why 32-bit PIO triggered it.
Got it now, however the issue doesn't seem as evident simple to me...
>> From libata-sff.c:
>> /* consumed can be larger than count only for the last
>> transfer */
>> WARN_ON_ONCE(qc->cursg && count != consumed);
>>
>> The big clue turns out to be that the code doesn't match the comment.
>>
>> Next note the check on qc->cursg. If my input sg list is a 36 byte
>> single
>> sg entry then qc->cursg should be NULL by the WARN_ON() - but it isn't.
>
> I think it's still not NULL because qc->cursg_ofs == sg->length
> check was *not* true right above, hence sg_next() wasn't called...
>
>> If qc->cursg is NULL when the sg_next() is run then we don't warn
>> because
>> we are quite happy with the last segment being padded or underrunning.
>
> I don't think that sg_next() is called on an underrun segment. And
> here lies the mistake.
>
>> What we actually want to explode on is a case where we transfer more
>> bytes than are wanted and where there are more sg entries to perform
>> - at
>> that point we would corrupt.
>
>> So at least one failure case is
>
>> Core code issues an SG list for 96 bytes
>> Drive indicates it wishes to return 18 bytes
>
>> data_xfer transfers 18 bytes + 2 padding (correctly) -> 20 bytes
Correctly indeed? I'm not at all sure it's correct to read an extra
16-bit word off the device when it thinks it's already done with the
data transfer. This is not the same as to read 16-bit word and ignore
its MSB as it happened. The same concern about the writes... Note that
the IDE code doesn't do this...
MBR, Sergei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists