linux-kernel - Re: [bisected] Re: todays git: WARNING: at drivers/ata/libata-sff.c:1017 ata_sff_hsm

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4968C5CA.6010508@ru.mvista.com>
Date:	Sat, 10 Jan 2009 18:59:06 +0300
From:	Sergei Shtylyov <sshtylyov@...mvista.com>
To:	Alan Cox <alan@...rguk.ukuu.org.uk>
Cc:	Ingo Molnar <mingo@...e.hu>, Jeff Garzik <jeff@...zik.org>,
	Christian Borntraeger <borntraeger@...ibm.com>,
	linux-ide@...r.kernel.org, lkml <linux-kernel@...r.kernel.org>
Subject: Re: [bisected] Re: todays git: WARNING: at drivers/ata/libata-sff.c:1017
 ata_sff_hsm_move+0x45e/0x750()

Alan Cox wrote:

>>    All the S/G counts printed out were divisible by 4 (36 for INQUIRY and 96 
>>for REQUSET SENSE). It's the *actual* byte count for the REQUEST SENSE that's 
>>no divisible. The SCSI/ATAPI devices are free to sent less data than requested 
>>on non block transfer commands.

> That is just fine - if the sg list is not corrupt or being mishandled and
> the atapi pio code is not buggy.

> RTFS a bit and it becomes obvious that the core libata code has a bug:

    Oh, I have already... and saw where the issue could be. It just wasn't 
obvious why 32-bit PIO triggered it.

> From libata-sff.c:

>         /* consumed can be larger than count only for the last transfer */
>         WARN_ON_ONCE(qc->cursg && count != consumed);
> 
> The big clue turns out to be that the code doesn't match the comment.
> 
> Next note the check on qc->cursg. If my input sg list is a 36 byte single
> sg entry then qc->cursg should be NULL by the WARN_ON() - but it isn't.

    I think it's still not NULL because qc->cursg_ofs == sg->length check was 
*not* true right above, hence sg_next() wasn't called...

> If qc->cursg is NULL when the sg_next() is run then we don't warn because
> we are quite happy with the last segment being padded or underrunning.

    I don't think that sg_next() is called on an underrun segment. And here 
lies the mistake.

> What we actually want to explode on is a case where we transfer more
> bytes than are wanted and where there are more sg entries to perform - at
> that point we would corrupt.

> So at least one failure case is

> 	Core code issues an SG list for 96 bytes
> 	Drive indicates it wishes to return 18 bytes

> 	data_xfer transfers 18 bytes + 2 padding (correctly) -> 20 bytes
  	
> At this point __atapi_pio_bytes breaks

> 	it updates qc->curbytes by 18
> 	it updates the offset by 18

> 	The last segment is not exhausted so it does not update qc->cursg

> 	qc->cursg is not updated and the WARN erroneously uses !=

> The bogus WARN_ON_ONCE() triggers.

    Yes.

> So the bug is the WARN_ON being wrong. In fact __atapi_pio_bytes doesn't
> know enough to do the WARN check correctly as it doesn't know if it is
> the last request being made. It just happens it didn't break before
> because all our transfers are word aligned.

    Er... I'm not sure what's changed with 32-bit PIO patch.

> Alan

MBR, Sergei
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/