lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 09 Nov 2006 23:00:08 +0900
From:	Tejun Heo <htejun@...il.com>
To:	Brice Goglin <Brice.Goglin@...-lyon.org>
CC:	Jens Axboe <jens.axboe@...cle.com>,
	Gregor Jasny <gjasny@...glemail.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	Jeff Garzik <jgarzik@...ox.com>, linux-ide@...r.kernel.org,
	Douglas Gilbert <dougg@...que.net>, monty@...h.org
Subject: Re: 2.6.19-rc3 system freezes when ripping with cdparanoia at ioctl(SG_IO)

[CC'ing Monty and Douglas.]

Hello, the original thread can be read from the following URL.

http://thread.gmane.org/gmane.linux.ide/13708/focus=13708

Brice Goglin wrote:
> ens Axboe wrote:
>> On Mon, Oct 30 2006, Gregor Jasny wrote:
>>   
>>> 2006/10/30, Jens Axboe <jens.axboe@...cle.com>:
>>>     
>>>> Can you confirm that 2.6.18 works?
>>>>       
>>> The reporter of [1] states that his SATA Thinkpad freezes with 2.6.17
>>> and 2.6.18, too.
>>>
>>> Gregor
>>>
>>> [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=391901
>>>     
>> Ok, mainly just checking if this was a potential dupe of another bug.
>>
>>   
> 
> Jens (or anybody else who has any idea of how to debug this),
> 
> Did you have a chance to reproduce the problem? I guess we "only" need a
> machine with SATA/ata_piix and cdparanoia 3.10. If you want me to debug
> some stuff, feel free to tell me what. But, since it freezes the machine
> and sysrq doesn't even work, I don't really know what to try...
> 
> I just tried on rc5 and rc5-mm1, both have the problem (as 2.6.16, .17
> and .18 do, don't know about earlier kernels). I didn't have a audio CD
> here, so I tried abcde on a DVD on purpose. With cdparanoia 3.10-pre0
> (from Debian testing), it reports nothing during about 5 seconds and
> then the machine freezes. With cdparanoia 3a9.8-11 (from Debian stable),
> it reports an error very quickly, and dmesg gets a couple line like these:
>     sg_write: data in/out 12/12 bytes for SCSI command 0x43--guessing
> data in;
>        program cdparanoia not setting count and/or reply_len properly

Okay, here's the story.

In interface/scan_devices.c::cdda_identify_scsi(), cdparanoia calls 
scsi_inquiry() to identify the device and determine interface type. 
This seems to be the first time to actually issue commands to the 
device.  As interface type isn't completely determined, for sg devices, 
it first issues the command w/ d->interface set to SGIO_SCSI.  If that 
fails, it falls back to SGIO_SCSI_BUGGY1.

For to-device request, both SGIO_SCSI and SGIO_SCSI_BUGGY1 set 
sg_io_hdr.dxfer_direction to SG_DXFER_TO_DEV.  But for from-device 
request, SGIO_SCSI uses SG_DXFER_TO_FROM_DEV while SGIO_SCSI_BUGGY1 uses 
SG_DXFER_FROM_DEV.  So, cdparanoia first issues inquiry w/ 
SG_DXFER_TO_FROM_DEV and if that fails falls back to SG_DXFER_FROM_DEV.

drivers/scsi/sg.c interprets SG_DXFER_TO_FROM_DEV as read while 
block/scsi_ioctl.c interprets it as write.  I guess this is historic 
thing (scsi/sg.c updated but block/scsi_ioctl.c is forgotten).  As 
written above, cdparanoia can handle both cases as long as the kernel 
promptly fails command issued with the wrong direction.

This works for most PATA ATAPI devices.  Most devices detect reversed 
transfer and terminate the command promptly.  But this doesn't seem to 
be true for SATA device.  Many just hang and time out commands with the 
wrong transfer direction.  If you consider that most early SATA ATAPI 
devices are actually PATA + bridge, this is sorta inevitable.  The 
PATA-SATA bridge cannot issue D2H FIS to abort the command by itself. 
It's just mirroring the status of PATA side and PATA side doesn't know 
SATA protocol mismatch has occurred.

So, IDENTIFY w/ write-DMA protocol times out after quite some seconds. 
This is where things go worse from bad.  SATA controllers which have 
shadow TF registers don't handle timeout conditions very well, 
especially when they're waiting for data transfer.  They basically hold 
the PCI bus and hang till the transfer completes (which never happens). 
  That's where the hard lock up comes from.

Jens, I think we need to match block sg's behavior to SCSI's.  Monty, 
the timeout and hard lock up are due to hardware restrictions.  Kernel 
and libata can't do much about it.  So, please find other way to detect 
interface.

Thanks.

-- 
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ