Date:	Sun, 17 Jan 2010 22:11:33 +0100
From:	Torsten Kaiser <just.for.lkml@...glemail.com>
To:	Robert Hancock <hancockrwd@...il.com>
Cc:	Tejun Heo <tj@...nel.org>, linux-kernel@...r.kernel.org,
	Jeff Garzik <jgarzik@...ox.com>, linux-ide@...r.kernel.org
Subject: Re: MSI broken in libata?

On Sun, Jan 17, 2010 at 8:22 PM, Robert Hancock <hancockrwd@...il.com> wrote:
> On 01/16/2010 03:58 PM, Torsten Kaiser wrote:
>> Looking at my lspci output I noted the following:
>> For the PCIe-bridges:
>>        Capabilities: [80] Express (v1) Root Port (Slot+), MSI 00
>>                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s<64ns, L1<1us
>>                        ExtTag- RBE+ FLReset-
>>                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>>                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>>                        MaxPayload 128 bytes, MaxReadReq 512 bytes
>> For the tg3 onboard network chips:
>>        Capabilities: [d0] Express (v1) Endpoint, MSI 00
>>                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s<4us, L1 unlimited
>>                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>>                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>>                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
>> For the SiI chip:
>>        Capabilities: [70] Express (v1) Legacy Endpoint, MSI 00
>>                DevCap: MaxPayload 1024 bytes, PhantFunc 0, Latency L0s<64ns, L1<1us
>>                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
>>                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>>                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>                        MaxPayload 128 bytes, MaxReadReq 4096 bytes
>>
>> So the maximum payload for it is bigger than that of the nVidia bridge.
>> As I don't have knowledge of the PCI specs, I guess DevCap is what a
>> device is physically capable of, and DevCtl is the value that the BIOS /
>> kernel has programmed into it for actual use.
>> If my guess is correct, then the SiI should be correctly limited to
>> 128 bytes payload, and it should work.
>>
>> BUT: Page 47 of the SiI-PDF says for 'Device Status and Control' the
>> following:
>> Bit [14:12]: Max Read Request Size (R/W) – Allowable values are 000B
>> to 011B (128 to 1024 bytes).
>> Default is 010B (512 bytes).
>>
>> So a MaxReadReq value of 4096 as indicated by lspci for my system
>> would be out of bounds.
>>
>> Is this important? (Somehow it seems not: in the Not-MSI case it is also
>> 4096 bytes, but the system works fine...)
>>
>>
>> Can I do anything else to help debug this?
>
> I don't think the MaxReadReq difference would be an issue - it's the max
> request size that the device is allowed to generate, not the max it can
> accept. In any case, I'm not sure how it would affect MSI, since those
> requests would be writes, not reads, and would be tiny. You could always
> try changing it (I think setpci should be able to do it, though you might
> need to dig through the specs to find out which bits those are).

As the Not-MSI case works with the 4096 setting, this does not
look very promising as a fix.
The SiI docs do say that this field is R/W, so setting it via
setpci might work.
But I will defer poking random bytes into PCI configuration space
until later... ;-)
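
If I do try it later, something like this should do it with setpci
(untested; 04:00.0 is just where the SiI sits in my box, setpci values
are hex, and bits [14:12] of the Device Control register at CAP_EXP+8
hold the Max Read Request Size):
# read the 16-bit Device Control register of the Express capability
setpci -s 04:00.0 CAP_EXP+8.w
# set Max Read Request Size to 512 bytes (bits [14:12] = 010B)
setpci -s 04:00.0 CAP_EXP+8.w=2000:7000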

> Unfortunately I don't have any great debug suggestions other than those...
> My first suspect would still be some kind of HT-MSI mapping issue, but
> it's strange that only writes seem to be having problems...

The easiest way to trigger it was booting into a static shell via
init=/bin/sash and running:
dd if=/dev/zero of=/XFS-FS/on/a/disk/connected/to/sii3132 bs=1k count=1M

Reading seemed to work, as even with MSI enabled I was able to mount
the fs and start several programs (like lspci). But when I tried to
write the output to a temporary file, the errors started to appear.

I did a bit more stress testing with iozone on a read-only mounted fs,
and that failed too.
So it is not only writing that fails.
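
For reference, MSI can be turned off globally for a test boot with
pci=nomsi on the kernel command line, or only for devices below one
bridge via its msi_bus attribute (the bridge address below is just an
example, and the sata_sil24 driver has to be rebound afterwards):
echo 0 > /sys/bus/pci/devices/0000:00:0e.0/msi_bus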

test1: iozone -i 1:
[  143.010051] ata2.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
[  143.010060] ata2.00: failed command: READ FPDMA QUEUED
[  143.010071] ata2.00: cmd 60/00:00:ad:71:d9/01:00:01:00:00/40 tag 0 ncq 131072 in
[  143.010073]          res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[  143.010079] ata2.00: status: { DRDY }
[  143.010083] ata2.00: failed command: READ FPDMA QUEUED
[  143.010092] ata2.00: cmd 60/00:08:ad:70:d9/01:00:01:00:00/40 tag 1 ncq 131072 in
[  143.010095]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  143.010099] ata2.00: status: { DRDY }
[  143.010108] ata2: hard resetting link

test2: dd if=file-on-sii-attached-disk of=/dev/null bs=1k
[  112.950063] ata2.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
[  112.950076] ata2.00: failed command: READ FPDMA QUEUED
[  112.950088] ata2.00: cmd 60/00:00:ad:71:d9/01:00:01:00:00/40 tag 0 ncq 131072 in
[  112.950091]          res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[  112.950096] ata2.00: status: { DRDY }
[  112.950101] ata2.00: failed command: READ FPDMA QUEUED
[  112.950110] ata2.00: cmd 60/00:08:ad:70:d9/01:00:01:00:00/40 tag 1 ncq 131072 in
[  112.950113]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  112.950117] ata2.00: status: { DRDY }
[  112.950127] ata2: hard resetting link
dd said that it read 181 MB before failing.

test3: dd if=file-on-sii-attached-disk of=/dev/null bs=1k iflag=direct
[  118.040055] ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[  118.040065] ata2.00: failed command: READ FPDMA QUEUED
[  118.040077] ata2.00: cmd 60/02:00:a5:13:d4/00:00:01:00:00/40 tag 0 ncq 1024 in
[  118.040079]          res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[  118.040085] ata2.00: status: { DRDY }
[  118.040095] ata2: hard resetting link
dd said that it only copied 1.4 MB this time.

test4: setting the queue_depth to 1 and repeating test2
[  248.950055] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[  248.950066] ata2.00: failed command: READ DMA
[  248.950078] ata2.00: cmd c8/00:00:ad:79:d9/00:00:00:00:00/e1 tag 0 dma 131072 in
[  248.950080]          res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[  248.950085] ata2.00: status: { DRDY }
[  248.950095] ata2: hard resetting link
dd failed again at 183 MB.
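
For reference, the queue depth change in test4 can be done at runtime
via sysfs (assuming the SiI-attached disk is sdb):
echo 1 > /sys/block/sdb/device/queue_depth
The READ DMA instead of READ FPDMA QUEUED in its log confirms that
NCQ was really off.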

All these tests were run with an unpatched 2.6.33-rc4.

What was different between -rc4 and my earlier tests was that there
were additional error messages from do_IRQ.
sata_sil24 in MSI mode:
[    2.491103] sata_sil24 0000:04:00.0: version 1.1
[    2.491124] sata_sil24 0000:04:00.0: PCI INT A -> Link[LNEB] -> GSI 19 (level, low) -> IRQ 19
[    2.499751]   alloc irq_desc for 29 on node 0
[    2.499752]   alloc kstat_irqs on node 0
[    2.499765] sata_sil24 0000:04:00.0: irq 29 for MSI/MSI-X
[    2.499772] sata_sil24 0000:04:00.0: Using MSI
[    2.504265] sata_sil24 0000:04:00.0: setting latency timer to 64
[    2.504495] scsi0 : sata_sil24
[    2.507735] scsi1 : sata_sil24
[    2.510918] ata1: SATA max UDMA/100 host m128@...feffc00 port 0xefef8000 irq 29
[    2.518246] ata2: SATA max UDMA/100 host m128@...feffc00 port 0xefefa000 irq 29
[snip]
I mount the test fs with -o ro:
[   83.649367] XFS mounting filesystem sdb2
[   83.757942] Ending clean XFS mount for filesystem: sdb2
I start the iozone test:
[  112.382903] do_IRQ: 0.165 No irq handler for vector (irq -1)
[  112.382923] do_IRQ: 3.165 No irq handler for vector (irq -1)
[  112.382939] do_IRQ: 1.165 No irq handler for vector (irq -1)
[  112.382957] do_IRQ: 2.165 No irq handler for vector (irq -1)
[  143.010051] ata2.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x6 frozen
[  143.010060] ata2.00: failed command: READ FPDMA QUEUED
[  143.010071] ata2.00: cmd 60/00:00:ad:71:d9/01:00:01:00:00/40 tag 0 ncq 131072 in
[  143.010073]          res 40/00:00:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[  143.010079] ata2.00: status: { DRDY }
[  143.010083] ata2.00: failed command: READ FPDMA QUEUED
[  143.010092] ata2.00: cmd 60/00:08:ad:70:d9/01:00:01:00:00/40 tag 1 ncq 131072 in
[  143.010095]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[  143.010099] ata2.00: status: { DRDY }
[  143.010108] ata2: hard resetting link
[  145.210058] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[  145.215060] do_IRQ: 1.165 No irq handler for vector (irq -1)
[  145.215074] do_IRQ: 0.165 No irq handler for vector (irq -1)
[  145.215088] do_IRQ: 2.165 No irq handler for vector (irq -1)
[  145.215097] do_IRQ: 3.165 No irq handler for vector (irq -1)
[  150.210036] ata2.00: qc timeout (cmd 0xec)
[  150.210046] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[  150.210050] ata2.00: revalidation failed (errno=-5)
[  150.210058] ata2: hard resetting link
[  152.410050] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[  162.410031] ata2.00: qc timeout (cmd 0xec)
[  162.410039] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x5)
[  162.410043] ata2.00: revalidation failed (errno=-5)
[  162.410048] ata2: limiting SATA link speed to 1.5 Gbps
[  162.410053] ata2: hard resetting link
[  164.610049] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
[  164.615197] ata2.00: failed to read native max address (err_mask=0x1)
[  164.615203] ata2.00: HPA support seems broken, skipping HPA handling
[  169.610037] ata2.00: qc timeout (cmd 0xef)
[  169.610045] ata2.00: failed to set xfermode (err_mask=0x4)
[  169.610050] ata2.00: disabled
[  169.610071] ata2.00: device reported invalid CHS sector 0
[  169.610091] ata2: hard resetting link
[  171.810055] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 10)
[  171.810080] ata2: EH complete
[  171.810107] sd 1:0:0:0: [sdb] Unhandled error code
[  171.810112] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  171.810118] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 01 d9 70 ad 00 01 00 00
[  171.810132] end_request: I/O error, dev sdb, sector 31027373
[  171.810166] sd 1:0:0:0: [sdb] Unhandled error code
[  171.810170] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  171.810175] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 01 d9 71 ad 00 01 00 00
[  171.810188] end_request: I/O error, dev sdb, sector 31027629
[  171.810204] sd 1:0:0:0: [sdb] Unhandled error code
[  171.810209] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  171.810215] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 01 d9 70 ad 00 00 08 00
[  171.810229] end_request: I/O error, dev sdb, sector 31027373
[  171.810239] sd 1:0:0:0: [sdb] READ CAPACITY(16) failed
[  171.810245] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  171.810252] sd 1:0:0:0: [sdb] Sense not available.
[  171.810305] sd 1:0:0:0: [sdb] READ CAPACITY failed
[  171.810309] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  171.810314] sd 1:0:0:0: [sdb] Sense not available.
[  171.810387] sd 1:0:0:0: [sdb] Asking for cache data failed
[  171.810392] sd 1:0:0:0: [sdb] Assuming drive cache: write through
[  171.810405] sdb: detected capacity change from 640135028736 to 0
[  196.165461] sd 1:0:0:0: [sdb] READ CAPACITY(16) failed
[  196.165468] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  196.165475] sd 1:0:0:0: [sdb] Sense not available.
[  196.165531] sd 1:0:0:0: [sdb] READ CAPACITY failed
[  196.165535] sd 1:0:0:0: [sdb] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[  196.165540] sd 1:0:0:0: [sdb] Sense not available.
[  196.165617] sd 1:0:0:0: [sdb] Asking for cache data failed
[  196.165622] sd 1:0:0:0: [sdb] Assuming drive cache: write through

I will see if I can find out where the do_IRQ error comes from and
why I didn't see it with -rc1 to -rc3.
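
A quick way to locate it is grepping the tree for the message text;
it should turn up in the x86 interrupt entry code:
grep -rn "No irq handler for vector" arch/x86/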

Torsten