lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <200909111004.03889.tfjellstrom@shaw.ca>
Date:	Fri, 11 Sep 2009 10:04:03 -0600
From:	Thomas Fjellstrom <tfjellstrom@...w.ca>
To:	linux-kernel@...r.kernel.org
Cc:	Jeff Garzik <jeff@...zik.org>,
	"linux-scsi" <linux-scsi@...r.kernel.org>,
	Linux IDE mailing list <linux-ide@...r.kernel.org>
Subject: Re: 2.6.31-rc9 kernel BUG and mvsas

On Thu September 10 2009, Thomas Fjellstrom wrote:
> On Wed September 9 2009, Thomas Fjellstrom wrote:
> > On Wed September 9 2009, Thomas Fjellstrom wrote:
> > > On Wed September 9 2009, Jeff Garzik wrote:
> > > > On 09/09/2009 12:30 PM, Thomas Fjellstrom wrote:
> > > > > No errors on that disk. Other than the one above, and its more of a
> > > > > warning. However, I just rebooted to add some extra drives,
> > > > > thinking everything was working a little better now that I've
> > > > > updated to 2.6.31-rc9, I'm treated to the following two messages
> > > > > right after boot (and a system lockup to boot):
> > > > >
> > > > > kernel: [  971.033138] ------------[ cut here ]------------
> > > > > kernel: [  971.033211] WARNING: at drivers/ata/libata-core.c:4913
> > > > > __ata_qc_complete+0x5a/0xe1 [libata]()
> > > > > kernel: [  971.033217] Hardware name: GA-MA790FXT-UD5P
> > > > > kernel: [  971.033221] Modules linked in: powernow_k8
> > > > > cpufreq_conservative cpufreq_stats cpufreq_userspace
> > > > > cpufreq_powersave kvm_amd kvm nfsd exportfs nfs lockd fscache
> > > > > nfs_acl auth_rpcgss sunrpc bridge stp it87 hwmon_vid adt7473
> > > > > firewire_sbp2 loop md_mod
> > > > > snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep
> > > > > snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi
> > > > > snd_seq_midi_event snd_seq snd_timer snd_seq_device snd
> > > > > amd64_edac_mod edac_core i2c_piix4 soundcore snd_page_alloc
> > > > > i2c_core evdev wmi parport_pc button parport processor ext3 jbd
> > > > > mbcache dm_mod sg sr_mod cdrom sd_mod crc_t10dif usbhid ata_generic
> > > > > ide_pci_generic hid mvsas firewire_ohci libsas firewire_core
> > > > > crc_itu_t
> > > > > scsi_transport_sas r8169 atiixp ide_core floppy ahci mii ohci_hcd
> > > > > libata ehci_hcd scsi_mod thermal fan thermal_sys [last unloaded:
> > > > > scsi_wait_scan]
> > > > > kernel: [  971.033337] Pid: 0, comm: swapper Not tainted 2.6.31-rc9
> > > > > #2 kernel: [  971.033342] Call Trace:
> > > > > kernel: [  971.033346]<IRQ>   [<ffffffffa00562ca>] ?
> > > > > __ata_qc_complete+0x5a/0xe1 [libata]
> > > > > kernel: [  971.033434]  [<ffffffffa00562ca>] ?
> > > > > __ata_qc_complete+0x5a/0xe1 [libata]
> > > > > kernel: [  971.033446]  [<ffffffff8104aca0>] ?
> > > > > warn_slowpath_common+0x77/0xa3 kernel: [  971.033455]
> > > > > [<ffffffff81038d06>] ? enqueue_task+0x5c/0x65 kernel: [ 
> > > > > 971.033496] [<ffffffffa00562ca>] ? __ata_qc_complete+0x5a/0xe1
> > > > > [libata] kernel: [  971.033519]  [<ffffffffa00f7b59>] ?
> > > > > sas_ata_task_done+0x178/0x210 [libsas]
> > > > > kernel: [  971.033528]  [<ffffffff8115ead1>] ?
> > > > > blk_run_queue+0x21/0x35 kernel: [  971.033548] 
> > > > > [<ffffffffa010e2ce>] ?
> > > > > mvs_slot_complete+0x3df/0x41b [mvsas]
> > > > > kernel: [  971.033565]  [<ffffffffa010e39c>] ?
> > > > > mvs_int_rx+0x92/0x101 [mvsas] kernel: [  971.033583] 
> > > > > [<ffffffffa01112ba>] ?
> > > > > mvs_int_full+0x25/0x88 [mvsas] kernel: [  971.033600]
> > > > > [<ffffffffa011134e>] ? mvs_64xx_isr+0x31/0x40 [mvsas] kernel: [
> > > > > 971.033617]  [<ffffffffa010d0e5>] ? mvs_interrupt+0x61/0x78 [mvsas]
> > > > > kernel: [  971.033625]  [<ffffffff8108aaac>] ?
> > > > > handle_IRQ_event+0x58/0x135 kernel: [  971.033633]
> > > > > [<ffffffff8108c1a1>] ? handle_fasteoi_irq+0x7d/0xb5 kernel: [
> > > > > 971.033642]
> > > > > [<ffffffff8101388d>] ? handle_irq+0x17/0x1d
> > > >
> > > > That warning is triggered when an ata_queued_cmd is passed to
> > > > completion without the ATA_QCFLAG_ACTIVE flag being set (which
> > > > indicates the qc was started with some activity).
> > > >
> > > > That possibly indicates the low-level driver (or libsas) was passing
> > > > an already-completed cmd to libata.
> > > >
> > > > > The added hard drives are connected to a Supermicro AOC-SASLP-MV8,
> > > > > which is based on a marvel MV64460/64461/64462 chipset, which uses
> > > > > the sata_mv driver.
> > > >
> > > > Surely you mean 'mvsas' driver?
> > >
> > > Yes, sorry I did mean mvsas.
> > >
> > > I am more concerned about the actual oops/BUG rather than the warning
> > >  though. Unless the problem causing the warning is also causing the
> > > oops.
> > >
> > > > 	Jeff
> >
> > Thanks for taking a look so far. But I'm having more and more trouble
> > with this card as the days go by:
> >
> > [  464.792214] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [  464.792222] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [  494.816170] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [  494.816179] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [  494.816192] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [  494.816197] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [  525.817335] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [  525.817343] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [  525.817358] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [  525.817363] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [  556.816148] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [  556.816157] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [  556.816170] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [  556.816175] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [  587.816171] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [  587.816179] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [  587.816193] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [  587.816199] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [  600.616255] INFO: task mount:4395 blocked for more than 120 seconds.
> > [  600.616263] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [  600.616270] mount         D 0000000000000000     0  4395   4229
> >  0x00000000 [  600.616281]  ffff88012fb6f780 0000000000000082
> >  ffff8800b808ddc8 ffff8800b80f3d60
> > [  600.616290]  ffff880128923c90 0000000000014800 000000000000f800
> > ffff88012dd91840
> > [  600.616299]  ffff88012dd91b38 0000000300000008 0000000000000000
> > ffff88010c5bfa88
> > [  600.616308] Call Trace:
> > [  600.616324]  [<ffffffff81017015>] ? read_tsc+0xa/0x20
> > [  600.616335]  [<ffffffff810adc63>] ? __pagevec_free+0x29/0x3b
> > [  600.616343]  [<ffffffff810661e9>] ? getnstimeofday+0x55/0xaf
> > [  600.616351]  [<ffffffff810a8e69>] ? sync_page+0x0/0x46
> > [  600.616361]  [<ffffffff812dc8cb>] ? io_schedule+0x63/0xa5
> > [  600.616368]  [<ffffffff810a8eaa>] ? sync_page+0x41/0x46
> > [  600.616376]  [<ffffffff812dcade>] ? __wait_on_bit_lock+0x3f/0x84
> > [  600.616383]  [<ffffffff810a8e55>] ? __lock_page+0x5d/0x63
> > [  600.616391]  [<ffffffff8105ef14>] ? wake_bit_function+0x0/0x23
> > [  600.616401]  [<ffffffff810b0ad4>] ? pagevec_lookup+0x17/0x1e
> > [  600.616408]  [<ffffffff810b1bbd>] ?
> >  truncate_inode_pages_range+0x288/0x318 [  600.616418]
> >  [<ffffffff810fc8cb>] ? set_blocksize+0xc2/0xd2
> > [  600.616426]  [<ffffffff810fc8f2>] ? sb_set_blocksize+0x17/0x43
> > [  600.616477]  [<ffffffffa04aa8e0>] ? ext4_fill_super+0x1cc/0x2060
> > [ext4] [  600.616486]  [<ffffffff8117593b>] ? snprintf+0x44/0x4c
> > [  600.616493]  [<ffffffff810fb5b7>] ? check_disk_change+0x22/0x52
> > [  600.616501]  [<ffffffff812dd680>] ? __down_write_nested+0x15/0xab
> > [  600.616524]  [<ffffffff810dbfe6>] ? get_sb_bdev+0x111/0x159
> > [  600.616571]  [<ffffffffa04aa714>] ? ext4_fill_super+0x0/0x2060 [ext4]
> > [  600.616582]  [<ffffffff810dbc44>] ? vfs_kern_mount+0x95/0x111
> > [  600.616593]  [<ffffffff810dbd13>] ? do_kern_mount+0x43/0xe2
> > [  600.616607]  [<ffffffff810ef49d>] ? do_mount+0x767/0x7d6
> > [  600.616620]  [<ffffffff810ef591>] ? sys_mount+0x85/0xc8
> > [  600.616633]  [<ffffffff81010a02>] ? system_call_fastpath+0x16/0x1b
> > [  618.816175] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [  618.816184] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [  618.816196] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [  618.816201] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [  618.816241] sd 1:0:3:0: [sdf] Unhandled error code
> > [  618.816247] sd 1:0:3:0: [sdf] Result: hostbyte=DID_OK
> > driverbyte=DRIVER_TIMEOUT
> > [  618.816255] end_request: I/O error, dev sdf, sector 160
> > [  618.816263] Buffer I/O error on device md2, logical block 20
> > [  618.816274] Buffer I/O error on device md2, logical block 21
> > [  618.816280] Buffer I/O error on device md2, logical block 22
> > [  618.816285] Buffer I/O error on device md2, logical block 23
> > [  618.816291] Buffer I/O error on device md2, logical block 24
> > [  618.816298] Buffer I/O error on device md2, logical block 25
> > [  618.816307] Buffer I/O error on device md2, logical block 26
> > [  618.816316] Buffer I/O error on device md2, logical block 27
> > [  618.816324] Buffer I/O error on device md2, logical block 28
> > [  618.816331] Buffer I/O error on device md2, logical block 29
> > [  649.780185] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [  649.780194] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [  649.780208] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [  649.780214] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [  649.780293] ------------[ cut here ]------------
> > [  649.780366] WARNING: at drivers/ata/libata-core.c:5129
> > ata_qc_issue+0x10a/0x347 [libata]()
> > [  649.780373] Hardware name: GA-MA790FXT-UD5P
> > [  649.780377] Modules linked in: ext4 jbd2 crc16 raid0 powernow_k8
> > cpufreq_conservative cpufreq_stats cpufreq_userspace cpufreq_powersave
> >  kvm_amd kvm nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc
> >  bridge stp it87 hwmon_vid adt7473 firewire_sbp2 loop md_mod
> >  snd_hda_codec_realtek
> > snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm
> > snd_seq_midi snd_rawmidi snd_seq_midi_event amd64_edac_mod snd_seq
> >  edac_core snd_timer snd_seq_device snd soundcore i2c_piix4
> > snd_page_alloc i2c_core evdev parport_pc wmi parport button processor
> > ext3 jbd mbcache dm_mod usbhid hid sg sr_mod cdrom sd_mod crc_t10dif
> > ata_generic ide_pci_generic firewire_ohci firewire_core ohci_hcd
> > crc_itu_t atiixp ide_core mvsas ehci_hcd ahci libsas libata
> > scsi_transport_sas scsi_mod r8169 mii floppy thermal fan thermal_sys
> > [last unloaded: scsi_wait_scan] [  649.780499] Pid: 3185, comm: hddtemp
> > Not tainted 2.6.31-rc9 #2 [  649.780504] Call Trace:
> > [  649.780551]  [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347 [libata]
> > [  649.780593]  [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347 [libata]
> > [  649.780607]  [<ffffffff8104aca0>] ? warn_slowpath_common+0x77/0xa3
> > [  649.780649]  [<ffffffffa00790ce>] ? ata_scsi_pass_thru+0x0/0x240
> >  [libata] [  649.780690]  [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347
> >  [libata] [  649.780733]  [<ffffffffa00401d6>] ?
> > scsi_get_command+0x75/0x97 [scsi_mod] [  649.780776] 
> > [<ffffffffa00790ce>] ?
> >  ata_scsi_pass_thru+0x0/0x240 [libata] [  649.780815] 
> > [<ffffffffa003f7aa>] ? scsi_done+0x0/0xc [scsi_mod] [  649.780858] 
> > [<ffffffffa007a4d5>] ? __ata_scsi_queuecmd+0x185/0x1dc [libata]
> > [  649.780896]  [<ffffffffa003f7aa>] ? scsi_done+0x0/0xc [scsi_mod]
> > [  649.780918]  [<ffffffffa00a3c8e>] ? sas_queuecommand+0x83/0x25d
> > [libsas] [  649.780956]  [<ffffffffa003fa7c>] ?
> > scsi_dispatch_cmd+0x1c0/0x23c [scsi_mod]
> > [  649.780996]  [<ffffffffa0044ff0>] ? scsi_request_fn+0x3a5/0x506
> >  [scsi_mod] [  649.781006]  [<ffffffff810546e0>] ? del_timer+0x59/0x62
> > [  649.781016]  [<ffffffff81163b70>] ? blk_execute_rq_nowait+0x65/0x89
> > [  649.781032]  [<ffffffffa014164f>] ? sg_common_write+0x489/0x4ab [sg]
> > [  649.781042]  [<ffffffff8115df56>] ? __freed_request+0x26/0x83
> > [  649.781056]  [<ffffffffa01421da>] ? sg_new_write+0x23e/0x269 [sg]
> > [  649.781070]  [<ffffffffa0142473>] ? sg_ioctl+0x26e/0xb63 [sg]
> > [  649.781080]  [<ffffffff81100f38>] ? inotify_d_instantiate+0x12/0x39
> > [  649.781088]  [<ffffffff8105eee6>] ? autoremove_wake_function+0x0/0x2e
> > [  649.781098]  [<ffffffff810d80bf>] ? fd_install+0x2e/0x5a
> > [  649.781105]  [<ffffffff810e5247>] ? vfs_ioctl+0x56/0x6c
> > [  649.781111]  [<ffffffff810e570a>] ? do_vfs_ioctl+0x437/0x475
> > [  649.781118]  [<ffffffff810e5799>] ? sys_ioctl+0x51/0x70
> > [  649.781128]  [<ffffffff81010a02>] ? system_call_fastpath+0x16/0x1b
> > [  649.781134] ---[ end trace 9005373b1b9c6eb7 ]---
> > [  680.781672] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [  680.781681] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [  711.781672] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [  711.781681] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [  741.816069] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [  741.816078] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [  741.816090] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [  741.816096] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [  772.820160] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [  772.820168] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [  772.820181] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [  772.820186] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [  803.816212] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [  803.816220] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [  803.816233] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > [  803.816239] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > [  803.816264] sd 1:0:3:0: [sdf] Unhandled error code
> > [  803.816270] sd 1:0:3:0: [sdf] Result: hostbyte=DID_OK
> > driverbyte=DRIVER_TIMEOUT
> > [  803.816278] end_request: I/O error, dev sdf, sector 512
> > [  803.816284] __ratelimit: 70 callbacks suppressed
> > [  803.816290] Buffer I/O error on device md2, logical block 64
> >
> > That's after bringing up a raid0 array I build a few days ago on 4
> >  perfectly good (Seagate Baracuda 1TB 7200.12) disks, without the bad
> > disk plugged in. I try to mount it, and the driver hangs. Anything trying
> > to access any of the 4 disks hangs as well.
> >
> > I know this array worked a few days ago. The most major change I've made
> >  was upgrade from -rc8 (or -rc5, not sure if I mounted the array under
> >  -rc8) to -rc9.
> 
> Hi, just wondering if anyone has had a chance to look at this, or if
>  there's some patches I should try out, or if you need me to do some
>  testing, I'd be glad to, thanks :)
> 

If this makes it any more confusing, I do not have the suspected bad drive 
connected, and have been getting the following message from dmesg for the past 
day:

mvsas 0000:04:00.0: mvsas exec failed[-132]!
ata7: no sense translation for status: 0x00
ata7: translated ATA stat/err 0x00/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata7: status=0x00 { }
mvsas 0000:04:00.0: mvsas exec failed[-132]!
ata8: no sense translation for status: 0x00
ata8: translated ATA stat/err 0x00/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata8: status=0x00 { }
mvsas 0000:04:00.0: mvsas exec failed[-132]!
ata9: no sense translation for status: 0x00
ata9: translated ATA stat/err 0x00/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata9: status=0x00 { }
mvsas 0000:04:00.0: mvsas exec failed[-132]!
ata10: no sense translation for status: 0x00
ata10: translated ATA stat/err 0x00/00 to SCSI SK/ASC/ASCQ 0xb/00/00
ata10: status=0x00 { }

Theres 4 total disks connected, though ports 0 and 1 are currently empty, with 
ports 2, 3, 4, 5 populated with mostly identical 1TB Seagate disks (as 
identical as you can get, 1TB 7200.12 same firmware)

Trying to smartctl the disks only returns errors. hddtemp gives "drive is 
sleeping".

-- 
Thomas Fjellstrom
tfjellstrom@...w.ca
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ