lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 15 Sep 2009 10:20:40 -0600
From:	Thomas Fjellstrom <tfjellstrom@...w.ca>
To:	linux-kernel@...r.kernel.org
Cc:	"linux-scsi" <linux-scsi@...r.kernel.org>,
	Linux IDE mailing list <linux-ide@...r.kernel.org>,
	Jeff Garzik <jeff@...zik.org>
Subject: Re: 2.6.31-rc9 kernel BUG and mvsas

On Sat September 12 2009, Thomas Fjellstrom wrote:
> On Sat September 12 2009, Jeff Garzik wrote:
> > On 09/10/2009 07:55 PM, Thomas Fjellstrom wrote:
> > > On Wed September 9 2009, Thomas Fjellstrom wrote:
> > >> On Wed September 9 2009, Thomas Fjellstrom wrote:
> > >>> On Wed September 9 2009, Jeff Garzik wrote:
> > >>>> On 09/09/2009 12:30 PM, Thomas Fjellstrom wrote:
> > >>>>> No errors on that disk. Other than the one above, and its more of a
> > >>>>> warning. However, I just rebooted to add some extra drives,
> > >>>>> thinking everything was working a little better now that I've
> > >>>>> updated to 2.6.31-rc9, I'm treated to the following two messages
> > >>>>> right after boot (and a system lockup to boot):
> > >>>>>
> > >>>>> kernel: [  971.033138] ------------[ cut here ]------------
> > >>>>> kernel: [  971.033211] WARNING: at drivers/ata/libata-core.c:4913
> > >>>>> __ata_qc_complete+0x5a/0xe1 [libata]()
> > >>>>> kernel: [  971.033217] Hardware name: GA-MA790FXT-UD5P
> > >>>>> kernel: [  971.033221] Modules linked in: powernow_k8
> > >>>>> cpufreq_conservative cpufreq_stats cpufreq_userspace
> > >>>>> cpufreq_powersave kvm_amd kvm nfsd exportfs nfs lockd fscache
> > >>>>> nfs_acl auth_rpcgss sunrpc bridge stp it87 hwmon_vid adt7473
> > >>>>> firewire_sbp2 loop md_mod
> > >>>>> snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep
> > >>>>> snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_midi snd_rawmidi
> > >>>>> snd_seq_midi_event snd_seq snd_timer snd_seq_device snd
> > >>>>> amd64_edac_mod edac_core i2c_piix4 soundcore snd_page_alloc
> > >>>>> i2c_core evdev wmi parport_pc button parport processor ext3 jbd
> > >>>>> mbcache dm_mod sg sr_mod cdrom sd_mod crc_t10dif usbhid ata_generic
> > >>>>> ide_pci_generic hid mvsas firewire_ohci libsas firewire_core
> > >>>>> crc_itu_t
> > >>>>> scsi_transport_sas r8169 atiixp ide_core floppy ahci mii ohci_hcd
> > >>>>> libata ehci_hcd scsi_mod thermal fan thermal_sys [last unloaded:
> > >>>>> scsi_wait_scan]
> > >>>>> kernel: [  971.033337] Pid: 0, comm: swapper Not tainted 2.6.31-rc9
> > >>>>> #2 kernel: [  971.033342] Call Trace:
> > >>>>> kernel: [  971.033346]<IRQ>    [<ffffffffa00562ca>] ?
> > >>>>> __ata_qc_complete+0x5a/0xe1 [libata]
> > >>>>> kernel: [  971.033434]  [<ffffffffa00562ca>] ?
> > >>>>> __ata_qc_complete+0x5a/0xe1 [libata]
> > >>>>> kernel: [  971.033446]  [<ffffffff8104aca0>] ?
> > >>>>> warn_slowpath_common+0x77/0xa3 kernel: [  971.033455]
> > >>>>> [<ffffffff81038d06>] ? enqueue_task+0x5c/0x65 kernel: [ 
> > >>>>> 971.033496] [<ffffffffa00562ca>] ? __ata_qc_complete+0x5a/0xe1
> > >>>>> [libata] kernel: [  971.033519]  [<ffffffffa00f7b59>] ?
> > >>>>> sas_ata_task_done+0x178/0x210 [libsas]
> > >>>>> kernel: [  971.033528]  [<ffffffff8115ead1>] ?
> > >>>>> blk_run_queue+0x21/0x35 kernel: [  971.033548] 
> > >>>>> [<ffffffffa010e2ce>] ?
> > >>>>> mvs_slot_complete+0x3df/0x41b [mvsas]
> > >>>>> kernel: [  971.033565]  [<ffffffffa010e39c>] ?
> > >>>>> mvs_int_rx+0x92/0x101 [mvsas] kernel: [  971.033583] 
> > >>>>> [<ffffffffa01112ba>] ?
> > >>>>> mvs_int_full+0x25/0x88 [mvsas] kernel: [  971.033600]
> > >>>>> [<ffffffffa011134e>] ? mvs_64xx_isr+0x31/0x40 [mvsas] kernel: [
> > >>>>> 971.033617]  [<ffffffffa010d0e5>] ? mvs_interrupt+0x61/0x78 [mvsas]
> > >>>>> kernel: [  971.033625]  [<ffffffff8108aaac>] ?
> > >>>>> handle_IRQ_event+0x58/0x135 kernel: [  971.033633]
> > >>>>> [<ffffffff8108c1a1>] ? handle_fasteoi_irq+0x7d/0xb5 kernel: [
> > >>>>> 971.033642]
> > >>>>> [<ffffffff8101388d>] ? handle_irq+0x17/0x1d
> > >>>>
> > >>>> That warning is triggered when an ata_queued_cmd is passed to
> > >>>> completion without the ATA_QCFLAG_ACTIVE flag being set (which
> > >>>> indicates the qc was started with some activity).
> > >>>>
> > >>>> That possibly indicates the low-level driver (or libsas) was passing
> > >>>> an already-completed cmd to libata.
> > >>>>
> > >>>>> The added hard drives are connected to a Supermicro AOC-SASLP-MV8,
> > >>>>> which is based on a marvel MV64460/64461/64462 chipset, which uses
> > >>>>> the sata_mv driver.
> > >>>>
> > >>>> Surely you mean 'mvsas' driver?
> > >>>
> > >>> Yes, sorry I did mean mvsas.
> > >>>
> > >>> I am more concerned about the actual oops/BUG rather than the warning
> > >>>   though. Unless the problem causing the warning is also causing the
> > >>> oops.
> > >>>
> > >>>> 	Jeff
> > >>
> > >> Thanks for taking a look so far. But I'm having more and more trouble
> > >> with this card as the days go by:
> > >>
> > >> [  464.792214] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [  464.792222] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [  494.816170] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [  494.816179] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [  494.816192] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [  494.816197] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [  525.817335] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [  525.817343] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [  525.817358] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [  525.817363] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [  556.816148] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [  556.816157] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [  556.816170] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [  556.816175] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [  587.816171] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [  587.816179] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [  587.816193] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [  587.816199] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [  600.616255] INFO: task mount:4395 blocked for more than 120
> > >> seconds. [  600.616263] "echo 0> 
> > >> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > >> [  600.616270] mount         D 0000000000000000     0  4395   4229
> > >>   0x00000000 [  600.616281]  ffff88012fb6f780 0000000000000082
> > >>   ffff8800b808ddc8 ffff8800b80f3d60
> > >> [  600.616290]  ffff880128923c90 0000000000014800 000000000000f800
> > >> ffff88012dd91840
> > >> [  600.616299]  ffff88012dd91b38 0000000300000008 0000000000000000
> > >> ffff88010c5bfa88
> > >> [  600.616308] Call Trace:
> > >> [  600.616324]  [<ffffffff81017015>] ? read_tsc+0xa/0x20
> > >> [  600.616335]  [<ffffffff810adc63>] ? __pagevec_free+0x29/0x3b
> > >> [  600.616343]  [<ffffffff810661e9>] ? getnstimeofday+0x55/0xaf
> > >> [  600.616351]  [<ffffffff810a8e69>] ? sync_page+0x0/0x46
> > >> [  600.616361]  [<ffffffff812dc8cb>] ? io_schedule+0x63/0xa5
> > >> [  600.616368]  [<ffffffff810a8eaa>] ? sync_page+0x41/0x46
> > >> [  600.616376]  [<ffffffff812dcade>] ? __wait_on_bit_lock+0x3f/0x84
> > >> [  600.616383]  [<ffffffff810a8e55>] ? __lock_page+0x5d/0x63
> > >> [  600.616391]  [<ffffffff8105ef14>] ? wake_bit_function+0x0/0x23
> > >> [  600.616401]  [<ffffffff810b0ad4>] ? pagevec_lookup+0x17/0x1e
> > >> [  600.616408]  [<ffffffff810b1bbd>] ?
> > >>   truncate_inode_pages_range+0x288/0x318 [  600.616418]
> > >>   [<ffffffff810fc8cb>] ? set_blocksize+0xc2/0xd2
> > >> [  600.616426]  [<ffffffff810fc8f2>] ? sb_set_blocksize+0x17/0x43
> > >> [  600.616477]  [<ffffffffa04aa8e0>] ? ext4_fill_super+0x1cc/0x2060
> > >> [ext4] [  600.616486]  [<ffffffff8117593b>] ? snprintf+0x44/0x4c
> > >> [  600.616493]  [<ffffffff810fb5b7>] ? check_disk_change+0x22/0x52
> > >> [  600.616501]  [<ffffffff812dd680>] ? __down_write_nested+0x15/0xab
> > >> [  600.616524]  [<ffffffff810dbfe6>] ? get_sb_bdev+0x111/0x159
> > >> [  600.616571]  [<ffffffffa04aa714>] ? ext4_fill_super+0x0/0x2060
> > >> [ext4] [  600.616582]  [<ffffffff810dbc44>] ?
> > >> vfs_kern_mount+0x95/0x111 [  600.616593]  [<ffffffff810dbd13>] ?
> > >> do_kern_mount+0x43/0xe2 [  600.616607]  [<ffffffff810ef49d>] ?
> > >> do_mount+0x767/0x7d6
> > >> [  600.616620]  [<ffffffff810ef591>] ? sys_mount+0x85/0xc8
> > >> [  600.616633]  [<ffffffff81010a02>] ? system_call_fastpath+0x16/0x1b
> > >> [  618.816175] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [  618.816184] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [  618.816196] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [  618.816201] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [  618.816241] sd 1:0:3:0: [sdf] Unhandled error code
> > >> [  618.816247] sd 1:0:3:0: [sdf] Result: hostbyte=DID_OK
> > >> driverbyte=DRIVER_TIMEOUT
> > >> [  618.816255] end_request: I/O error, dev sdf, sector 160
> > >> [  618.816263] Buffer I/O error on device md2, logical block 20
> > >> [  618.816274] Buffer I/O error on device md2, logical block 21
> > >> [  618.816280] Buffer I/O error on device md2, logical block 22
> > >> [  618.816285] Buffer I/O error on device md2, logical block 23
> > >> [  618.816291] Buffer I/O error on device md2, logical block 24
> > >> [  618.816298] Buffer I/O error on device md2, logical block 25
> > >> [  618.816307] Buffer I/O error on device md2, logical block 26
> > >> [  618.816316] Buffer I/O error on device md2, logical block 27
> > >> [  618.816324] Buffer I/O error on device md2, logical block 28
> > >> [  618.816331] Buffer I/O error on device md2, logical block 29
> > >> [  649.780185] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [  649.780194] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [  649.780208] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [  649.780214] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [  649.780293] ------------[ cut here ]------------
> > >> [  649.780366] WARNING: at drivers/ata/libata-core.c:5129
> > >> ata_qc_issue+0x10a/0x347 [libata]()
> > >> [  649.780373] Hardware name: GA-MA790FXT-UD5P
> > >> [  649.780377] Modules linked in: ext4 jbd2 crc16 raid0 powernow_k8
> > >> cpufreq_conservative cpufreq_stats cpufreq_userspace cpufreq_powersave
> > >>   kvm_amd kvm nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss
> > >> sunrpc bridge stp it87 hwmon_vid adt7473 firewire_sbp2 loop md_mod
> > >> snd_hda_codec_realtek
> > >> snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss
> > >> snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event amd64_edac_mod
> > >> snd_seq edac_core snd_timer snd_seq_device snd soundcore i2c_piix4
> > >> snd_page_alloc i2c_core evdev parport_pc wmi parport button processor
> > >> ext3 jbd mbcache dm_mod usbhid hid sg sr_mod cdrom sd_mod crc_t10dif
> > >> ata_generic ide_pci_generic firewire_ohci firewire_core ohci_hcd
> > >> crc_itu_t atiixp ide_core mvsas ehci_hcd ahci libsas libata
> > >> scsi_transport_sas scsi_mod r8169 mii floppy thermal fan thermal_sys
> > >> [last unloaded: scsi_wait_scan] [  649.780499] Pid: 3185, comm:
> > >> hddtemp Not tainted 2.6.31-rc9 #2 [  649.780504] Call Trace:
> > >> [  649.780551]  [<ffffffffa0074f90>] ? ata_qc_issue+0x10a/0x347
> > >> [libata] [  649.780593]  [<ffffffffa0074f90>] ?
> > >> ata_qc_issue+0x10a/0x347 [libata] [  649.780607]  [<ffffffff8104aca0>]
> > >> ? warn_slowpath_common+0x77/0xa3 [  649.780649]  [<ffffffffa00790ce>]
> > >> ? ata_scsi_pass_thru+0x0/0x240 [libata] [  649.780690] 
> > >> [<ffffffffa0074f90>] ?
> > >> ata_qc_issue+0x10a/0x347 [libata] [  649.780733]  [<ffffffffa00401d6>]
> > >> ? scsi_get_command+0x75/0x97 [scsi_mod] [  649.780776]
> > >> [<ffffffffa00790ce>] ?
> > >>   ata_scsi_pass_thru+0x0/0x240 [libata] [  649.780815]
> > >> [<ffffffffa003f7aa>] ? scsi_done+0x0/0xc [scsi_mod] [  649.780858]
> > >> [<ffffffffa007a4d5>] ? __ata_scsi_queuecmd+0x185/0x1dc [libata]
> > >> [  649.780896]  [<ffffffffa003f7aa>] ? scsi_done+0x0/0xc [scsi_mod]
> > >> [  649.780918]  [<ffffffffa00a3c8e>] ? sas_queuecommand+0x83/0x25d
> > >> [libsas] [  649.780956]  [<ffffffffa003fa7c>] ?
> > >> scsi_dispatch_cmd+0x1c0/0x23c [scsi_mod]
> > >> [  649.780996]  [<ffffffffa0044ff0>] ? scsi_request_fn+0x3a5/0x506
> > >>   [scsi_mod] [  649.781006]  [<ffffffff810546e0>] ?
> > >> del_timer+0x59/0x62 [  649.781016]  [<ffffffff81163b70>] ?
> > >> blk_execute_rq_nowait+0x65/0x89 [  649.781032]  [<ffffffffa014164f>] ?
> > >> sg_common_write+0x489/0x4ab [sg] [  649.781042]  [<ffffffff8115df56>]
> > >> ? __freed_request+0x26/0x83 [  649.781056]  [<ffffffffa01421da>] ?
> > >> sg_new_write+0x23e/0x269 [sg] [  649.781070]  [<ffffffffa0142473>] ?
> > >> sg_ioctl+0x26e/0xb63 [sg] [  649.781080]  [<ffffffff81100f38>] ?
> > >> inotify_d_instantiate+0x12/0x39 [  649.781088]  [<ffffffff8105eee6>] ?
> > >> autoremove_wake_function+0x0/0x2e [  649.781098]  [<ffffffff810d80bf>]
> > >> ? fd_install+0x2e/0x5a
> > >> [  649.781105]  [<ffffffff810e5247>] ? vfs_ioctl+0x56/0x6c
> > >> [  649.781111]  [<ffffffff810e570a>] ? do_vfs_ioctl+0x437/0x475
> > >> [  649.781118]  [<ffffffff810e5799>] ? sys_ioctl+0x51/0x70
> > >> [  649.781128]  [<ffffffff81010a02>] ? system_call_fastpath+0x16/0x1b
> > >> [  649.781134] ---[ end trace 9005373b1b9c6eb7 ]---
> > >> [  680.781672] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [  680.781681] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [  711.781672] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [  711.781681] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [  741.816069] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [  741.816078] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [  741.816090] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [  741.816096] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [  772.820160] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [  772.820168] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [  772.820181] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [  772.820186] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [  803.816212] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [  803.816220] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [  803.816233] drivers/scsi/mvsas/mv_sas.c 1669:mvs_abort_task:rc= 5
> > >> [  803.816239] drivers/scsi/mvsas/mv_sas.c 1608:mvs_query_task:rc= 5
> > >> [  803.816264] sd 1:0:3:0: [sdf] Unhandled error code
> > >> [  803.816270] sd 1:0:3:0: [sdf] Result: hostbyte=DID_OK
> > >> driverbyte=DRIVER_TIMEOUT
> > >> [  803.816278] end_request: I/O error, dev sdf, sector 512
> > >> [  803.816284] __ratelimit: 70 callbacks suppressed
> > >> [  803.816290] Buffer I/O error on device md2, logical block 64
> > >>
> > >> That's after bringing up a raid0 array I build a few days ago on 4
> > >>   perfectly good (Seagate Baracuda 1TB 7200.12) disks, without the bad
> > >> disk plugged in. I try to mount it, and the driver hangs. Anything
> > >> trying to access any of the 4 disks hangs as well.
> > >>
> > >> I know this array worked a few days ago. The most major change I've
> > >> made was upgrade from -rc8 (or -rc5, not sure if I mounted the array
> > >> under -rc8) to -rc9.
> > >
> > > Hi, just wondering if anyone has had a chance to look at this, or if
> > > there's some patches I should try out, or if you need me to do some
> > > testing, I'd be glad to, thanks :)
> >
> > I was hoping that some VM would jump in.  The BUG in question is, from
> > mm/slab.c:
> >
> >                  /*
> >                   * The slab was either on partial or free list so
> >                   * there must be at least one object available for
> >                   * allocation.
> >                   */
> >                  BUG_ON(slabp->inuse >= cachep->num);
> >
> > So I wonder if that is a double-free, indicating a bug in
> > SCSI/libsas/mvsas, or a VM problem of some sort.
> >
> > Was free memory low on that machine, at that point, perchance?
> 
> Its remotely possible do to leaks in programs, but even konqueror and
>  firefox don't eat up memory _that_ fast, so its very unlikely. This
>  machine has 4G ram.
> 
> > 	Jeff
> >
> >

Ok, I've managed to figure out one of my problems, the WD drive[s] don't much 
like hddtemp and smartmon running all the time. I disabled them and ran a long 
read test on the drives on a different controller, and they work fine.

Things however are not so rosy, trying to do anything with the Marvell based 
SAS card causes errors.

I've attached the dmesg log from a fresh boot of 2.6.31-git4 (with the edac 
amd patch applied).

I suspect somehow the card isn't initializing things right, as the "Activity" 
lights on my hotswap bays are on for the two Seagate Baracuda drives, not on 
full, but on none the less.

-- 
Thomas Fjellstrom
tfjellstrom@...w.ca

View attachment "mvsas.dmesg" of type "text/plain" (77254 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ