Message-ID: <46C1717E.3030908@linpro.no>
Date: Tue, 14 Aug 2007 11:10:22 +0200
From: Tore Anderson <tore@...pro.no>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-scsi@...r.kernel.org, James.Smart@...lex.Com
Subject: Re: Linux 2.6.23-rc3
* Linus Torvalds
> Regardless of why, -rc3 is out, and doesn't have the tons of changes that
> -rc2 did. But there's some scheduler updates, sparc64 and powerpc changes,
> and random driver updates (the lpfc SCSI driver kind of stands out in the
> diffstat).
>
> Shortlog appended, I don't know what I can add to it.. Please do give it a
> good testing, unless you're on a beach sunning yourself (and who are we
> kidding: you're pasty white, and sand is hard to get out of the keyboard -
> beaches are overrated).
I gave it a spin and ran into quite a few problems that appear to be
related to the lpfc driver. I don't know whether these problems were
caused by the recent update, as the latest kernel I ran before was 2.6.20
(where I never saw problems like these). First, something failed when
the volumes on the SAN (on a Sun StorageTek 6140 / Engenio 3994) were
registered, causing two errors such as:
[ 119.416641] lpfc 0000:08:01.0: 0:1303 Link Up Event x1 received Data: x1 xf7 x10 x0
[ 119.449879] scsi 3:0:0:0: Direct-Access SUN CSM200_R 0619 PQ: 0 ANSI: 5
[ 119.449888] kobject_add failed for 3:0:0:0 with -EEXIST, don't try to register things with the same name in the same directory.
[ 119.449979]
[ 119.449980] Call Trace:
[ 119.449990] [<ffffffff8032c232>] kobject_shadow_add+0x192/0x200
[ 119.449994] [<ffffffff8039c13c>] device_add+0xcc/0x620
[ 119.450011] [<ffffffff88027325>] :scsi_mod:scsi_sysfs_add_sdev+0x55/0x290
[ 119.450021] [<ffffffff88024dbf>] :scsi_mod:scsi_probe_and_add_lun+0x35f/0xc80
[ 119.450031] [<ffffffff8802489e>] :scsi_mod:scsi_alloc_target+0x22e/0x340
[ 119.450041] [<ffffffff880257fb>] :scsi_mod:__scsi_scan_target+0x11b/0x6d0
[ 119.450045] [<ffffffff8022f42a>] find_busiest_queue+0x7a/0xd0
[ 119.450055] [<ffffffff88026302>] :scsi_mod:scsi_scan_target+0xe2/0x120
[ 119.450064] [<ffffffff8810a723>] :scsi_transport_fc:fc_scsi_scan_rport+0x73/0xa0
[ 119.450068] [<ffffffff8810a6b0>] :scsi_transport_fc:fc_scsi_scan_rport+0x0/0xa0
[ 119.450071] [<ffffffff8024b29b>] run_workqueue+0x6b/0x120
[ 119.450073] [<ffffffff8024b580>] worker_thread+0x0/0x130
[ 119.450075] [<ffffffff8024b645>] worker_thread+0xc5/0x130
[ 119.450078] [<ffffffff8024f540>] autoremove_wake_function+0x0/0x30
[ 119.450080] [<ffffffff8024b580>] worker_thread+0x0/0x130
[ 119.450082] [<ffffffff8024b580>] worker_thread+0x0/0x130
[ 119.450084] [<ffffffff8024f2e6>] kthread+0x86/0x90
[ 119.450086] [<ffffffff8020cdb8>] child_rip+0xa/0x12
[ 119.450089] [<ffffffff8024f260>] kthread+0x0/0x90
[ 119.450091] [<ffffffff8020cdae>] child_rip+0x0/0x12
[ 119.450092]
[ 119.450093] error 1
This apparently resulted in dm-multipath missing four paths it's
supposed to see (two paths to the standby controller x two volumes), so
I unmounted, flushed multipath maps, and removed the lpfc module. Then
I modprobed it again, which immediately resulted in the modprobe
process getting killed and the following showing up in the dmesg:
[ 194.724931] Emulex LightPulse Fibre Channel SCSI driver 8.2.2
[ 194.724941] Copyright(c) 2004-2007 Emulex. All rights reserved.
[ 194.725019] ACPI: PCI Interrupt 0000:08:01.0[A] -> GSI 25 (level, low) -> IRQ 25
[ 194.725921] scsi7 : on PCI bus 08 device 08 irq 25
[ 194.726038] Unable to handle kernel paging request at 0000000000030bb9 RIP:
[ 194.726092] [<0000000000030bb9>]
[ 194.726269] PGD 128557067 PUD 127753067 PMD 0
[ 194.726455] Oops: 0000 [1] SMP
[ 194.726596] CPU 0
[ 194.726693] Modules linked in: lpfc ocfs2 ipv6 ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs bnx2 iTCO_wdt iTCO_vendor_support ata_generic shpchp pcspkr button pci_hotplug sr_mod cdrom sg evdev dm_round_robin dm_rdac dm_multipath ext3 jbd mbcache dm_mod raid1 md_mod ide_generic usb_storage ide_core libusual usbhid hid scsi_transport_fc ehci_hcd uhci_hcd usbcore ata_piix libata sd_mod mptsas mptscsih mptbase scsi_transport_sas scsi_mod thermal processor fan
[ 194.728904] Pid: 9330, comm: modprobe Not tainted 2.6.23-rc3 #1
[ 194.728969] RIP: 0010:[<0000000000030bb9>] [<0000000000030bb9>]
[ 194.729074] RSP: 0000:ffff810122351bc0 EFLAGS: 00010246
[ 194.729138] RAX: ffff81011ef16060 RBX: ffff81012314e060 RCX: ffffffff88155500
[ 194.729205] RDX: 0000000000000000 RSI: ffff810128155178 RDI: ffff81012314e060
[ 194.729272] RBP: 0000000000000030 R08: 0000000000000000 R09: 0000000000000000
[ 194.729339] R10: 0000000000000000 R11: ffff81012b584308 R12: ffff810128155000
[ 194.729406] R13: ffff810128155178 R14: ffffffff803a1ff0 R15: ffff8101281552d0
[ 194.729473] FS: 00002ab774d226d0(0000) GS:ffffffff80587000(0000) knlGS:0000000000000000
[ 194.729556] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 194.729621] CR2: 0000000000030bb9 CR3: 0000000123b9a000 CR4: 00000000000006e0
[ 194.729688] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 194.729755] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 194.729822] Process modprobe (pid: 9330, threadinfo ffff810122350000, task ffff8101228ca740)
[ 194.729905] Stack: ffffffff803a1d98 0000000000000000 ffff810128155178 0000000000000030
[ 194.730189] ffff810128155000 ffff8101281553d0 ffff8101281553d0 000000000000000f
[ 194.730431] ffffffff880278cb ffff810128155268 ffff810128155129 ffff810128155000
[ 194.730620] Call Trace:
[ 194.730738] [<ffffffff803a1d98>] attribute_container_add_device+0x68/0x160
[ 194.730820] [<ffffffff880278cb>] :scsi_mod:scsi_sysfs_add_host+0x11b/0x150
[ 194.730897] [<ffffffff8801d81f>] :scsi_mod:scsi_add_host+0xff/0x210
[ 194.730972] [<ffffffff88130d8a>] :lpfc:lpfc_create_port+0x15a/0x200
[ 194.731045] [<ffffffff88134200>] :lpfc:lpfc_pci_probe_one+0x740/0x9b0
[ 194.731116] [<ffffffff8033e7bd>] pci_device_probe+0xfd/0x180
[ 194.731182] [<ffffffff8039e650>] __driver_attach+0x0/0xb0
[ 194.731247] [<ffffffff8039e46c>] driver_probe_device+0x9c/0x1a0
[ 194.731313] [<ffffffff8039e650>] __driver_attach+0x0/0xb0
[ 194.731378] [<ffffffff8039e6f5>] __driver_attach+0xa5/0xb0
[ 194.731443] [<ffffffff8039d37d>] bus_for_each_dev+0x4d/0x80
[ 194.731509] [<ffffffff8039da61>] bus_add_driver+0xa1/0x1f0
[ 194.731575] [<ffffffff8033e226>] __pci_register_driver+0x66/0xb0
[ 194.731646] [<ffffffff880f5075>] :lpfc:lpfc_init+0x75/0x97
[ 194.731712] [<ffffffff8025b6bf>] sys_init_module+0x18f/0x1ab0
[ 194.731784] [<ffffffff8020bf9e>] system_call+0x7e/0x83
[ 194.731850]
[ 194.733465]
[ 194.733465] Code: Bad RIP value.
[ 194.733661] RIP [<0000000000030bb9>]
[ 194.733762] RSP <ffff810122351bc0>
[ 194.733823] CR2: 0000000000030bb9
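For anyone triaging a trace like the one above, here is a minimal sketch of pulling the module and function name out of each call-trace frame, so that the lpfc and scsi_mod frames can be separated from core kernel ones. It assumes the frame layout shown in this log (`[<address>] :module:function+0xoff/0xlen`). The names `FRAME_RE` and `parse_frames` and the "kernel" label for built-in symbols are illustrative choices, not part of any kernel tool.

```python
import re

# Frame layout assumed from the log above:
#   [<ffffffff88130d8a>] :lpfc:lpfc_create_port+0x15a/0x200   (module symbol)
#   [<ffffffff8033e7bd>] pci_device_probe+0xfd/0x180          (built-in symbol)
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?::(\w+):)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

def parse_frames(log_text):
    """Return (module, function) pairs for every call-trace frame found.

    Frames without a :module: prefix are labelled "kernel" (an assumption
    made here for readability; the log itself carries no such label).
    """
    return [(m.group(1) or "kernel", m.group(2))
            for m in FRAME_RE.finditer(log_text)]

sample = """\
[ 194.726092] [<0000000000030bb9>]
[ 194.730897] [<ffffffff8801d81f>] :scsi_mod:scsi_add_host+0xff/0x210
[ 194.731045] [<ffffffff88130d8a>] :lpfc:lpfc_create_port+0x15a/0x200
[ 194.731116] [<ffffffff8033e7bd>] pci_device_probe+0xfd/0x180
"""

for module, function in parse_frames(sample):
    print(module, function)
```

The bare `[<0000000000030bb9>]` line carries no symbol, so it is skipped rather than reported as a frame.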
At this point I saw no paths and the module was supposedly in use, so
I rebooted to recover - this time all paths came up correctly. So I
don't know if I can reproduce this reliably, but I'll be happy to try
to help out further if necessary.
I've attached the complete dmesg output gathered after the last crash.
An interesting thing to note is the complete lack of I/O errors.
Usually the kernel log is flooded with these, because the block devices
representing paths to the passive controller fail most I/O commands,
including things like the initial partition table read. So even though
there were only two of those -EEXIST messages, I think a total of four
paths were never seen by the block layer.
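To check how many devices hit the duplicate-name failure in a saved dmesg, something like the following sketch could be used. It matches only the `kobject_add failed for ... with -EEXIST` wording seen in the log above; the names `EEXIST_RE` and `find_eexist_devices` are made up for illustration.

```python
import re

# Matches the failure line as it appears in the log above and captures the
# device name (for example 3:0:0:0) that could not be registered.
EEXIST_RE = re.compile(r"kobject_add failed for (\S+) with -EEXIST")

def find_eexist_devices(dmesg_text):
    """Return the device name from every kobject_add -EEXIST line."""
    return EEXIST_RE.findall(dmesg_text)

sample = """\
[ 119.449879] scsi 3:0:0:0: Direct-Access SUN CSM200_R 0619 PQ: 0 ANSI: 5
[ 119.449888] kobject_add failed for 3:0:0:0 with -EEXIST, don't try to register things with the same name in the same directory.
"""

devices = find_eexist_devices(sample)
print(len(devices), devices)
```

Run against the full attached dmesg (read with `open(path).read()`), this would list each affected SCSI address, which can then be compared with the paths dm-multipath reports as missing.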
Oh, and this is Ubuntu 6.06 (x86_64) running on an IBM xSeries HS21
blade.
Regards
--
Tore Anderson
View attachment "2.6.23-rc3.dmesg" of type "text/plain" (127190 bytes)