lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <474E82C8.8090207@linux.vnet.ibm.com>
Date:	Thu, 29 Nov 2007 14:43:44 +0530
From:	Kamalesh Babulal <kamalesh@...ux.vnet.ibm.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
CC:	LKML <linux-kernel@...r.kernel.org>, linuxppc-dev@...abs.org,
	Andy Whitcroft <apw@...dowen.org>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	linux-scsi@...r.kernel.org, "Rafael J. Wysocki" <rjw@...k.pl>,
	Matthew Wilcox <willy@...ian.org>
Subject: Re: [BUG] 2.6.24-rc3-git2 softlockup detected

Andrew Morton wrote:
> On Thu, 29 Nov 2007 12:01:08 +0530 Kamalesh Babulal <kamalesh@...ux.vnet.ibm.com> wrote:
> 
>> Andrew Morton wrote:
>>> On Wed, 28 Nov 2007 12:47:19 +0530 Kamalesh Babulal <kamalesh@...ux.vnet.ibm.com> wrote:
>>>
>>>> Andrew Morton wrote:
>>>>> On Wed, 28 Nov 2007 11:59:00 +0530 Kamalesh Babulal <kamalesh@...ux.vnet.ibm.com> wrote:
>>>>>
>>>>>> Hi,
>>>>> (cc linux-scsi, for sym53c8xx)
>>>>>
>>>>>> Soft lockup is detected while bootup with 2.6.24-rc3-git2 on powerbox
>>>>> I assume this is a post-2.6.23 regression?
>>>>>
>>>>>> BUG: soft lockup - CPU#1 stuck for 11s! [insmod:375]
>>>>>> NIP: c00000000002f02c LR: d0000000001414fc CTR: c00000000002f018
>>>>>> REGS: c00000077cbef0b0 TRAP: 0901   Not tainted  (2.6.24-rc3-git2-autotest)
>>>>>> MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 24022088  XER: 00000000
>>>>>> TASK = c00000077cbd8000[375] 'insmod' THREAD: c00000077cbec000 CPU: 1
>>>>>> GPR00: d0000000001414fc c00000077cbef330 c00000000052b930 d000080080002014 
>>>>>> GPR04: d00008008000202c 0000000000000000 c00000077ca1cb00 d00000000014ce54 
>>>>>> GPR08: c00000077ca1c63c 0000000000000000 000000000000002a c00000000002f018 
>>>>>> GPR12: d000000000143610 c000000000473d00 
>>>>>> NIP [c00000000002f02c] .ioread8+0x14/0x60
>>>>>> LR [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
>>>>>> Call Trace:
>>>>>> [c00000077cbef330] [c00000077cbef3c0] 0xc00000077cbef3c0 (unreliable)
>>>>>> [c00000077cbef3a0] [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
>>>>>> [c00000077cbef470] [d0000000001395f8] .sym2_probe+0x700/0x99c [sym53c8xx]
>>>>>> [c00000077cbef710] [c0000000001bc118] .pci_device_probe+0x124/0x1b0
>>>>>> [c00000077cbef7b0] [c000000000221138] .driver_probe_device+0x144/0x20c
>>>>>> [c00000077cbef850] [c000000000221450] .__driver_attach+0xcc/0x154
>>>>>> [c00000077cbef8e0] [c00000000021ff94] .bus_for_each_dev+0x7c/0xd4
>>>>>> [c00000077cbef9a0] [c000000000220e9c] .driver_attach+0x28/0x40
>>>>>> [c00000077cbefa20] [c0000000002204d8] .bus_add_driver+0x90/0x228
>>>>>> [c00000077cbefac0] [c000000000221858] .driver_register+0x94/0xb0
>>>>>> [c00000077cbefb40] [c0000000001bc430] .__pci_register_driver+0x6c/0xcc
>>>>>> [c00000077cbefbe0] [d000000000143428] .sym2_init+0x108/0x15b0 [sym53c8xx]
>>>>>> [c00000077cbefc80] [c00000000008ce80] .sys_init_module+0x17c4/0x1958
>>>>>> [c00000077cbefe30] [c00000000000872c] syscall_exit+0x0/0x40
>>>>>> Instruction dump:
>>>>>> 60000000 786b0420 38210070 7d635b78 e8010010 7c0803a6 4e800020 7c0802a6 
>>>>>> f8010010 f821ff91 7c0004ac 89230000 <0c090000> 4c00012c 79290620 2f8900ff 
>>>>> I see no obvious lockup sites near the end of sym_hcb_attach().  Maybe it's
>>>>> being called lots of times from a higher level..  Do the traces all look
>>>>> the same?
>>>> Hi Andrew,
>>>>
>>>> I see this call trace twice and both looks similar and on another reboot
>>>> the following trace is seen twice in different cpu
>>>>
>>>> BUG: soft lockup detected on CPU#3!
>>>> Call Trace:
>>>> [C00000003FEDEDA0] [C000000000010220] .show_stack+0x68/0x1b0 (unreliable)
>>>> [C00000003FEDEE40] [C0000000000A061C] .softlockup_tick+0xf0/0x13c
>>>> [C00000003FEDEEF0] [C000000000072E2C] .run_local_timers+0x1c/0x30
>>>> [C00000003FEDEF70] [C000000000022FA0] .timer_interrupt+0xa8/0x488
>>>> [C00000003FEDF050] [C0000000000034EC] decrementer_common+0xec/0x100
>>>> --- Exception: 901 at .ioread8+0x14/0x60
>>>>     LR = .sym_hcb_attach+0x1194/0x1384 [sym53c8xx]
>>>> [C00000003FEDF340] [D0000000002B3BC0] 0xd0000000002b3bc0 (unreliable)
>>>> [C00000003FEDF3B0] [D00000000029A3C0] .sym_hcb_attach+0x1194/0x1384 [sym53c8xx]
>>>> [C00000003FEDF480] [D000000000291D30] .sym2_probe+0x75c/0x9f8 [sym53c8xx]
>>>> [C00000003FEDF710] [C0000000001B65A4] .pci_device_probe+0x13c/0x1dc
>>>> [C00000003FEDF7D0] [C000000000219A0C] .driver_probe_device+0xa0/0x15c
>>>> [C00000003FEDF870] [C000000000219C64] .__driver_attach+0xb4/0x138
>>>> [C00000003FEDF900] [C00000000021913C] .bus_for_each_dev+0x7c/0xd4
>>>> [C00000003FEDF9C0] [C0000000002198B0] .driver_attach+0x28/0x40
>>>> [C00000003FEDFA40] [C000000000218BA4] .bus_add_driver+0x98/0x18c
>>>> [C00000003FEDFAE0] [C00000000021A064] .driver_register+0xa8/0xc4
>>>> [C00000003FEDFB60] [C0000000001B68AC] .__pci_register_driver+0x5c/0xa4
>>>> [C00000003FEDFBF0] [D00000000029C204] .sym2_init+0x104/0x1550 [sym53c8xx]
>>>> [C00000003FEDFC90] [C00000000008D1F4] .sys_init_module+0x1764/0x1998
>>>> [C00000003FEDFE30] [C00000000000869C] syscall_exit+0x0/0x40
>>>>
>>> hm, odd.
>>>
>>> Can you look up sym_hcb_attach+0x1194/0x1384 in gdb?  Something like
>>>
>> Hi Andrew,
>>
>> I tried with 2.6.24-rc3-git3 and got the following trace
>>
>> BUG: soft lockup - CPU#2 stuck for 11s! [insmod:375]
>> NIP: c00000000002f02c LR: d0000000001414fc CTR: c00000000002f018
>> REGS: c00000077ca3b0b0 TRAP: 0901   Not tainted  (2.6.24-rc3-git3-autokern1)
>> MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 24022088  XER: 00000000
>> TASK = c00000077cc58000[375] 'insmod' THREAD: c00000077ca38000 CPU: 2
>> GPR00: d0000000001414fc c00000077ca3b330 c00000000052b880 d000080080002014 
>> GPR04: d00008008000202c 0000000000000000 c00000077c82eb00 d00000000014ce54 
>> GPR08: c00000077c82e63c 0000000000000000 000000000000002a c00000000002f018 
>> GPR12: d000000000143610 c000000000473f80 
>> NIP [c00000000002f02c] .ioread8+0x14/0x60
>> LR [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
>>
>> Call Trace:
>> [c00000077ca3b330] [c00000077ca3b3c0] 0xc00000077ca3b3c0 (unreliable)
>> [c00000077ca3b3a0] [d0000000001414fc] .sym_hcb_attach+0x1188/0x1378 [sym53c8xx]
>> [c00000077ca3b470] [d0000000001395f8] .sym2_probe+0x700/0x99c [sym53c8xx]
>> [c00000077ca3b710] [c0000000001bc098] .pci_device_probe+0x124/0x1b0
>> [c00000077ca3b7b0] [c0000000002210c4] .driver_probe_device+0x144/0x20c
>> [c00000077ca3b850] [c0000000002213dc] .__driver_attach+0xcc/0x154
>> [c00000077ca3b8e0] [c00000000021ff20] .bus_for_each_dev+0x7c/0xd4
>> [c00000077ca3b9a0] [c000000000220e28] .driver_attach+0x28/0x40
>> [c00000077ca3ba20] [c000000000220464] .bus_add_driver+0x90/0x228
>> [c00000077ca3bac0] [c0000000002217e4] .driver_register+0x94/0xb0
>> [c00000077ca3bb40] [c0000000001bc3b0] .__pci_register_driver+0x6c/0xcc
>> [c00000077ca3bbe0] [d000000000143428] .sym2_init+0x108/0x15b0 [sym53c8xx]
>> [c00000077ca3bc80] [c00000000008ce80] .sys_init_module+0x17c4/0x1958
>> [c00000077ca3be30] [c00000000000872c] syscall_exit+0x0/0x40
>>
>> Instruction dump:
>> 60000000 786b0420 38210070 7d635b78 e8010010 7c0803a6 4e800020 7c0802a6
>> f8010010 f821ff91 7c0004ac 89230000 <0c090000> 4c00012c 79290620 2f8900ff
>>
>> The gdb list the following for the above trace
>>
>> 0xa4fc is in sym_hcb_attach (drivers/scsi/sym53c8xx_2/sym_hipd.c:1041).
>> 1036            OUTL_DSP(np, pc);
>> 1037            /*
>> 1038             *  Wait 'til done (with timeout)
>> 1039             */
>> 1040            for (i=0; i<SYM_SNOOP_TIMEOUT; i++)
>> 1041                    if (INB(np, nc_istat) & (INTF|SIP|DIP))
>> 1042                            break;
>> 1043            if (i>=SYM_SNOOP_TIMEOUT) {
>> 1044                    printf ("CACHE TEST FAILED: timeout.\n");
>> 1045                    return (0x20);
>>
> 
> doh, I missed that.
> 
> #define SYM_SNOOP_TIMEOUT (10000000)
> 
> ten million is close enough to infinity for me to assume that we broke the
> driver and that's never going to terminate.
> 
> otoh, if that's true you should have got the "CACHE TEST FAILED: timeout"
> message.  Did you?  And does the driver actually work OK after this?

Yes, i got that message

CACHE TEST FAILED: timeout.
sym1: CACHE INCORRECTLY CONFIGURED.
sym1: giving up ...

and the file system stress passes.
 
> If it is indeed expected that a ~10 second stall in there is correct
> behaviour then all we need to do is do make that loop a bit smarter (10,000
> msleep(1)'s, for example).
> 


-- 
Thanks & Regards,
Kamalesh Babulal,
Linux Technology Center,
IBM, ISTL.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ