linux-kernel - Re: Oops while running fs_racer test on a POWER6 box against latest git

Message-ID: <20100709083506.GB16931@laptop>
Date:	Fri, 9 Jul 2010 18:35:06 +1000
From:	Nick Piggin <npiggin@...e.de>
To:	Jens Axboe <jaxboe@...ionio.com>
Cc:	divya <dipraksh@...ux.vnet.ibm.com>,
	"maciej.rutecki@...il.com" <maciej.rutecki@...il.com>,
	"linuxppc-dev@...abs.org" <linuxppc-dev@...abs.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Ron Minnich <rminnich@...dia.gov>,
	Latchesar Ionkov <lucho@...kov.net>, "hch@....de" <hch@....de>,
	Alexander Viro <viro@...iv.linux.org.uk>,
	Nick Piggin <npiggin@...e.de>
Subject: Re: Oops while running fs_racer test on a POWER6 box against
 latest git

On Fri, Jul 09, 2010 at 09:34:16AM +0200, Jens Axboe wrote:
> On 2010-07-09 08:57, divya wrote:
> > On Friday 02 July 2010 12:16 PM, divya wrote:
> >> On Thursday 01 July 2010 11:55 PM, Maciej Rutecki wrote:
> >>> On Wednesday, 30 June 2010 at 13:22:27 divya wrote:
> >>>> While running fs_racer test from LTP on a POWER6 box against latest
> >>>> git (2.6.35-rc3-git4 - commitid 984bc9601f64fd) came across the
> >>>> following warning followed by multiple oops.
> >>>>
> >>> I created a Bugzilla entry at
> >>> https://bugzilla.kernel.org/show_bug.cgi?id=16324
> >>> for your bug report, please add your address to the CC list in there,
> >>> thanks!
> >>>
> >>>
> >> Here I find a cleaner back trace while running fs_racer test from LTP
> >> on a POWER6 box against the latest git (2.6.35-rc3-git5 - commitid
> >> 980019d74e4b242)
> >>
> >> Badness at kernel/mutex-debug.c:64
> >> BUG: key (null) not in .data!
> >> NIP: c0000000000be9e8 LR: c0000000000be9cc CTR: 0000000000000000
> >> REGS: c00000010bb176f0 TRAP: 0700   Not tainted  (2.6.35-rc3-git5-autotest)
> >> BUG: key 00000000000001d8 not in .data!
> >> BUG: key 00000000000001e0 not in .data!
> >> BUG: key 00000000000001e8 not in .data!
> >> MSR: 8000000000029032
> >> Unable to handle kernel paging request for data at address 0x00000028
> >> Faulting instruction address: 0xc0000000003ad0ec
> >> Oops: Kernel access of bad area, sig: 11 [#1]
> >> SMP NR_CPUS=1024 NUMA pSeries
> >> last sysfs file: /sys/devices/system/cpu/cpu19/cache/index1/shared_cpu_map
> >> Page fault in user mode with in_atomic() = 1 mm = c00000010943e600
> >> Modules linked in:
> >> NIP = fff9e98fc40  MSR = 800000004001d032
> >>  ipv6 fuse loop
> >> Unable to handle kernel paging request for unknown fault
> >>  dm_mod
> >> Faulting instruction address: 0xc00000000008d0f4
> >>  sr_mod ibmveth cdrom sg sd_mod crc_t10dif ibmvscsic 
> >> scsi_transport_srp scsi_tgt scsi_mod
> >> NIP: c0000000003ad0ec LR: c00000000064c3b0 CTR: c0000000003a6eb0
> >> REGS: c000000109b4f610 TRAP: 0300   Not tainted  (2.6.35-rc3-git5-autotest)
> >> MSR: 8000000000009032<EE,ME,IR,DR>   CR: 88004484  XER: 00000001
> >> DAR: 0000000000000028, DSISR: 0000000040010000
> >> TASK = c000000109a98600[7403] 'mkdir' THREAD: c000000109b4c000 CPU: 19
> >> GPR00: 0000000080000013 c000000109b4f890 c000000000d3d798 0000000000000028
> >> GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
> >> GPR08: 0000000000000000 0000000000000028 c000000000189f2c c000000109a98600
> >> GPR12: 0000000024004424 c00000000f602f80 00000000000041ff 0000000000000001
> >> GPR16: 0000000000000002 c00000010d8304c0 c000000109b4fb44 0000000000000000
> >> GPR20: c00000010df77908 fffffffffffff000 0000000000010000 00000000000041ff
> >> GPR24: c00000010df77758 c000000109fa1800 c00000010df77908 c0000000ff236600
> >> GPR28: 0000000000000028 0000000000000040 c000000000ca7b38 c000000000189f2c
> >> NIP [c0000000003ad0ec] .do_raw_spin_trylock+0x10/0x48
> >> LR [c00000000064c3b0] ._raw_spin_lock+0x50/0xa4
> >> Call Trace:
> >> [c000000109b4f890] [c00000000064c3a4] ._raw_spin_lock+0x44/0xa4 (unreliable)
> >> [c000000109b4f920] [c000000000189f2c] .new_inode+0x4c/0xe4
> >> [c000000109b4f9b0] [c0000000002257fc] .ext3_new_inode+0x84/0xb70
> >> [c000000109b4fad0] [c00000000022f1ec] .ext3_mkdir+0x130/0x438
> >> [c000000109b4fbe0] [c00000000017adb4] .vfs_mkdir+0xb8/0x160
> >> [c000000109b4fc80] [c00000000017e52c] .SyS_mkdirat+0xb0/0x114
> >> [c000000109b4fdc0] [c00000000017a730] .SyS_mkdir+0x1c/0x30
> >> [c000000109b4fe30] [c0000000000085b4] syscall_exit+0x0/0x40
> >> Instruction dump:
> >> eb41ffd0 7c0803a6 eb61ffd8 eb81ffe0 eba1ffe8 ebc1fff0 ebe1fff8 4e800020
> >> 38000000 7c691b78 980d0214 800d0008<7d601829>  2c0b0000 40c20010 7c00192d
> >> Oops: Weird page fault, sig: 11 [#2]
> >>
> >> Pls let me know if this back trace would help in analyzing further.
> >> Meanwhile I shall do a git bisect and send the inputs.

The call stack for Badness at kernel/mutex-debug.c:64 (or whatever
explodes first) would be handy.  This one seems jumbled still. What
spinlock is in the trace? inode_lock?  That would indicate some random
corruption or breakage in the lock debugging.
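
To make the question concrete, here is a minimal sketch (not an existing
kernel tool) that pulls the data fault address and the NIP/LR symbols out
of a quoted oops. The excerpt is copied from the trace above; the helper
names strip_quotes and parse_oops are made up for illustration.

```python
import re

# Excerpt copied from the quoted oops above (mail quote markers left in place).
OOPS = """\
> >> DAR: 0000000000000028, DSISR: 0000000040010000
> >> NIP [c0000000003ad0ec] .do_raw_spin_trylock+0x10/0x48
> >> LR [c00000000064c3b0] ._raw_spin_lock+0x50/0xa4
"""

def strip_quotes(line):
    """Drop leading mail quote markers such as '> >> ' (hypothetical helper)."""
    return re.sub(r"^[>\s]+", "", line)

def parse_oops(text):
    """Return the data fault address and the symbols at NIP and LR."""
    info = {}
    for raw in text.splitlines():
        line = strip_quotes(raw)
        # DAR holds the faulting data address on powerpc.
        m = re.match(r"DAR: ([0-9a-f]+)", line)
        if m:
            info["dar"] = int(m.group(1), 16)
        # "NIP [addr] .symbol+off/len" and "LR [addr] .symbol+off/len"
        m = re.match(r"(NIP|LR) \[[0-9a-f]+\] \.?(\w+)\+", line)
        if m:
            info[m.group(1).lower()] = m.group(2)
    return info

info = parse_oops(OOPS)
print(hex(info["dar"]), info["nip"], info["lr"])
```

On this excerpt the fault address is 0x28 inside do_raw_spin_trylock,
which matches the value in GPR03 and does not look like the address of a
statically allocated lock.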

> >>
> >> Thanks
> >> Divya
> >>
> >>
> >>
> > Hi All,
> > 
> >  From the git bisect, it seems like the commit
> >  57439f878afafefad8836ebf5c49da2a0a746105 is the culprit for the above
> >  issue.

Call me blind but I can't see the problem. Are you sure this commit
breaks it?
