linux-kernel - Re: Kernel oops in 2.6.18.3 with RAID5

Date:	Wed, 21 Feb 2007 08:45:22 -0700
From:	"Andrew Robinson" <andrew.rw.robinson@...il.com>
To:	linux-kernel@...r.kernel.org
Subject: Re: Kernel oops in 2.6.18.3 with RAID5

Update: I think you can ignore this error. I am getting segmentation
faults when I attempt to rebuild the kernel. This is exactly the same
problem I had with Slackware 10.1 and the 2.6.10 kernel, so I think it
is a hardware issue. Memtest86 didn't show any errors after 35 passes,
so I'll have to check the CPU and motherboard.

Thanks to anyone who spent any time thinking about or researching this.

On 2/20/07, Andrew Robinson <andrew.rw.robinson@...il.com> wrote:
> Here is the full dmesg log of the crash:
>
> iret exception: 0000 [#1]
> SMP
> Modules linked in: ppdev lp button ac battery ipv6 dm_snapshot
> dm_mirror dm_mod loop tsdev rtc psmouse parport_pc parport floppy
> serio_raw pcspkr i2c_nforce2 snd_intel8x0 snd_ac97_codec snd_ac97_bus
> snd_pcm snd_timer snd soundcore snd_page_alloc shpchp
> pci_hotplug i2c_core nvidia_agp agpgart evdev ext3 jbd mbcache raid456
> md_mod xor ide_cd cdrom ide_disk sd_mod generic 8139too
> amd74xx ide_core sata_sil 8139cp mii sata_nv libata scsi_mod forcedeth
> ehci_hcd ohci_hcd usbcore thermal processor fan
> CPU:    0
> EIP:    0060:[<de95b0cb>]    Not tainted VLI
> EFLAGS: 00000216   (2.6.18-3-686 #1)
> EIP is at copy_data+0xff/0x14b [raid456]
> eax: ddcce000   ebx: 00001000   ecx: 0000000f   edx: c1f71000
> esi: ddccefc4   edi: c1f71fc4   ebp: 00000000   esp: dd261e4c
> ds: 007b   es: 007b   ss: 0068
> Process md0_raid5 (pid: 1115, ti=dd260000 task=dd0ed550 task.ti=dd260000)
> Stack: c1f71000 ddb55460 c1e377a0 00000000 ddcce000 00001000 c1f71000 00000000
>        00000000 00000000 00000000 dd20c388 c1e377a0 dd20c354 de95d96d 0c649510
>        00000000 c0116d0a 06323c4f 00000000 dd20c384 00000000 c13c52e0 0000e000
> Call Trace:
>  [<de95d96d>] handle_stripe+0x10da/0x2075 [raid456]
>  [<c0116d0a>] find_busiest_group+0x177/0x46a
>  [<c011669e>] __wake_up+0x2a/0x3d
>  [<de95a72c>] __release_stripe+0x10c/0x110 [raid456]
>  [<de95a751>] release_stripe+0x21/0x2e [raid456]
>  [<de95ea15>] raid5d+0x10d/0x132 [raid456]
>  [<de92d769>] md_thread+0xd7/0xed [md_mod]
>  [<c012d961>] autoremove_wake_function+0x0/0x2d
>  [<de92d692>] md_thread+0x0/0xed [md_mod]
>  [<c012d893>] kthread+0xc2/0xef
>  [<c012d7d1>] kthread+0x0/0xef
>  [<c0101005>] kernel_thread_helper+0x5/0xb
> Code: 8d 04 2f 01 4c 24 18 83 7c 24 0c 00 8b 54 24 18 8d 34 32 89 34
> 24 74 09 89 d9 89 c7 c1 e9 02 eb 0a 8b 3c 24 89 d9 89 c6 c1 e9 02 <f3>
> a5 89 d9 83 e1 03 74 02 f3 a4 8b 44 24 18 ba 03 00 00 00 e8
> EIP: [<de95b0cb>] copy_data+0xff/0x14b [raid456] SS:ESP 0068:dd261e4c
>  <6>note: md0_raid5[1115] exited with preempt_count 1
>
> I was having instability with this machine before (Slackware 10.1 with
> the 2.6.10 kernel) while compiling code (especially the kernel). I just
> rebuilt it as a Debian box. It never died in the RAID array code
> before though, just in gcc.
>
> I have tested the machine's RAM with memtest86 (3 passes) and will
> more thoroughly check it tonight. Besides bad RAM, does anyone have
> any other ideas on what may be causing the issue?
>
>
> On 2/20/07, Andrew Robinson wrote:
> > I can't seem to find sufficient information on what may have caused an
> > oops. I am running a debian machine using kernel 2.6.18.3. Here is
> > detailed information on the system:
> >
> > debian etch
> > CPU: AMD athlon 2100+
> > kernel package: linux-image-2.6.18-3-686
> > raid5 array: 3 active, 1 spare on md0
> > raid fs: ext3
> > raid is physically across 2 on-board NVidia SATA ports and 2 ports
> > from a SATA controller card
> >
> > I am at work, and this was a home computer. This is what I got from
> > syslog when in SSH before it died:
> >
> > bserver kernel: iret exception: 0000 [#1]
> > bserver kernel: SMP
> > bserver kernel: CPU:    0
> > bserver kernel: EIP is at copy_data+0xff/0x14b [raid456]
> > bserver kernel: eax: ddcce000   ebx: 00001000   ecx: 0000000f   edx: c1f71000
> > bserver kernel: esi: ddccefc4   edi: c1f71fc4   ebp: 00000000   esp: dd261e4c
> > bserver kernel: ds: 007b   es: 007b   ss: 0068
> > bserver kernel: Process md0_raid5 (pid: 1115, ti=dd260000
> > task=dd0ed550 task.ti=dd260000)
> > bserver kernel: Stack: c1f71000 ddb55460 c1e377a0 00000000 ddcce000
> > 00001000 c1f71000 00000000
> > bserver kernel:        00000000 00000000 00000000 dd20c388 c1e377a0
> > dd20c354 de95d96d 0c649510
> > bserver kernel:        00000000 c0116d0a 06323c4f 00000000 dd20c384
> > 00000000 c13c52e0 0000e000
> > bserver kernel: Call Trace:
> > bserver kernel: Code: 8d 04 2f 01 4c 24 18 83 7c 24 0c 00 8b 54 24 18
> > 8d 34 32 89 34 24 74 09 89 d9 89 c7 c1 e9 02 eb 0a 8b 3c 24 89 d9 89
> > c6 c1 e9 02 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 8b 44 24 18 ba 03 00 00
> > 00 e8
> > bserver kernel: EIP: [<de95b0cb>] copy_data+0xff/0x14b [raid456]
> > SS:ESP 0068:dd261e4c
> >
> > The only similar thread that I could find was about 2.6.19, and
> > it recommended disabling preemption, but Debian's 2.6.18.3 already
> > has that disabled by default.
> >
> > Any ideas?
> >
> > Thanks,
> > Andrew
> >
>
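
One way to sanity-check the register dump quoted above is to compare it
with the instruction bytes on the "Code:" line. The marked bytes
"<f3> a5" are a "rep movsl", and the bytes just before it decode to
"mov %ebx,%ecx; mov %eax,%esi; shr $2,%ecx". The sketch below is only
an illustration with the values typed in by hand from the oops. It
assumes that edx holds the destination base (it matches the first word
on the stack, which is where edi was loaded from); that reading is an
assumption, not something the log states.

```python
# Register values copied by hand from the quoted oops (hex, as printed).
regs = {
    "eax": 0xddcce000,  # copied into esi before the loop, so the source base
    "ebx": 0x00001000,  # byte count; the code shifts it right by 2 into ecx
    "ecx": 0x0000000f,  # dwords still to copy when the trap occurred
    "edx": 0xc1f71000,  # ASSUMPTION: destination base (equals the top stack word)
    "esi": 0xddccefc4,  # current source pointer
    "edi": 0xc1f71fc4,  # current destination pointer
}


def copy_progress(r):
    """Return (total, copied_from_src, copied_to_dst, remaining) in dwords."""
    total = r["ebx"] >> 2                  # mirrors `shr $2,%ecx`
    src_done = (r["esi"] - r["eax"]) // 4  # dwords already read
    dst_done = (r["edi"] - r["edx"]) // 4  # dwords already written
    return total, src_done, dst_done, r["ecx"]


total, src_done, dst_done, remaining = copy_progress(regs)
print(f"total={total} src_done={src_done} dst_done={dst_done} remaining={remaining}")
print("consistent:", src_done == dst_done and src_done + remaining == total)
```

If the numbers agree, the copy was partway through one 4096-byte page
with both pointers still inside their pages, which suggests the
registers were sane when the exception hit. That fits a hardware
explanation but does not prove one.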