linux-kernel - Re: Kernel oops in 2.6.18.3 with RAID5

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <bc36a6210702201714nb18925ua8b89cd2c2ecad1@mail.gmail.com>
Date:	Tue, 20 Feb 2007 18:14:42 -0700
From:	"Andrew Robinson" <andrew.rw.robinson@...il.com>
To:	linux-kernel@...r.kernel.org
Subject: Re: Kernel oops in 2.6.18.3 with RAID5

Here is the full dmesg log of the crash:

iret exception: 0000 [#1]
SMP
Modules linked in: ppdev lp button ac battery ipv6 dm_snapshot
dm_mirror dm_mod loop tsdev rtc psmouse parport_pc parport floppy
serio_raw pcspkr i2c_nforce2 snd_intel8x0 snd_ac97_codec snd_
ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc shpchp
pci_hotplug i2c_core nvidia_agp agpgart evdev ext3 jbd mbcache raid456
md_mod xor ide_cd cdrom ide_disk sd_mod generic 8139too
amd74xx ide_core sata_sil 8139cp mii sata_nv libata scsi_mod forcedeth
ehci_hcd ohci_hcd usbcore thermal processor fan
CPU:    0
EIP:    0060:[<de95b0cb>]    Not tainted VLI
EFLAGS: 00000216   (2.6.18-3-686 #1)
EIP is at copy_data+0xff/0x14b [raid456]
eax: ddcce000   ebx: 00001000   ecx: 0000000f   edx: c1f71000
esi: ddccefc4   edi: c1f71fc4   ebp: 00000000   esp: dd261e4c
ds: 007b   es: 007b   ss: 0068
Process md0_raid5 (pid: 1115, ti=dd260000 task=dd0ed550 task.ti=dd260000)
Stack: c1f71000 ddb55460 c1e377a0 00000000 ddcce000 00001000 c1f71000 00000000
       00000000 00000000 00000000 dd20c388 c1e377a0 dd20c354 de95d96d 0c649510
       00000000 c0116d0a 06323c4f 00000000 dd20c384 00000000 c13c52e0 0000e000
Call Trace:
 [<de95d96d>] handle_stripe+0x10da/0x2075 [raid456]
 [<c0116d0a>] find_busiest_group+0x177/0x46a
 [<c011669e>] __wake_up+0x2a/0x3d
 [<de95a72c>] __release_stripe+0x10c/0x110 [raid456]
 [<de95a751>] release_stripe+0x21/0x2e [raid456]
 [<de95ea15>] raid5d+0x10d/0x132 [raid456]
 [<de92d769>] md_thread+0xd7/0xed [md_mod]
 [<c012d961>] autoremove_wake_function+0x0/0x2d
 [<de92d692>] md_thread+0x0/0xed [md_mod]
 [<c012d893>] kthread+0xc2/0xef
 [<c012d7d1>] kthread+0x0/0xef
 [<c0101005>] kernel_thread_helper+0x5/0xb
Code: 8d 04 2f 01 4c 24 18 83 7c 24 0c 00 8b 54 24 18 8d 34 32 89 34
24 74 09 89 d9 89 c7 c1 e9 02 eb 0a 8b 3c 24 89 d9 89 c6 c1 e9 02 <f3>
a5 89 d9 83 e1 03 74 02 f3 a4 8b 44 24 18 ba 03 00
 00 00 e8
EIP: [<de95b0cb>] copy_data+0xff/0x14b [raid456] SS:ESP 0068:dd261e4c
 <6>note: md0_raid5[1115] exited with preempt_count 1

I was having instability with this machine before (slackware 10.1 with
2.6.10 kernel) while compiling code (especially the kernel). I just
rebuilt is as a debian box. It never died in the raid array code
before though, just in gcc.

I have tested the machine's ram with memtest86 (3 passes) and will
more thoroughly check it tonight. Besides bad RAM, does anyone have
any other ideas on what may be causing the issue?


On 2/20/07, Andrew Robinson wrote:
> I can't seem to find sufficient information on what may have caused an
> oops. I am running a debian machine using kernel 2.6.18.3. Here is
> detailed information on the system:
>
> debian etch
> CPU: AMD athlon 2100+
> kernel package: linux-image-2.6.18-3-686
> raid5 array: 3 active, 1 spare on md0
> raid fs: ext3
> raid is physically across 2 on-board NVidia SATA ports and 2 ports
> from a SATA controller card
>
> I am at work, and this was a home computer. This is what I got from
> syslog when in SSH before it died:
>
> bserver kernel: iret exception: 0000 [#1]
> bserver kernel: SMP
> bserver kernel: CPU:    0
> bserver kernel: EIP is at copy_data+0xff/0x14b [raid456]
> bserver kernel: eax: ddcce000   ebx: 00001000   ecx: 0000000f   edx: c1f71000
> bserver kernel: esi: ddccefc4   edi: c1f71fc4   ebp: 00000000   esp: dd261e4c
> bserver kernel: ds: 007b   es: 007b   ss: 0068
> bserver kernel: Process md0_raid5 (pid: 1115, ti=dd260000
> task=dd0ed550 task.ti=dd260000)
> bserver kernel: Stack: c1f71000 ddb55460 c1e377a0 00000000 ddcce000
> 00001000 c1f71000 00000000
> bserver kernel:        00000000 00000000 00000000 dd20c388 c1e377a0
> dd20c354 de95d96d 0c649510
> bserver kernel:        00000000 c0116d0a 06323c4f 00000000 dd20c384
> 00000000 c13c52e0 0000e000
> bserver kernel: Call Trace:
> bserver kernel: Code: 8d 04 2f 01 4c 24 18 83 7c 24 0c 00 8b 54 24 18
> 8d 34 32 89 34 24 74 09 89 d9 89 c7 c1 e9 02 eb 0a 8b 3c 24 89 d9 89
> c6 c1 e9 02 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 8b 44 24 18 ba 03 00 00
> 00 e8
> bserver kernel: EIP: [<de95b0cb>] copy_data+0xff/0x14b [raid456]
> SS:ESP 0068:dd261e4c
>
> The only similar message chain that I could find was about 2.6.19 and
> they recommended disabling preempting, but debian's 2.6.18.3 already
> has that disabled by default.
>
> Any ideas?
>
> Thanks,
> Andrew
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/