[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1161710801.1242.6.camel@localhost.localdomain>
Date:	Tue, 24 Oct 2006 18:26:41 +0100
From:	Sergio Monteiro Basto <sergio@...giomb.no-ip.org>
To:	Michael Sallaway <michael.sallaway@...il.com>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: PROBLEM: Oops when doing disk heavy disk I/O
Hi, 
please boot with "report_lost_ticks" like
http://marc.theaimsgroup.com/?l=linux-kernel&m=115545986619977&w=2
and see if you have 
time.c: Lost n timer tick(s)!
and 
cat /sys/devices/system/clocksource/clocksource0/
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
Thanks, 
On Wed, 2006-10-25 at 01:56 +1000, Michael Sallaway wrote:
> [1.] One line summary of the problem:
> Kernel Oopses when doing I/O to a disk (using dd).
> 
> [2.] Full description of the problem/report:
> When writing to [any] disk (IDE or SCSI), the kernel will Oops after
> short periods of time ranging from 30 seconds to 5-10 minutes.
> Sometimes this is a complete crash (with "Aieee, killing interrupt
> handler!"), sometimes it's just an oops but the system doesn't crash
> comepletely (isn't very usable, though), and sometimes it just gives a
> "general protection fault: 0000 [1] SMP".
> 
> It's worth mentioning that I've managed to set up the entire system
> without incident -- a debian netinstall, downloading new packages,
> changing things, etc. The only reason I discovered this was when
> copying large amounts of data off another machine, and it died
> reproducably after a few gigabytes of data. (I originally thought it
> was an XFS issue, but had the same problem with EXT3, and all other
> combinations I tried - I can now reproduce it by using dd if=/dev/zero
> of=/dev/hda6.)
> 
> Ultimately, I've tried it with different (known good) devices and hard
> drives. The only common things between all failures are the CPU
> (Athlon 64 3200), Motherboard (Asus M2N-E), and RAM (2GB DDR2-533).
> (Memtest x86 shows the memory to be fine.) As such, I'm suspecting
> it's something to do with the motherboard -- It's using an x86_64
> kernel (although it does the same with an i386), on an nforce 570
> motherboard.
> 
> Other things I have tried:
> - SATA, SCSI and IDE drives -- all do the same thing
> - removing *all* drives and cards and devices -- it does it with a
> single IDE drive connected and no PCI cards
> - kernels 2.6.16, 18, 18.1, 19-rc3.
> - the patch suggested in
> http://marc.theaimsgroup.com/?l=linux-kernel&m=115545986619977&w=2
> - booting with noapic and/or acpi=off (as suggested http://tinyurl.com/yn97woby)
> - with and without md devices
> 
> 
> [3.] Keywords (i.e., modules, networking, kernel):
> kernel disk I/O nforce
> 
> [4.] Kernel version (from /proc/version):
> Linux version 2.6.19-rc3 (root@...bossa) (gcc version 4.1.2 20061007
> (prerelease) (Debian 4.1.1-16)) #1 SMP Tue Oct 24 22:23:07 EST 2006
> 
> [5.] Output of Oops.. message (if applicable) with symbolic
> information resolved (see Documentation/oops-tracing.txt)
> (as I understand it from Documentation/oops-tracing.txt, ksymoops
> doesn't apply anymore? If that's not the case, I apologise -- could
> someone tell me what I need to do with the below?)
> 
> Oct 25 00:23:11 barbossa kernel: Unable to handle kernel NULL pointer
> dereference at 0000000000000000 RIP:
> Oct 25 00:23:11 barbossa kernel:  [<ffffffff8021a6c5>]
> __block_write_full_page+0xa4/0x2df
> Oct 25 00:23:11 barbossa kernel: PGD 2b93a067 PUD 7c7cc067 PMD 0
> Oct 25 00:23:11 barbossa kernel: Oops: 0000 [1] SMP
> Oct 25 00:23:11 barbossa kernel: CPU 0
> Oct 25 00:23:11 barbossa kernel: Modules linked in:
> Oct 25 00:23:11 barbossa kernel: Pid: 2442, comm: dd Not tainted 2.6.19-rc3 #1
> Oct 25 00:23:11 barbossa kernel: RIP: 0010:[<ffffffff8021a6c5>]
> [<ffffffff8021a6c5>] __block_write_full_page+0xa4/0x2df
> Oct 25 00:23:11 barbossa kernel: RSP: 0018:ffff81003fa358f8  EFLAGS: 00010283
> Oct 25 00:23:11 barbossa kernel: RAX: 0000000000000000 RBX:
> 0000000000000000 RCX: 0000000000000002
> Oct 25 00:23:11 barbossa kernel: RDX: 000000000000000a RSI:
> 000000000019eea9 RDI: ffff810037cc8440
> Oct 25 00:23:11 barbossa kernel: RBP: ffff810001602550 R08:
> ffff810037cc8440 R09: ffff81003fa35b48
> Oct 25 00:23:11 barbossa kernel: R10: ffffffff802bee4c R11:
> ffffffff80440b8b R12: ffff81001b3426e0
> Oct 25 00:23:11 barbossa kernel: R13: 000000000067baa6 R14:
> ffff810037cc8440 R15: 0000000001c42574
> Oct 25 00:23:11 barbossa kernel: FS:  00002b64b70eb6d0(0000)
> GS:ffffffff807d6000(0000) knlGS:0000000000000000
> Oct 25 00:23:11 barbossa kernel: CS:  0010 DS: 0000 ES: 0000 CR0:
> 000000008005003b
> Oct 25 00:23:11 barbossa kernel: CR2: 0000000000000000 CR3:
> 000000007b6b7000 CR4: 00000000000006e0
> Oct 25 00:23:11 barbossa kernel: Process dd (pid: 2442, threadinfo
> ffff81003fa34000, task ffff810037e44140)
> Oct 25 00:23:11 barbossa kernel: Stack:  ffff81003fa35b48
> ffffffff802bee4c 0000040001602550 ffff810001602550
> Oct 25 00:23:11 barbossa kernel:  ffff81003fa35b48 ffff810037cc8550
> 000000000000000b ffff81007efe9b08
> Oct 25 00:23:11 barbossa kernel:  0000000000000000 ffffffff8029db1e
> 000000000000000e ffffffff802bdff3
> Oct 25 00:23:11 barbossa kernel: Call Trace:
> Oct 25 00:23:11 barbossa kernel:  [<ffffffff802bee4c>] blkdev_get_block+0x0/0x46
> Oct 25 00:23:11 barbossa kernel:  [<ffffffff8029db1e>]
> generic_writepages+0x18e/0x2d8
> Oct 25 00:23:11 barbossa kernel:  [<ffffffff802bdff3>] blkdev_writepage+0x0/0xf
> Oct 25 00:23:11 barbossa kernel:  [<ffffffff8025591f>] do_writepages+0x20/0x2d
> Oct 25 00:23:11 barbossa kernel:  [<ffffffff8022cf8c>]
> __writeback_single_inode+0x1b4/0x38b
> Oct 25 00:23:11 barbossa kernel:  [<ffffffff8021f44f>]
> sync_sb_inodes+0x1d1/0x2b5
> Oct 25 00:23:11 barbossa kernel:  [<ffffffff8024b3b7>]
> writeback_inodes+0x82/0xd8
> Oct 25 00:23:11 barbossa kernel:  [<ffffffff8029de51>]
> balance_dirty_pages_ratelimited_nr+0x115/0x1f6
> Oct 25 00:23:11 barbossa kernel:  [<ffffffff8020f4b1>]
> generic_file_buffered_write+0x516/0x64b
> Oct 25 00:23:11 barbossa kernel:  [<ffffffff802295b6>] remove_suid+0x1/0x1c
> Oct 25 00:23:11 barbossa kernel:  [<ffffffff80215018>]
> __generic_file_aio_write_nolock+0x375/0x3e8
> Oct 25 00:23:11 barbossa kernel:  [<ffffffff802078bb>] unmap_vmas+0x372/0x716
> Oct 25 00:23:11 barbossa kernel:  [<ffffffff8029bdea>]
> generic_file_aio_write_nolock+0x3a/0x86
> Oct 25 00:23:11 barbossa kernel:  [<ffffffff802166a9>] do_sync_write+0xc9/0x10c
> Oct 25 00:23:11 barbossa kernel:  [<ffffffff802899a3>]
> autoremove_wake_function+0x0/0x2e
> Oct 25 00:23:11 barbossa kernel:  [<ffffffff8022c1e9>] __clear_user+0x12/0x50
> Oct 25 00:23:11 barbossa kernel:  [<ffffffff8048c584>] read_zero+0x1d1/0x23c
> Oct 25 00:23:11 barbossa kernel:  [<ffffffff802153ee>] vfs_write+0xce/0x174
> Oct 25 00:23:11 barbossa kernel:  [<ffffffff8020afee>] fget_light+0x18/0x7c
> Oct 25 00:23:11 barbossa kernel:  [<ffffffff80215d2f>] sys_write+0x45/0x6e
> Oct 25 00:23:11 barbossa kernel:  [<ffffffff8025811e>] system_call+0x7e/0x83
> Oct 25 00:23:11 barbossa kernel:
> Oct 25 00:23:11 barbossa kernel:
> Oct 25 00:23:11 barbossa kernel: Code: 8b 03 a8 20 75 6c 8b 03 a8 02
> 74 66 8b 44 24 14 48 39 43 20
> Oct 25 00:23:11 barbossa kernel: RIP  [<ffffffff8021a6c5>]
> __block_write_full_page+0xa4/0x2df
> Oct 25 00:23:11 barbossa kernel:  RSP <ffff81003fa358f8>
> Oct 25 00:23:11 barbossa kernel: CR2: 0000000000000000
> 
> 
> [6.] A small shell script or example program which triggers the
> problem (if possible)
> dd if=/dev/zero of=/dev/hda6 bs=512K
> 
> (note that it will also happen without the bs argument, however
> usually takes longer. It's not related to a particular point in the
> disk, or anything, though, just seems to last longer.)
> 
> 
> [7.] Environment
> 
> Please see the below output for more environment information -- I
> didn't want to dump too much info in here. :-)
> 
> http://sallaway.org/lkml/output.txt
> 
> also, I've seen mention of this, but I'm not sure if it would be useful:
> 
> http://sallaway.org/lkml/System.map-2.6.19-rc3
> 
> 
> Please let me know if you need any more info. This is my first bug
> report, so apologies if this should have gone elsewhere. :-)
> 
> Cheers,
> Michael
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Powered by blists - more mailing lists
 
