[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <45E5CE9E.7050904@redhat.com>
Date: Wed, 28 Feb 2007 13:49:02 -0500
From: Chuck Ebbert <cebbert@...hat.com>
To: Dan Williams <dan.j.williams@...el.com>
CC: linux-kernel@...r.kernel.org, Jens Axboe <axboe@...cle.com>,
Neil Brown <neilb@...e.de>
Subject: Re: PROBLEM: null pointer dereference in cfq_dispatch_requests (2.6.21-rc2
and 2.6.20)
Dan Williams wrote:
> I can reliably reproduce a null pointer dereference on 2.6.20 and
> 2.6.21-rc2. I will keep digging to find the kernel version where this
> last worked, but wanted to see if there were any immediate experiments I
> should try.
>
> The failure is caused by running tiobench on a MD raid6 array with 6 out
> of 8 disks available. The commands I issued to reproduce this are:
>
> mdadm -A /dev/md0 /dev/sd[bcdefg]
> mount /dev/md0 /mnt/raid
> tiobench --numruns 5 --size 2048 --dir /mnt/raid
>
> The filesystem is ext3. The controller is an LSI 1068. Here are the
> two BUG messages first 2.6.21-rc2 followed by 2.6.20. I will reply to
> this message with the config.
> Kernel 2.6.20 on an i686
>
> [ 177.299787] BUG: unable to handle kernel NULL pointer dereference at virtual address 0000005c
> [ 177.308526] printing eip:
> [ 177.311287] c01de510
> [ 177.313521] *pde = 34d40001
> [ 177.316353] Oops: 0000 [#1]
> [ 177.319202] SMP
> [ 177.321107] Modules linked in: raid456 xor nfsd exportfs lockd nfs_acl sunrpc autofs4 hidp l2cap bluetooth iptable_raw xt_policy xt_multiport ipt_ULOG ipt_TTL ipt_ttl ipt_TOS ipt_tos ipt_SAME ipt_REJECT ipt_REDIRECT ipt_recent ipt_owner ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_iprange ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype xt_tcpmss xt_pkttype xt_physdev xt_NFQUEUE xt_MARK xt_mark xt_mac xt_limit xt_length xt_helper xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables video sbs i2c_ec dock button battery asus_acpi ac radeon drm ipv6 lp parport_pc parport e1000 uhci_hcd floppy mptsas mptscsih mptbase sg ehci_hcd scsi_transport_sas i2c_i801 i2c_core pcspkr dm_snapshot dm_zero dm_mirror dm_mod ata_piix ata_generic libata sd_mod scsi_mod ext3 jbd
> [ 177.402252] CPU: 2
> [ 177.402253] EIP: 0060:[<c01de510>] Not tainted VLI
> [ 177.402253] EFLAGS: 00210016 (2.6.20 #5)
> [ 177.414194] EIP is at cfq_dispatch_insert+0xb/0x53
> [ 177.419056] eax: f7773ec0 ebx: 00000000 ecx: f7773cc0 edx: 00000000
> [ 177.425982] esi: f70abae0 edi: f7773cc0 ebp: 00000000 esp: f34dbcbc
> [ 177.432953] ds: 007b es: 007b ss: 0068
> [ 177.437127] Process tiotest (pid: 5405, ti=f34db000 task=f7efc030 task.ti=f34db000)
> [ 177.444763] Stack: 00000049 f77d3b9c f7773cc0 00000000 c01de6ce c014041e f8a26806 00000082
> [ 177.453456] f7efc030 fffe22d6 00000000 00000000 00000000 00000004 f7efc030 f7773cc0
> [ 177.462121] 00000000 00000000 00000000 f70abae0 f7cd5800 f70abae0 c01d4fcc 00000001
> [ 177.470798] Call Trace:
> [ 177.473503] [<c01de6ce>] cfq_dispatch_requests+0x12d/0x466
> [ 177.479223] [<c014041e>] __lock_acquire+0x9e9/0xa72
> [ 177.484285] [<f8a26806>] scsi_request_fn+0x286/0x336 [scsi_mod]
> [ 177.490485] [<c01d4fcc>] elv_next_request+0x1a2/0x1b2
> [ 177.495766] [<f8a26806>] scsi_request_fn+0x286/0x336 [scsi_mod]
> [ 177.501912] [<c0315ba8>] _spin_lock_irq+0x38/0x43
> [ 177.506840] [<f8a265d9>] scsi_request_fn+0x59/0x336 [scsi_mod]
> [ 177.512981] [<c01d7e7d>] blk_remove_plug+0x5a/0x66
> [ 177.517983] [<c01d7ea6>] __generic_unplug_device+0x1d/0x1f
> [ 177.523705] [<c01d8278>] generic_unplug_device+0x15/0x21
> [ 177.529272] [<f97ee054>] unplug_slaves+0x54/0x88 [raid456]
> [ 177.535013] [<c01d997a>] blk_backing_dev_unplug+0x73/0x7b
> [ 177.540657] [<c0315d82>] _spin_unlock_irqrestore+0x3e/0x4d
> [ 177.546382] [<c0154b26>] sync_page+0x0/0x3b
> [ 177.550774] [<c013f5f4>] trace_hardirqs_on+0x12e/0x158
> [ 177.556108] [<c0154b26>] sync_page+0x0/0x3b
> [ 177.560471] [<c018caa5>] block_sync_page+0x31/0x32
> [ 177.565449] [<c0154b59>] sync_page+0x33/0x3b
> [ 177.569916] [<c0313d9e>] __wait_on_bit_lock+0x2a/0x52
> [ 177.575201] [<c0154b18>] __lock_page+0x58/0x5e
> [ 177.579810] [<c0139612>] wake_bit_function+0x0/0x3c
> [ 177.584905] [<c0155228>] do_generic_mapping_read+0x1db/0x44f
> [ 177.590911] [<c01570cb>] generic_file_aio_read+0x173/0x1a4
> [ 177.596617] [<c0154930>] file_read_actor+0x0/0xdb
> [ 177.601525] [<c0171b47>] do_sync_read+0xc7/0x10a
> [ 177.606365] [<c01395dd>] autoremove_wake_function+0x0/0x35
> [ 177.612130] [<c0171a80>] do_sync_read+0x0/0x10a
> [ 177.616867] [<c01723ce>] vfs_read+0xa6/0x152
> [ 177.621362] [<c0172830>] sys_read+0x41/0x67
> [ 177.625794] [<c0103e24>] syscall_call+0x7/0xb
> [ 177.630403] =======================
> [ 177.634031] Code: da 11 3b c0 c7 04 24 51 9d 39 c0 e8 c9 a1 f4 ff e8 ca 6e f2 ff ff 4f 34 83 c4 18 5b 5e 5f 5d c3 55 57 56 89 c6 53 8b 40 0c 89 d3 <8b> 7a 5c 8b 68 04 89 d0 e8 b5 fe ff ff 8b 43 14 89 da 25 01 80
> [ 177.654378] EIP: [<c01de510>] cfq_dispatch_insert+0xb/0x53 SS:ESP 0068:f34dbcbc
cfq_dispatch_requests() has called cfq_dispatch_insert() with a NULL
second argument (struct request *rq)
There are two patches for raid5/6 out there that might fix this. I'll
attach them (the second just fixes a minor bug in the first one.)
View attachment "linux-2.6-md_md5_6_bio_too_big_fix.patch" of type "text/plain" (3987 bytes)
View attachment "linux-2.6-md_md5_6_bio_too_big_fix_fix.patch" of type "text/plain" (972 bytes)
Powered by blists - more mailing lists