[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20071017183716.GU15552@kernel.dk>
Date: Wed, 17 Oct 2007 20:37:16 +0200
From: Jens Axboe <jens.axboe@...cle.com>
To: Ingo Molnar <mingo@...e.hu>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
linux-kernel@...r.kernel.org, Jeff Garzik <jgarzik@...ox.com>,
Alan Cox <alan@...rguk.ukuu.org.uk>
Subject: Re: [bug] ata subsystem related crash with latest -git
On Wed, Oct 17 2007, Jens Axboe wrote:
> On Wed, Oct 17 2007, Ingo Molnar wrote:
> >
> > ok, here's a different but similar crash that triggers on the testbox:
> >
> > [ 233.438890] BUG: unable to handle kernel paging request at virtual address 7d93e000
> > [ 233.446390] printing eip: 784e9480 *pde = 01000067 *pte = 0593e000
> > [ 233.452630] Oops: 0000 [#1] DEBUG_PAGEALLOC
> > [ 233.456790]
> > [ 233.458264] Pid: 0, comm: swapper Not tainted (2.6.23 #5)
> > [ 233.463637] EIP: 0060:[<784e9480>] EFLAGS: 00010087 CPU: 0
> > [ 233.469101] EIP is at ata_qc_issue+0x90/0x380
> > [ 233.473429] EAX: 7d93dff0 EBX: 0000001f ECX: 7d93dff0 EDX: 798daf80
> > [ 233.479668] ESI: 00000020 EDI: 7d93de00 EBP: 7b54007c ESP: 78a13e14
> > [ 233.485908] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> > [ 233.491282] Process swapper (pid: 0, ti=78a12000 task=789753e0 task.ti=78a12000)
> > [ 233.498473] Stack: 7d93de00 7b540000 7b540000 00000000 7d93dfe0 7b54007c 7d93db00 7b5417a4
> > [ 233.506793] 784c2490 784ef69e 784f21f3 7b52de98 7d93db00 7b540000 7b5417a4 7d93db00
> > [ 233.515112] 7b540000 7b524004 784f22e0 784ef380 784c2490 7d93db00 00000202 7b524004
> > [ 233.523432] Call Trace:
> > [ 233.526033] [<784c2490>] scsi_done+0x0/0x20
> > [ 233.530279] [<784ef69e>] ata_scsi_translate+0xbe/0x140
> > [ 233.535478] [<784f21f3>] ata_scsi_queuecmd+0x33/0x200
> > [ 233.540591] [<784f22e0>] ata_scsi_queuecmd+0x120/0x200
> > [ 233.545791] [<784ef380>] ata_scsi_rw_xlat+0x0/0x220
> > [ 233.550730] [<784c2490>] scsi_done+0x0/0x20
> > [ 233.554976] [<784c2d12>] scsi_dispatch_cmd+0x152/0x290
> > [ 233.560177] [<78135c67>] trace_hardirqs_on+0x67/0xb0
> > [ 233.565202] [<784c8c7e>] scsi_request_fn+0x1be/0x370
> > [ 233.570229] [<78408086>] blk_run_queue+0x36/0x80
> > [ 233.574909] [<784c7520>] scsi_next_command+0x30/0x50
> > [ 233.579935] [<784c76ab>] scsi_end_request+0xab/0xe0
> > [ 233.584875] [<784c83f9>] scsi_io_completion+0xa9/0x3d0
> > [ 233.590075] [<78135c67>] trace_hardirqs_on+0x67/0xb0
> > [ 233.595100] [<78405125>] blk_done_softirq+0x45/0x80
> > [ 233.600040] [<78405153>] blk_done_softirq+0x73/0x80
> > [ 233.604981] [<7811d4c3>] __do_softirq+0x53/0xb0
> > [ 233.609573] [<7811d588>] do_softirq+0x68/0x70
> > [ 233.613993] [<78105351>] do_IRQ+0x51/0x90
> > [ 233.618066] [<78135c9c>] trace_hardirqs_on+0x9c/0xb0
> > [ 233.623092] [<7810f2d0>] pgd_dtor+0x0/0x50
> > [ 233.627252] [<7810388e>] common_interrupt+0x2e/0x40
> > [ 233.632192] [<7810f2d0>] pgd_dtor+0x0/0x50
> > [ 233.636352] [<7815f3be>] quicklist_trim+0x5e/0x90
> > [ 233.641118] [<7810f2cb>] check_pgt_cache+0x1b/0x20
> > [ 233.645971] [<78100c52>] cpu_idle+0x32/0x60
> > [ 233.650217] [<78a14b35>] start_kernel+0x265/0x300
> > [ 233.654983] [<78a14380>] unknown_bootoption+0x0/0x1e0
> > [ 233.660097] =======================
> > [ 233.663649] Code: 00 00 00 8b 45 34 a8 02 0f 84 ed 00 00 00 8b bd 88 00 00 00 31 db 89 3c 24 8b 75 3c 89 f8 c7 44 24 10 00 00 00 00 eb 1b 8d 76 00 <8b> 50 10 8d 48 10 f6 c2 01 0f 85 be 02 00 00 89 44 24 10 83 c3
> > [ 233.682455] EIP: [<784e9480>] ata_qc_issue+0x90/0x380 SS:ESP 0068:78a13e14
> > [ 233.689302] Kernel panic - not syncing: Fatal exception in interrupt
> >
> > (gdb) list *0x784e9480
> > 0x784e9480 is in ata_qc_issue (include/linux/scatterlist.h:48).
> > 43 */
> > 44 static inline struct scatterlist *sg_next(struct scatterlist *sg)
> > 45 {
> > 46 sg++;
> > 47
> > 48 if (unlikely(sg_is_chain(sg)))
> > 49 sg = sg_chain_ptr(sg);
> > 50
> > 51 return sg;
> > 52 }
> > (gdb)
> >
> > so there's sg_next() involvement too. Below is the disassembly.
>
> You must have a magic test box :-)
>
> Will investigate... libata doesn't actually enable chaining, but since
> i386 supports it, it ends up using the chain helpers anyway.
>
> There seems to be some automatic inlining involved here, it must be
> dying inside ata_sg_setup().
OK, something to try out - libata doesn't enable sg chaining, since the
support isn't complete yet. Here's a dumb check just to verify that
scsi_alloc_sgtable() will NEVER return a chain entry for a host that
doesn't have it enabled. If this triggers for you, then that could
explain your problem.
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index aac8a02..cc89c86 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -777,8 +777,12 @@ struct scatterlist *scsi_alloc_sgtable(struct scsi_cmnd *cmd, gfp_t gfp_mask)
* sglist must be the biggest one, or we would not have
* ended up doing another loop.
*/
- if (prev)
+ if (prev) {
+ struct Scsi_Host *shost = cmd->device->host;
+
+ BUG_ON(!shost->use_sg_chaining);
sg_chain(prev, SCSI_MAX_SG_SEGMENTS, sgl);
+ }
/*
* don't allow subsequent mempool allocs to sleep, it would
--
Jens Axboe
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists