linux-kernel - Re: ext3/jbd oops in journal_start

Message-id: <87bpjorn6g.fsf@openvz.org>
Date:	Sat, 31 Oct 2009 11:18:47 +0300
From:	Dmitry Monakhov <dmonakhov@...nvz.org>
To:	Sage Weil <sage@...dream.net>
Cc:	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org
Subject: Re: ext3/jbd oops in journal_start

Sage Weil <sage@...dream.net> writes:

> Hi,
>
> I'm consistently seeing ext3 oops on a fresh ~60 GB fs on 2.6.32-rc3 (and 
> 2.6.31).  data=writeback or data=ordered.  It's not the hardware or 
> drive... I have 8 boxes (each with slightly different hardware) that crash 
> identically.
Strange, 2.6.31 with ext3 is quite a popular configuration...
Can you please post the exact test case?
>
> The oops is at fs/jbd/transaction.c, journal_start():
>
> 		J_ASSERT(handle->h_transaction->t_journal == journal);
where handle = journal_current_handle();

IMHO it looks like you entered here with current->journal_info != NULL,
but journal_info contains unexpected data.
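To make the nested-handle logic concrete, here is a minimal user-space model of the start of journal_start() and of journal_stop(). It assumes only what is described above: a per-thread slot stands in for current->journal_info, and a non-NULL slot is reused as the current handle after the journal check. The struct layouts, the `_model` names and the caller-supplied handle are simplifications for illustration, not the real jbd code; where the kernel's J_ASSERT would BUG, the model returns NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily trimmed stand-ins for the jbd structures:
 * only the fields the nested-handle check touches are modeled. */
typedef struct journal_s { int id; } journal_t;
typedef struct transaction_s { journal_t *t_journal; } transaction_t;
typedef struct handle_s {
    transaction_t *h_transaction;
    int h_ref;
} handle_t;

/* User-space stand-in for current->journal_info: one slot per thread. */
static _Thread_local void *journal_info;

static handle_t *journal_current_handle(void)
{
    return journal_info;
}

static handle_t *journal_start_model(journal_t *journal, handle_t *new_handle)
{
    handle_t *handle = journal_current_handle();

    if (handle) {
        /* Nested start: reuse the thread's handle. This dereferences
         * whatever journal_info points at; with the stale value from the
         * report (h_transaction == 0x1bf) the read itself faults. */
        if (handle->h_transaction->t_journal != journal)
            return NULL; /* the kernel's J_ASSERT would fire here */
        handle->h_ref++;
        return handle;
    }

    /* Outermost start: the real code allocates a handle and attaches it
     * to the running transaction; here the caller passes one in. */
    new_handle->h_ref = 1;
    journal_info = new_handle;
    return new_handle;
}

static void journal_stop_model(handle_t *handle)
{
    assert(journal_current_handle() == handle);
    if (--handle->h_ref > 0)
        return;          /* still nested: keep the slot set */
    journal_info = NULL; /* outermost stop clears the per-thread slot */
}
```

The important property is symmetry: only the outermost stop clears the slot, so any code that writes to it without clearing it breaks the next start on that thread.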
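This may happen in two cases:
1) jbd code is called from within another filesystem;
2) some filesystem forgets to reset current->journal_info to NULL on exit from the VFS.
The call trace goes straight from sys_write into ext3 with no other filesystem
on the stack, so this looks like the second case. Are you using some
unusual/experimental filesystem?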
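Case 2 can be sketched the same way. Assuming a hypothetical filesystem (`otherfs`) that parks its own state in the per-thread slot, the buggy version leaves the slot set when it returns, while the fixed version saves and restores the previous value. This illustrates the discipline only; it is not code from any particular filesystem.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for current->journal_info (one slot per thread). */
static _Thread_local void *journal_info;

/* Hypothetical private state that a non-jbd filesystem might park in
 * journal_info while one of its operations is in flight. */
struct otherfs_trans {
    int op_count;
};

/* Buggy pattern (case 2): the slot is overwritten and never cleared, so
 * the next ext3 write on this thread picks it up in journal_start() as if
 * it were a jbd handle_t. */
static void otherfs_op_buggy(struct otherfs_trans *t)
{
    journal_info = t;
    t->op_count++;
    /* returns to the VFS with journal_info still pointing at t */
}

/* Safer pattern: save the previous value, use the slot, and restore it
 * before returning to the VFS (on every exit path in real code). */
static void otherfs_op_fixed(struct otherfs_trans *t)
{
    void *saved = journal_info;

    journal_info = t;
    t->op_count++;
    assert(journal_info == t); /* nothing replaced our value mid-operation */
    journal_info = saved;
}
```

Saving and restoring, rather than blindly writing NULL, also keeps the slot intact if the operation is entered while an outer jbd handle is already active on the same thread.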
>
> because handle->h_transaction is 0x1bf (or some other value close to 
> that).  I can trigger on the 10th or so call to journal_start after 
> mounting.
>
> Has anyone seen this before?  I feel like I must be doing something silly 
> here, since I can't find any references to this particular crash, but I'm 
> having no problem triggering it right away, even after a fresh mke2fs 
> -j...
>
> Any suggestions on where to look or should I just start testing older 
> kernel versions and bisect?
>
> sage
>
>
> [   83.550657] handle->h_transaction 00000000000001bf
> [   83.555564] BUG: unable to handle kernel NULL pointer dereference at 00000000000001bf
> [   83.559531] IP: [<ffffffff8118793c>] journal_start+0x87/0x184
> [   83.559531] PGD 10e351067 PUD 10e1cb067 PMD 0 
> [   83.559531] Oops: 0000 [#1] PREEMPT SMP 
> [   83.559531] last sysfs file: /sys/class/net/lo/operstate
> [   83.559531] CPU 1 
> [   83.559531] Modules linked in: btrfs zlib_deflate fan ac battery 
> ide_pci_generic shpchp k8temp serio_raw psmouse pcspkr ehci_hcd 
> serverworks processor ohci_hcd pci_hotplug thermal button
> [   83.559531] Pid: 2849, comm: cosd Not tainted 2.6.32-rc5 #7 H8SSL-I2
> [   83.559531] RIP: 0010:[<ffffffff8118793c>]  [<ffffffff8118793c>] journal_start+0x87/0x184
> [   83.559531] RSP: 0018:ffff88010e335b28  EFLAGS: 00010292
> [   83.559531] RAX: 00000000000001bf RBX: ffff88010eeee4e0 RCX: 000000000000ad01
> [   83.559531] RDX: ffff88002f400000 RSI: 0000000000000001 RDI: ffffffff81610214
> [   83.559531] RBP: ffff88010e335b58 R08: ffff88010e3359d7 R09: 0000000000000000
> [   83.559531] R10: ffffffff8106314b R11: ffff88010e335908 R12: ffff88010eeee4e0
> [   83.559531] R13: ffff88010e17a200 R14: ffff88010f535800 R15: 000000000000000b
> [   83.559531] FS:  00007fe3bce8b6f0(0000) GS:ffff88002f400000(0000) knlGS:0000000000000000
> [   83.559531] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [   83.559531] CR2: 00000000000001bf CR3: 0000000110223000 CR4: 00000000000006e0
> [   83.559531] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   83.559531] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [   83.559531] Process cosd (pid: 2849, threadinfo ffff88010e334000, task ffff88010e17a200)
> [   83.559531] Stack:
> [   83.559531]  ffff88010e335b58 ffffffff814cbb10 ffffea0006cf6038 ffff88010eeea888
> [   83.559531] <0> 0000000000000000 00000000000005f4 ffff88010e335b68 ffffffff811443b3
> [   83.559531] <0> ffff88010e335c08 ffffffff8113c347 ffff88010e335ca8 ffffffff81070369
> [   83.559531] Call Trace:
> [   83.559531]  [<ffffffff811443b3>] ext3_journal_start_sb+0x4a/0x4c
> [   83.559531]  [<ffffffff8113c347>] ext3_write_begin+0x9c/0x1e2
> [   83.559531]  [<ffffffff81070369>] ? __lock_acquire+0x17d8/0x17ea
> [   83.559531]  [<ffffffff810a5021>] generic_file_buffered_write+0x120/0x2a5
> [   83.559531]  [<ffffffff810a564d>] __generic_file_aio_write+0x34f/0x383
> [   83.559531]  [<ffffffff810a56e4>] generic_file_aio_write+0x63/0xaa
> [   83.559531]  [<ffffffff810d98b2>] do_sync_write+0xe7/0x12d
> [   83.559531]  [<ffffffff8105f368>] ? autoremove_wake_function+0x0/0x38
> [   83.559531]  [<ffffffff8106a7fc>] ? put_lock_stats+0xe/0x27
> [   83.559531]  [<ffffffff8125752c>] ? security_file_permission+0x11/0x13
> [   83.559531]  [<ffffffff810da240>] vfs_write+0xae/0x14a
> [   83.559531]  [<ffffffff810da3a0>] sys_write+0x47/0x6e
> [   83.559531]  [<ffffffff8100baab>] system_call_fastpath+0x16/0x1b
> [   83.559531] Code: 89 de 48 c7 c7 e9 01 61 81 31 c0 e8 71 f6 31 00 48 8b 
> 33 48 c7 c7 f7 01 61 81 31 c0 e8 60 f6 31 00 48 8b 03 48 c7 c7 14 02 61 81 
> <48> 8b 30 31 c0 e8 4c f6 31 00 48 8b 03 48 8b 30 4c 39 f6 74 11 
> [   83.559531] RIP  [<ffffffff8118793c>] journal_start+0x87/0x184
> [   83.559531]  RSP <ffff88010e335b28>
> [   83.559531] CR2: 00000000000001bf
> [   83.847504] ---[ end trace 450f151cbabc2177 ]---
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html