linux-ext4 - Re: crash in __jbd2_journal_file

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.00.1308091504320.14125@cobra.newdream.net>
Date:	Fri, 9 Aug 2013 15:11:05 -0700 (PDT)
From:	Sage Weil <sage@...tank.com>
To:	Jan Kara <jack@...e.cz>
cc:	linux-ext4@...r.kernel.org
Subject: Re: crash in __jbd2_journal_file_buffer

On Fri, 9 Aug 2013, Jan Kara wrote:
> On Fri 09-08-13 10:36:37, Sage Weil wrote:
> > Hi Jan,
> > 
> > Sorry for the slow response; took a while for this to happen again.  This 
> > time I'm keeping the machine sitting in kdb in case there is more 
> > information needed.
> > 
> > <4>[19307.314449] ------------[ cut here ]------------
> > <4>[19307.319114] WARNING: at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1237 jbd2_journal_dirty_metadata+0x1ba/0x260()
> > 
> > <4>[19307.382324] CPU: 0 PID: 8877 Comm: ceph-osd Tainted: G        W    3.10.0-ceph-00049-g68d04c9 #1
> > <4>[19307.391256] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> > <4>[19307.398879]  ffffffff81a1d3c8 ffff880214469928 ffffffff816311b0 ffff880214469968
> > <4>[19307.407572]  ffffffff8103fae0 ffff880214469958 ffff880170a9dc30 ffff8802240fbe80
> > <4>[19307.415153]  0000000000000000 ffff88020b366000 ffff8802256e7510 ffff880214469978
> > <4>[19307.422633] Call Trace:
> > <4>[19307.425209]  [<ffffffff816311b0>] dump_stack+0x19/0x1b
> > <4>[19307.430368]  [<ffffffff8103fae0>] warn_slowpath_common+0x70/0xa0
> > <4>[19307.436502]  [<ffffffff8103fb2a>] warn_slowpath_null+0x1a/0x20
> > <4>[19307.442356]  [<ffffffff81267c2a>] jbd2_journal_dirty_metadata+0x1ba/0x260
> > <4>[19307.449271]  [<ffffffff81245093>] __ext4_handle_dirty_metadata+0xa3/0x140
> > <4>[19307.456192]  [<ffffffff812561f3>] ext4_xattr_release_block+0x103/0x1f0
> > <4>[19307.462742]  [<ffffffff81256680>] ext4_xattr_block_set+0x1e0/0x910
> > <4>[19307.469049]  [<ffffffff8125795b>] ext4_xattr_set_handle+0x38b/0x4a0
> > <4>[19307.475445]  [<ffffffff810a319d>] ? trace_hardirqs_on+0xd/0x10
> > <4>[19307.481300]  [<ffffffff81257b32>] ext4_xattr_set+0xc2/0x140
> > <4>[19307.486995]  [<ffffffff81258547>] ext4_xattr_user_set+0x47/0x50
> > <4>[19307.492936]  [<ffffffff811935ce>] generic_setxattr+0x6e/0x90
> > <4>[19307.498734]  [<ffffffff81193ecb>] __vfs_setxattr_noperm+0x7b/0x1c0
> > <4>[19307.505049]  [<ffffffff811940d4>] vfs_setxattr+0xc4/0xd0
> > <4>[19307.510380]  [<ffffffff8119421e>] setxattr+0x13e/0x1e0
> > <4>[19307.515646]  [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
> > <4>[19307.521587]  [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
> > <4>[19307.527812]  [<ffffffff8118c65c>] ? fget_light+0x3c/0x130
> > <4>[19307.533230]  [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
> > <4>[19307.539522]  [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
> > <4>[19307.545492]  [<ffffffff811946be>] SyS_fsetxattr+0xbe/0x100
> > <4>[19307.551001]  [<ffffffff816407c2>] system_call_fastpath+0x16/0x1b
> > <4>[19307.557137] ---[ end trace 3e447f9462172c58 ]---
>   Hum, weird. So we returned EUCLEAN from jbd2_journal_dirty_metadata()
> meaning that buffer head passed in there didn't have journal head attached.
> However ext4_xattr_release_block() does call
> ext4_journal_get_write_access() for the bh before calling
> ext4_handle_dirty_xattr_block(). So we should have bh attached to the
> running transaction which also keeps journal head in place.
> 
> > <2>[19307.561776] EXT4-fs error (device sda1) in ext4_handle_dirty_xattr_block:167: error 117
> > <3>[19307.570181] Aborting journal on device sda1-8.
> > <2>[19307.604589] EXT4-fs (sda1): Remounting filesystem read-only
> > <2>[19307.610273] EXT4-fs error (device sda1) in ext4_xattr_release_block:558: error 117
> > <0>[19307.623337] journal commit I/O error
> > <0>[19307.623342] journal commit I/O error
> > <0>[19307.623405] journal commit I/O error
> > <0>[19307.623519] journal commit I/O error
> > <2>[19307.623585] EXT4-fs error (device sda1): __ext4_journal_start_sb:62: Detected aborted journal
> > <1>[19307.623642] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> > <1>[19307.623649] IP: [<ffffffff81267c4a>] jbd2_journal_dirty_metadata+0x1da/0x260
>   Can you possibly disassemble where exactly did we crash in this function?
> Apparently we didn't properly handle the fact that the journal got aborted
> here. But I'm not sure what exactly failed.

Hrm, I'm having trouble getting from a bzImage back to something I can 
objdump with inline code, so this is a separate compile with the same 
.config that is hopefully right:

                jh->b_modified = 1;
                J_ASSERT_JH(jh, handle->h_buffer_credits > 0);
    1e33:       0f 0b                   ud2    
         */
        if (jh->b_transaction != transaction) {
                JBUFFER_TRACE(jh, "already on other transaction");
                if (unlikely(jh->b_transaction !=
                             journal->j_committing_transaction)) {
                        printk(KERN_EMERG "JBD: %s: "
    1e35:       4d 85 c9                test   %r9,%r9
    1e38:       74 04                   je     1e3e <jbd2_journal_dirty_metadata+0x1de>
    1e3a:       41 8b 41 08             mov    0x8(%r9),%eax
**here^^^
    1e3e:       45 31 c0                xor    %r8d,%r8d
    1e41:       48 85 c9                test   %rcx,%rcx
    1e44:       74 04                   je     1e4a <jbd2_journal_dirty_metadata+0x1ea>
    1e46:       44 8b 41 08             mov    0x8(%rcx),%r8d
    1e4a:       48 8b 53 18             mov    0x18(%rbx),%rdx
                               "jh->b_transaction (%llu, %p, %u) != "
                               "journal->j_committing_transaction (%p, %u)",
                               journal->j_devname,
    1e4e:       49 8d b6 80 05 00 00    lea    0x580(%r14),%rsi
         */
        if (jh->b_transaction != transaction) {
                JBUFFER_TRACE(jh, "already on other transaction");
                if (unlikely(jh->b_transaction !=
                             journal->j_committing_transaction)) {
                        printk(KERN_EMERG "JBD: %s: "
    1e55:       89 04 24                mov    %eax,(%rsp)
    1e58:       48 c7 c7 00 00 00 00    mov    $0x0,%rdi
                        1e5b: R_X86_64_32S      .rodata.str1.8+0x160
    1e5f:       31 c0                   xor    %eax,%eax
    1e61:       e8 00 00 00 00          callq  1e66 <jbd2_journal_dirty_metadata+0x206>
                        1e62: R_X86_64_PC32     printk-0x4
                               jh->b_transaction,
                               jh->b_transaction ? jh->b_transaction->t_tid : 0,
                               journal->j_committing_transaction,
                               journal->j_committing_transaction ?
                               journal->j_committing_transaction->t_tid : 0);
                        ret = -EINVAL;
    1e66:       b8 ea ff ff ff          mov    $0xffffffea,%eax
    1e6b:       e9 a4 fe ff ff          jmpq   1d14 <jbd2_journal_dirty_metadata+0xb4>
                }
                if (unlikely(jh->b_next_transaction != transaction)) {
                        printk(KERN_EMERG "JBD: %s: "
    1e70:       45 31 c0                xor    %r8d,%r8d
    1e73:       48 85 c9                test   %rcx,%rcx
    1e76:       41 8b 44 24 08          mov    0x8(%r12),%eax
    1e7b:       74 04                   je     1e81 <jbd2_journal_dirty_metadata+0x221>
    1e7d:       44 8b 41 08             mov    0x8(%rcx),%r8d
    1e81:       48 8b 53 18             mov    0x18(%rbx),%rdx
                               "jh->b_next_transaction (%llu, %p, %u) != "
                               "transaction (%p, %u)",
                               journal->j_devname,
    1e85:       49 8d b6 80 05 00 00    lea    0x580(%r14),%rsi
                               journal->j_committing_transaction ?
                               journal->j_committing_transaction->t_tid : 0);
                        ret = -EINVAL;
                }
                if (unlikely(jh->b_next_transaction != transaction)) {
                        printk(KERN_EMERG "JBD: %s: "
    1e8c:       89 04 24                mov    %eax,(%rsp)
    1e8f:       4d 89 e1                mov    %r12,%r9
    1e92:       48 c7 c7 00 00 00 00    mov    $0x0,%rdi
                        1e95: R_X86_64_32S      .rodata.str1.8+0x1c0
    1e99:       31 c0                   xor    %eax,%eax
    1e9b:       e8 00 00 00 00          callq  1ea0 <jbd2_journal_dirty_metadata+0x240>
                        1e9c: R_X86_64_PC32     printk-0x4
                               (unsigned long long) bh->b_blocknr,
                               jh->b_next_transaction,
                               jh->b_next_transaction ?
                               jh->b_next_transaction->t_tid : 0,
                               transaction, transaction->t_tid);
                        ret = -EINVAL;
    1ea0:       b8 ea ff ff ff          mov    $0xffffffea,%eax
    1ea5:       e9 77 fe ff ff          jmpq   1d21 <jbd2_journal_dirty_metadata+0xc1>
 *  bit-based spin_unlock()
 */
static inline void bit_spin_unlock(int bitnum, unsigned long *addr)
{
#ifdef CONFIG_DEBUG_SPINLOCK
        BUG_ON(!test_bit(bitnum, addr));
    1eaa:       e8 00 00 00 00          callq  1eaf <jbd2_journal_dirty_metadata+0x24f>
                        1eab: R_X86_64_PC32     .text.unlikely-0x4
                 * transaction's data buffer, ever. */

Registers are

[4]kdb> rd
ax: 0000000000000000  bx: ffff8801f57837b8  cx: 0000000000000000
dx: 000000000091c029  si: 000000000091c029  di: 0000000000000001
bp: ffff880214469a58  sp: ffff880214469a08  r8: ffff8801f57837b8
r9: 0000000000000008  r10: 0000000000000000  r11: 0000000000000000
r12: ffff8802256e7870  r13: ffff8801893c52a0  r14: ffff8802256e76c0
r15: ffff8802256e7510  ip: ffffffff81267c4a  flags: 00010202  cs: 00000010
ss: 00000018  ds: 00000018  es: 00000018  fs: 00000018  gs: 00000018

journal == NULL?

sage


> 
> 								Honza
> 
> > <4>[19307.623653] PGD 20bc32067 PUD 2245fc067 PMD 0 
> > <4>[19307.623657] Oops: 0000 [#1] SMP 
> > [dumpcommon]kdb>   -bt
> > 
> > Stack traceback for pid 8877
> > 0xffff88020db7bf20     8877     8795  1    4   R  0xffff88020db7c3a8 *ceph-osd
> >  ffff880214469a08 0000000000000018 ffffffff81267ad0 ffffea000834d600
> >  ffff880226c03300 ffffffff81256562 0000000000000282 ffff880214469a68
> >  0000000000000000 00000000000011a5 ffffffff81825270 ffff8802256e7510
> > Call Trace:
> >  [<ffffffff81267ad0>] ? jbd2_journal_dirty_metadata+0x60/0x260
> >  [<ffffffff81256562>] ? ext4_xattr_block_set+0xc2/0x910
> >  [<ffffffff81245093>] ? __ext4_handle_dirty_metadata+0xa3/0x140
> >  [<ffffffff8121aeee>] ? ext4_mark_iloc_dirty+0x40e/0x660
> >  [<ffffffff81257835>] ? ext4_xattr_set_handle+0x265/0x4a0
> >  [<ffffffff81257b32>] ? ext4_xattr_set+0xc2/0x140
> >  [<ffffffff81258547>] ? ext4_xattr_user_set+0x47/0x50
> >  [<ffffffff811935ce>] ? generic_setxattr+0x6e/0x90
> >  [<ffffffff81193ecb>] ? __vfs_setxattr_noperm+0x7b/0x1c0
> >  [<ffffffff811940d4>] ? vfs_setxattr+0xc4/0xd0
> >  [<ffffffff8119421e>] ? setxattr+0x13e/0x1e0
> >  [<ffffffff811719c7>] ? __sb_start_write+0xe7/0x1b0
> >  [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
> >  [<ffffffff8118c65c>] ? fget_light+0x3c/0x130
> >  [<ffffffff8118f2e8>] ? mnt_want_write_file+0x28/0x60
> >  [<ffffffff8118f1f8>] ? __mnt_want_write+0x58/0x70
> >  [<ffffffff811946be>] ? SyS_fsetxattr+0xbe/0x100
> >  [<ffffffff816407c2>] ? system_call_fastpath+0x16/0x1b
> > 
> > 
> > The workload is a ceph-osd daemon, which tends to hammer pretty heavily on 
> > the xattr paths.  I don't have a nice self-contained reproducer or 
> > anything since this is is falling out of our integration tests.  
> > Hopefully there is enough here (or that can be gleaned from kdb) that it 
> > is clear what is going on.  It's v3.10 (plus some unrelated patches).
> > 
> > Thanks!
> > sage
> > 
> > 
> > 
> > On Wed, 31 Jul 2013, Jan Kara wrote:
> > >   Hello,
> > > 
> > > On Tue 30-07-13 15:41:40, Sage Weil wrote:
> > > > Hit this on 3.10.  Is this a known issue?
> > >   No, I haven't seen it. Why did the kernel crash?
> > > 
> > > 								Honza
> > > > 
> > > > Thanks-
> > > > sage
> > > > 
> > > > 
> > > > Stack traceback for pid 23944
> > > > 0xffff8802206edeb0    23944    23840  1    2   R  0xffff8802206ee338 
> > > > *ceph-osd
> > > >  ffff88020bf17b78 0000000000000018 ffffffff81267398 ffff8802256d40c0
> > > >  ffff88020b5a1230 ffff88020bf17bb8 ffffffff81325d9c ffff88020b5a1230
> > > >  ffff8802256d40c0 ffff88020b5a1230 00000000698a8d24 ffff88020bf17be8
> > > > Call Trace:
> > > >  [<ffffffff81267398>] ? __jbd2_journal_file_buffer+0x188/0x260
> > > >  [<ffffffff81325d9c>] ? do_raw_spin_lock+0x10c/0x150
> > > >  [<ffffffff81268118>] ? do_get_write_access+0x448/0x650
> > > >  [<ffffffff81213caf>] ? ext4_read_inode_bitmap+0x9f/0x5f0
> > > >  [<ffffffff81167849>] ? kmem_cache_alloc+0x39/0x160
> > > >  [<ffffffff81268490>] ? jbd2_journal_get_write_access+0x30/0x50
> > > >  [<ffffffff81244d43>] ? __ext4_journal_get_write_access+0x43/0x90
> > > >  [<ffffffff812143d7>] ? ext4_free_inode+0x1d7/0x5d0
> > > >  [<ffffffff8121e601>] ? ext4_evict_inode+0x341/0x4d0
> > > >  [<ffffffff8121b268>] ? ext4_mark_inode_dirty+0x88/0x230
> > > >  [<ffffffff8121e610>] ? ext4_evict_inode+0x350/0x4d0
> > > >  [<ffffffff8118a098>] ? evict+0xb8/0x1c0
> > > >  [<ffffffff8118a9d5>] ? iput+0x105/0x190
> > > >  [<ffffffff8117d341>] ? do_unlinkat+0x201/0x270
> > > >  [<ffffffff8131e9be>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> > > >  [<ffffffff81180356>] ? SyS_unlink+0x16/0x20
> > > > 
> > > > --
> > > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > > > the body of a message to majordomo@...r.kernel.org
> > > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > -- 
> > > Jan Kara <jack@...e.cz>
> > > SUSE Labs, CR
> > > 
> > > 
> -- 
> Jan Kara <jack@...e.cz>
> SUSE Labs, CR
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html