[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4EE14E6B.9080502@jp.fujitsu.com>
Date: Fri, 09 Dec 2011 08:55:23 +0900
From: Toshiyuki Okajima <toshi.okajima@...fujitsu.com>
To: Yongqiang Yang <xiaoqiangnk@...il.com>
CC: tytso@....edu, adilger.kernel@...ger.ca, linux-ext4@...r.kernel.org
Subject: Re: [PATCH][RFC] ext4: fix a panic by ext4_change_inode_journal_flag()
with delalloc mode
Hi, Yongqiang
(2011/12/08 19:40), Yongqiang Yang wrote:
> Hi,
>
> The patch [ext4: allocate delalloc blocks before changing journal
> mode] should fix up the problem. Changing journal mode in fly has
> some other problems, the patches fixing the problems had been post
> out. Ted has not picked it up yet.
Sorry, I missed your patch set.
I'll check your patch set.
>
> sync whole file system is not a good idea, we just need to allocate
> delalloc blocks of the inode.
Yes. I think it is not the best solution to sync whole filesystem.
But I counld not figure out easy solution which was not to sync
whole filesystem.
Thanks.
>
> Yongqiang.
>
> On Thu, Dec 8, 2011 at 6:24 PM, Toshiyuki Okajima
> <toshi.okajima@...fujitsu.com> wrote:
>> The following reproducer causes a panic.
>>
>> Reproducer:
>> -------------------------------------------------------------------------------
>> #!/bin/sh
>>
>> echo "a" > /tmp/a # file under an ext4 filesystem
>> chattr +j /tmp/a
>> echo "a" >> /tmp/a
>> chattr -j /tmp/a
>> -------------------------------------------------------------------------------
>>
>> The timing when the panic occurs is a time of executing "chattr -j".
>> Because "chattr -j" calls ioctl(EXT4_IOC_SETFLAGS) which executes
>> ext4_change_inode_journal_flag() to change the attribute of "/tmp/a".
>>
>> The detailed explanation is as follows:
>>
>> ext4_change_inode_journal_flag() needs to write all the unprocessed data of
>> this inode (/tmp/a) by jbd2_journal_flush() first. However, if an ext4
>> filesystem mounts with "delalloc" mode, jbd2_journal_flush() cannot completely
>> flush all the unprocessed delayed allocation's buffers of this inode before
>> calling jbd2_log_do_checkpoint().
>>
>> Therefore, a panic can occur by "BUG_ON(buffer_delay(bh))" of submit_bh() via
>> __flush_batch() called from jbd2_log_do_checkpoint() because some unprocessed
>> delayed allocation's buffers may remain though jbd2_journal_flush() flushed all
>> transactions.
>>
>> The following is an example backtrace():
>> -------------------------------------------------------------------------------
>> crash> bt
>> PID: 4591 TASK: ffff88007980a0c0 CPU: 2 COMMAND: "chattr"
>> #0 [ffff88003795f900] machine_kexec at ffffffff810356e5
>> #1 [ffff88003795f980] crash_kexec at ffffffff810a5c64
>> #2 [ffff88003795fa50] oops_end at ffffffff8149bf7b
>> #3 [ffff88003795fa80] die at ffffffff810163cb
>> #4 [ffff88003795fab0] do_trap at ffffffff8149b960
>> #5 [ffff88003795fb00] do_invalid_op at ffffffff810143c5
>> #6 [ffff88003795fba0] invalid_op at ffffffff814a4b2b
>> [exception RIP: submit_bh+288]
>> RIP: ffffffff8117e280 RSP: ffff88003795fc58 RFLAGS: 00010202
>> RAX: 000000000004c225 RBX: ffff880071890f00 RCX: 0000000000000002
>> RDX: ffff88003795ffd8 RSI: ffff880071890f00 RDI: 0000000000000211
>> RBP: ffff88003795fc78 R8: 0000000091827364 R9: 0000000000000000
>> R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000211
>> R13: 0000000000000211 R14: ffff88003795fca8 R15: 0000000000000004
>> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
>> #7 [ffff88003795fc50] __sync_dirty_buffer at ffffffff811817dd
>> #8 [ffff88003795fc80] write_dirty_buffer at ffffffff8117ede7
>> #9 [ffff88003795fca0] __flush_batch at ffffffffa01af8f8 [jbd2]
>> #10 [ffff88003795fd00] jbd2_log_do_checkpoint at ffffffffa01afb72 [jbd2]
>> #11 [ffff88003795fd60] jbd2_journal_flush at ffffffffa01b3e2f [jbd2]
>> #12 [ffff88003795fda0] ext4_change_inode_journal_flag at ffffffffa01d0ad3 [ext4]
>> #13 [ffff88003795fdd0] ext4_ioctl at ffffffffa01d66b6 [ext4]
>> #14 [ffff88003795fea0] vfs_ioctl at ffffffff81162b1d
>> #15 [ffff88003795feb0] do_vfs_ioctl at ffffffff81163145
>> #16 [ffff88003795ff30] sys_ioctl at ffffffff81163644
>> #17 [ffff88003795ff80] system_call_fastpath at ffffffff814a2b42
>> RIP: 00000034930d95f7 RSP: 00007fff656ceae8 RFLAGS: 00010213
>> RAX: 0000000000000010 RBX: ffffffff814a2b42 RCX: 00000034930d95f7
>> RDX: 00007fff656ceb8c RSI: 0000000040086602 RDI: 0000000000000003
>> RBP: 0000000000080000 R8: 0000000000000000 R9: 0000000000000000
>> R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000004000
>> R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000003
>> ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
>> -------------------------------------------------------------------------------
>>
>> Therefore, we must flush all unprocessed delayed allocation's buffers and turn
>> "delalloc" mode off before jbd2_journal_flush() calls. Then, we must turn
>> "delalloc" mode on after jbd2_journal_flush() finishes.
>>
>> Signed-off-by: Toshiyuki Okajima <toshi.okajima@...fujitsu.com>
>> ---
>> fs/ext4/ext4.h | 3 +++
>> fs/ext4/inode.c | 18 ++++++++++++++++++
>> 2 files changed, 21 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
>> index 5b0e26a..3eeb68b 100644
>> --- a/fs/ext4/ext4.h
>> +++ b/fs/ext4/ext4.h
>> @@ -1249,6 +1249,9 @@ struct ext4_sb_info {
>>
>> /* record the last minlen when FITRIM is called. */
>> atomic_t s_last_trim_minblks;
>> +
>> + /* to sync the delayed allocation's buffers */
>> + atomic_t s_need_da_sync;
>> };
>>
>> static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb)
>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
>> index 848f436..8d317a3 100644
>> --- a/fs/ext4/inode.c
>> +++ b/fs/ext4/inode.c
>> @@ -2368,6 +2368,14 @@ static int ext4_nonda_switch(struct super_block *sb)
>> */
>> return 1;
>> }
>> +
>> + /*
>> + * Second case to switch to non dealloc mode, if we want to write all the ext4
>> + * data out.
>> + */
>> + if (atomic_read(&sbi->s_need_da_sync) > 0)
>> + return 1;
>> +
>> /*
>> * Even if we don't switch but are nearing capacity,
>> * start pushing delalloc when 1/2 of free blocks are dirty.
>> @@ -4665,6 +4673,8 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
>> journal_t *journal;
>> handle_t *handle;
>> int err;
>> + struct super_block *sb = inode->i_sb;
>> + struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
>>
>> /*
>> * We have to be very careful here: changing a data block's
>> @@ -4682,6 +4692,13 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
>> if (is_journal_aborted(journal))
>> return -EROFS;
>>
>> + /*
>> + * To write all delayed allocation's buffers out before calling
>> + * jbd2_journal_flush(). Because it cannot flush the buffers.
>> + */
>> + atomic_inc(&sbi->s_need_da_sync);
>> + sync_filesystem(sb);
>> +
>> jbd2_journal_lock_updates(journal);
>> jbd2_journal_flush(journal);
>>
>> @@ -4700,6 +4717,7 @@ int ext4_change_inode_journal_flag(struct inode *inode, int val)
>> ext4_set_aops(inode);
>>
>> jbd2_journal_unlock_updates(journal);
>> + atomic_dec(&sbi->s_need_da_sync);
>>
>> /* Finally we can mark the inode as dirty. */
>>
>> --
>> 1.5.5.6
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to majordomo@...r.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists