linux-ext4 - Re: [PATCH -next] ext4: Fix BUG_ON in ext4

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20211222121206.g45g7v4y3i35kvnp@work>
Date:   Wed, 22 Dec 2021 13:12:06 +0100
From:   Lukas Czerner <lczerner@...hat.com>
To:     Ye Bin <yebin10@...wei.com>
Cc:     tytso@....edu, adilger.kernel@...ger.ca,
        linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org,
        jack@...e.cz
Subject: Re: [PATCH -next] ext4: Fix BUG_ON in ext4_bread when write quota
 data

On Wed, Dec 22, 2021 at 09:35:37AM +0800, Ye Bin wrote:
> We got issue as follows when run syzkaller:
> [  167.936972] EXT4-fs error (device loop0): __ext4_remount:6314: comm rep: Abort forced by user
> [  167.938306] EXT4-fs (loop0): Remounting filesystem read-only
> [  167.981637] Assertion failure in ext4_getblk() at fs/ext4/inode.c:847: '(EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) || handle != NULL || create == 0'
> [  167.983601] ------------[ cut here ]------------
> [  167.984245] kernel BUG at fs/ext4/inode.c:847!
> [  167.984882] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
> [  167.985624] CPU: 7 PID: 2290 Comm: rep Tainted: G    B             5.16.0-rc5-next-20211217+ #123
> [  167.986823] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
> [  167.988590] RIP: 0010:ext4_getblk+0x17e/0x504
> [  167.989189] Code: c6 01 74 28 49 c7 c0 a0 a3 5c 9b b9 4f 03 00 00 48 c7 c2 80 9c 5c 9b 48 c7 c6 40 b6 5c 9b 48 c7 c7 20 a4 5c 9b e8 77 e3 fd ff <0f> 0b 8b 04 244
> [  167.991679] RSP: 0018:ffff8881736f7398 EFLAGS: 00010282
> [  167.992385] RAX: 0000000000000094 RBX: 1ffff1102e6dee75 RCX: 0000000000000000
> [  167.993337] RDX: 0000000000000001 RSI: ffffffff9b6e29e0 RDI: ffffed102e6dee66
> [  167.994292] RBP: ffff88816a076210 R08: 0000000000000094 R09: ffffed107363fa09
> [  167.995252] R10: ffff88839b1fd047 R11: ffffed107363fa08 R12: ffff88816a0761e8
> [  167.996205] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000001
> [  167.997158] FS:  00007f6a1428c740(0000) GS:ffff88839b000000(0000) knlGS:0000000000000000
> [  167.998238] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  167.999025] CR2: 00007f6a140716c8 CR3: 0000000133216000 CR4: 00000000000006e0
> [  167.999987] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  168.000944] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  168.001899] Call Trace:
> [  168.002235]  <TASK>
> [  168.007167]  ext4_bread+0xd/0x53
> [  168.007612]  ext4_quota_write+0x20c/0x5c0
> [  168.010457]  write_blk+0x100/0x220
> [  168.010944]  remove_free_dqentry+0x1c6/0x440
> [  168.011525]  free_dqentry.isra.0+0x565/0x830
> [  168.012133]  remove_tree+0x318/0x6d0
> [  168.014744]  remove_tree+0x1eb/0x6d0
> [  168.017346]  remove_tree+0x1eb/0x6d0
> [  168.019969]  remove_tree+0x1eb/0x6d0
> [  168.022128]  qtree_release_dquot+0x291/0x340
> [  168.023297]  v2_release_dquot+0xce/0x120
> [  168.023847]  dquot_release+0x197/0x3e0
> [  168.024358]  ext4_release_dquot+0x22a/0x2d0
> [  168.024932]  dqput.part.0+0x1c9/0x900
> [  168.025430]  __dquot_drop+0x120/0x190
> [  168.025942]  ext4_clear_inode+0x86/0x220
> [  168.026472]  ext4_evict_inode+0x9e8/0xa22
> [  168.028200]  evict+0x29e/0x4f0
> [  168.028625]  dispose_list+0x102/0x1f0
> [  168.029148]  evict_inodes+0x2c1/0x3e0
> [  168.030188]  generic_shutdown_super+0xa4/0x3b0
> [  168.030817]  kill_block_super+0x95/0xd0
> [  168.031360]  deactivate_locked_super+0x85/0xd0
> [  168.031977]  cleanup_mnt+0x2bc/0x480
> [  168.033062]  task_work_run+0xd1/0x170
> [  168.033565]  do_exit+0xa4f/0x2b50
> [  168.037155]  do_group_exit+0xef/0x2d0
> [  168.037666]  __x64_sys_exit_group+0x3a/0x50
> [  168.038237]  do_syscall_64+0x3b/0x90
> [  168.038751]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> 
> In order to reproduce this problem, the following conditions need to be met:
> 1. Ext4 filesystem with no journal;
> 2. Filesystem image with incorrect quota data;
> 3. Abort filesystem forced by user;
> 4. umount filesystem;
> 
> As in ext4_quota_write:
> ...
>          if (EXT4_SB(sb)->s_journal && !handle) {
>                  ext4_msg(sb, KERN_WARNING, "Quota write (off=%llu, len=%llu)"
>                          " cancelled because transaction is not started",
>                          (unsigned long long)off, (unsigned long long)len);
>                  return -EIO;
>          }
> ...
> We only check handle if NULL when filesystem has journal. There is need
> check handle if NULL even when filesystem has no journal.
> 
> Signed-off-by: Ye Bin <yebin10@...wei.com>
> ---
>  fs/ext4/super.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 071b7b3c5678..c8ca5811ea65 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -6955,9 +6955,10 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type,
>  	struct buffer_head *bh;
>  	handle_t *handle = journal_current_handle();
>  
> -	if (EXT4_SB(sb)->s_journal && !handle) {
> +	if (!handle) {
>  		ext4_msg(sb, KERN_WARNING, "Quota write (off=%llu, len=%llu)"
> -			" cancelled because transaction is not started",
> +			" cancelled because transaction is not started"
> +			" or filesystem is abort forced by user",

The patch looks good, except this sentence. I don't think it has
anything to do with the abort forced by the user. Yes, in your
reproducer you induced the error by aborting the fs, but it can happen
any time we fail to start a journal transaction. So IMO the message is
already accurate enough. See ext4_release_dquot().

Thanks!
-Lukas

>  			(unsigned long long)off, (unsigned long long)len);
>  		return -EIO;
>  	}
> -- 
> 2.31.1
>