lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230525103010.jowdiecabckx5cwq@quack3>
Date:   Thu, 25 May 2023 12:30:10 +0200
From:   Jan Kara <jack@...e.cz>
To:     Baokun Li <libaokun1@...wei.com>
Cc:     linux-ext4@...r.kernel.org, tytso@....edu,
        adilger.kernel@...ger.ca, jack@...e.cz, ritesh.list@...il.com,
        linux-kernel@...r.kernel.org, yi.zhang@...wei.com,
        yangerkun@...wei.com, yukuai3@...wei.com, stable@...r.kernel.org
Subject: Re: [PATCH v4] ext4: fix race between writepages and remount

On Wed 24-05-23 15:25:38, Baokun Li wrote:
> We got a WARNING in ext4_add_complete_io:
> ==================================================================
>  WARNING: at fs/ext4/page-io.c:231 ext4_put_io_end_defer+0x182/0x250
>  CPU: 10 PID: 77 Comm: ksoftirqd/10 Tainted: 6.3.0-rc2 #85
>  RIP: 0010:ext4_put_io_end_defer+0x182/0x250 [ext4]
>  [...]
>  Call Trace:
>   <TASK>
>   ext4_end_bio+0xa8/0x240 [ext4]
>   bio_endio+0x195/0x310
>   blk_update_request+0x184/0x770
>   scsi_end_request+0x2f/0x240
>   scsi_io_completion+0x75/0x450
>   scsi_finish_command+0xef/0x160
>   scsi_complete+0xa3/0x180
>   blk_complete_reqs+0x60/0x80
>   blk_done_softirq+0x25/0x40
>   __do_softirq+0x119/0x4c8
>   run_ksoftirqd+0x42/0x70
>   smpboot_thread_fn+0x136/0x3c0
>   kthread+0x140/0x1a0
>   ret_from_fork+0x2c/0x50
> ==================================================================
> 
> Above issue may happen as follows:
> 
>             cpu1                        cpu2
> ----------------------------|----------------------------
> mount -o dioread_lock
> ext4_writepages
>  ext4_do_writepages
>   *if (ext4_should_dioread_nolock(inode))*
>     // rsv_blocks is not assigned here
>                                  mount -o remount,dioread_nolock
>   ext4_journal_start_with_reserve
>    __ext4_journal_start
>     __ext4_journal_start_sb
>      jbd2__journal_start
>       *if (rsv_blocks)*
>         // h_rsv_handle is not initialized here
>   mpage_map_and_submit_extent
>     mpage_map_one_extent
>       dioread_nolock = ext4_should_dioread_nolock(inode)
>       if (dioread_nolock && (map->m_flags & EXT4_MAP_UNWRITTEN))
>         mpd->io_submit.io_end->handle = handle->h_rsv_handle
>         ext4_set_io_unwritten_flag
>           io_end->flag |= EXT4_IO_END_UNWRITTEN
>       // now io_end->handle is NULL but has EXT4_IO_END_UNWRITTEN flag
> 
> scsi_finish_command
>  scsi_io_completion
>   scsi_io_completion_action
>    scsi_end_request
>     blk_update_request
>      req_bio_endio
>       bio_endio
>        bio->bi_end_io  > ext4_end_bio
>         ext4_put_io_end_defer
> 	 ext4_add_complete_io
> 	  // trigger WARN_ON(!io_end->handle && sbi->s_journal);
> 
> The immediate cause of this problem is that ext4_should_dioread_nolock()
> function returns inconsistent values in the ext4_do_writepages() and
> mpage_map_one_extent(). There are four conditions in this function that
> can be changed at mount time to cause this problem. These four conditions
> can be divided into two categories:
> 
>     (1) journal_data and EXT4_EXTENTS_FL, which can be changed by ioctl
>     (2) DELALLOC and DIOREAD_NOLOCK, which can be changed by remount
> 
> The two in the first category have been fixed by commit c8585c6fcaf2
> ("ext4: fix races between changing inode journal mode and ext4_writepages")
> and commit cb85f4d23f79 ("ext4: fix race between writepages and enabling
> EXT4_EXTENTS_FL") respectively.
> 
> Two cases in the other category have not yet been fixed, and the above
> issue is caused by this situation. We refer to the fix for the first
> category, when applying options during remount, we grab s_writepages_rwsem
> to avoid racing with writepages ops to trigger this problem.
> 
> Fixes: 6b523df4fb5a ("ext4: use transaction reservation for extent conversion in ext4_end_io")
> Cc: stable@...r.kernel.org
> Signed-off-by: Baokun Li <libaokun1@...wei.com>

Thanks for the fix! It looks good to me. Feel free to add:

Reviewed-by: Jan Kara <jack@...e.cz>

								Honza

> ---
> V1->V2:
> 	Grab s_writepages_rwsem unconditionally during remount.
> 	Remove patches 1,2 that are no longer needed.
> V2->V3:
> 	Also grab s_writepages_rwsem when restoring options.
> V3->V4:
> 	Rebased on top of mainline.
> 	Reference 00d873c17e29 ("ext4: avoid deadlock in fs reclaim with
> 	page writeback") to use s_writepages_rwsem.
> 
>  fs/ext4/ext4.h  |  3 ++-
>  fs/ext4/super.c | 14 ++++++++++++++
>  2 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 6948d673bba2..97ef99c7f296 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -1613,7 +1613,8 @@ struct ext4_sb_info {
>  
>  	/*
>  	 * Barrier between writepages ops and changing any inode's JOURNAL_DATA
> -	 * or EXTENTS flag.
> +	 * or EXTENTS flag or between writepages ops and changing DELALLOC or
> +	 * DIOREAD_NOLOCK mount options on remount.
>  	 */
>  	struct percpu_rw_semaphore s_writepages_rwsem;
>  	struct dax_device *s_daxdev;
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 9680fe753e59..fff42682e4e0 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -6389,6 +6389,7 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
>  	ext4_group_t g;
>  	int err = 0;
>  	int enable_rw = 0;
> +	int alloc_ctx;
>  #ifdef CONFIG_QUOTA
>  	int enable_quota = 0;
>  	int i, j;
> @@ -6429,7 +6430,16 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
>  
>  	}
>  
> +	/*
> +	 * Changing the DIOREAD_NOLOCK or DELALLOC mount options may cause
> +	 * two calls to ext4_should_dioread_nolock() to return inconsistent
> +	 * values, triggering WARN_ON in ext4_add_complete_io(). we grab
> +	 * here s_writepages_rwsem to avoid race between writepages ops and
> +	 * remount.
> +	 */
> +	alloc_ctx = ext4_writepages_down_write(sb);
>  	ext4_apply_options(fc, sb);
> +	ext4_writepages_up_write(sb, alloc_ctx);
>  
>  	if ((old_opts.s_mount_opt & EXT4_MOUNT_JOURNAL_CHECKSUM) ^
>  	    test_opt(sb, JOURNAL_CHECKSUM)) {
> @@ -6650,6 +6660,8 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
>  	if ((sb->s_flags & SB_RDONLY) && !(old_sb_flags & SB_RDONLY) &&
>  	    sb_any_quota_suspended(sb))
>  		dquot_resume(sb, -1);
> +
> +	alloc_ctx = ext4_writepages_down_write(sb);
>  	sb->s_flags = old_sb_flags;
>  	sbi->s_mount_opt = old_opts.s_mount_opt;
>  	sbi->s_mount_opt2 = old_opts.s_mount_opt2;
> @@ -6658,6 +6670,8 @@ static int __ext4_remount(struct fs_context *fc, struct super_block *sb)
>  	sbi->s_commit_interval = old_opts.s_commit_interval;
>  	sbi->s_min_batch_time = old_opts.s_min_batch_time;
>  	sbi->s_max_batch_time = old_opts.s_max_batch_time;
> +	ext4_writepages_up_write(sb, alloc_ctx);
> +
>  	if (!test_opt(sb, BLOCK_VALIDITY) && sbi->s_system_blks)
>  		ext4_release_system_zone(sb);
>  #ifdef CONFIG_QUOTA
> -- 
> 2.31.1
> 
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ