[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZjHmY6RoE3ILnsMv@dread.disaster.area>
Date: Wed, 1 May 2024 16:51:15 +1000
From: Dave Chinner <david@...morbit.com>
To: Zhang Yi <yi.zhang@...weicloud.com>
Cc: linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org, tytso@....edu,
adilger.kernel@...ger.ca, jack@...e.cz, ritesh.list@...il.com,
hch@...radead.org, djwong@...nel.org, willy@...radead.org,
zokeefe@...gle.com, yi.zhang@...wei.com, chengzhihao1@...wei.com,
yukuai3@...wei.com, wangkefeng.wang@...wei.com
Subject: Re: [PATCH v4 02/34] ext4: check the extent status again before
inserting delalloc block
On Wed, Apr 10, 2024 at 10:29:16PM +0800, Zhang Yi wrote:
> From: Zhang Yi <yi.zhang@...wei.com>
>
> Now we lookup extent status entry without holding the i_data_sem before
> inserting delalloc block, it works fine in buffered write path and
> because it holds i_rwsem and folio lock, and the mmap path holds folio
> lock, so the found extent locklessly couldn't be modified concurrently.
> But it could be raced by fallocate since it allocate block whitout
> holding i_rwsem and folio lock.
>
> ext4_page_mkwrite() ext4_fallocate()
> block_page_mkwrite()
> ext4_da_map_blocks()
> //find hole in extent status tree
> ext4_alloc_file_blocks()
> ext4_map_blocks()
> //allocate block and unwritten extent
> ext4_insert_delayed_block()
> ext4_da_reserve_space()
> //reserve one more block
> ext4_es_insert_delayed_block()
> //drop unwritten extent and add delayed extent by mistake
Shouldn't this be serialised by the file invalidation lock? Hole
punching via fallocate must do this to avoid data use-after-free
bugs w.r.t racing page faults and all the other fallocate ops need
to serialise page faults to avoid page cache level data corruption.
Yet here we see a problem resulting from a fallocate operation
racing with a page fault....
Ah, I see that the invalidation lock is only picked up deep inside
ext4_punch_hole(), ext4_collapse_range(), ext4_insert_range() and
ext4_zero_range(). They all do the same flush, lock, and dio wait
preamble but each do it just a little bit differently. The allocation path does
it just a little bit differently again and does not take the
invalidate lock...
Perhaps the ext4 fallocate code should be factored so that all the
fallocate operations run the same flush, lock and wait code rather
than having 5 slightly different copies of the same code?
Cheers,
Dave.
--
Dave Chinner
david@...morbit.com
Powered by blists - more mailing lists