Message-ID: <776e54f6-c9b7-4b22-bde5-561dc65c9be7@gmx.com>
Date: Fri, 30 Jan 2026 13:46:59 +1030
From: Qu Wenruo <quwenruo.btrfs@....com>
To: JP Kobryn <inwardvessel@...il.com>, boris@....io, clm@...com,
dsterba@...e.com
Cc: linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org,
kernel-team@...a.com,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>
Subject: Re: [RFC PATCH] btrfs: defer freeing of subpage private state to
free_folio
On 2026/1/30 09:38, JP Kobryn wrote:
[...]
> The patch might also have the advantage of being easy to backport to the
> LTS trees. On that note, it's worth mentioning that we encountered a kernel
> panic as a result of this sequence on a 6.16-based arm64 host (configured
> with 64K pages, so btrfs runs in subpage mode). On our 6.16 kernel, the race
> window is shown below between points A and B:
>
> [mm] page cache reclaim path                [fs] relocation in subpage mode
> shrink_folio_list()
>   folio_trylock() /* lock acquired */
>   filemap_release_folio()
>     mapping->a_ops->release_folio()
>       btrfs_release_folio()
>         __btrfs_release_folio()
>           clear_folio_extent_mapped()
>             btrfs_detach_folio_state()
>               bfs = folio_detach_private(folio)
>               btrfs_free_folio_state(bfs)
>                 kfree(bfs) /* point A */
>
>                                             prealloc_file_extent_cluster()
>                                               filemap_lock_folio()
Mind explaining which function is calling filemap_lock_folio()?
I guess it's filemap_invalidate_inode() -> filemap_fdatawrite_range() ->
filemap_writeback() -> btrfs_writepages() -> extent_write_cache_pages().
>                                                 folio_try_get() /* inc refcount */
>                                                 folio_lock() /* wait for lock */
Another question here: since the folio has already been released in the mm
path, it should not have the dirty flag set.
That means that inside extent_write_cache_pages(), folio_test_dirty()
should return false, and we should just unlock the folio without
touching it any further.
Mind explaining why we still continue writeback of a non-dirty folio?
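For reference, here is the pattern I would expect in the writeback loop (a
simplified sketch of a write_cache_pages()-style iteration, not the exact
btrfs code):

    folio_lock(folio);
    /*
     * The folio may have been cleaned or released while we were
     * waiting for the lock, so re-check the dirty bit under it.
     */
    if (!folio_test_dirty(folio)) {
            folio_unlock(folio);
            continue;
    }
    if (!folio_clear_dirty_for_io(folio)) {
            folio_unlock(folio);
            continue;
    }
    /* ... only now hand the folio to the writeback code ... */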
>
>   __remove_mapping()
>     if (!folio_ref_freeze(folio, refcount)) /* point B */
>       goto cannot_free /* folio remains in cache */
>
>   folio_unlock(folio) /* lock released */
>
>                                                 /* lock acquired */
>                                               btrfs_subpage_clear_uptodate()
Mind providing more context on where the btrfs_subpage_clear_uptodate()
call comes from?
>                                                 bfs = folio->priv /* use-after-free */
>
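(For context on point B: reclaim can only remove the folio from the page
cache if it holds all remaining references. A simplified paraphrase of the
check in __remove_mapping() in mm/vmscan.c, not the exact upstream code:

    refcount = 1 + folio_nr_pages(folio);
    /* Fails if anyone else holds a reference, e.g. the folio_try_get() above. */
    if (!folio_ref_freeze(folio, refcount))
            goto cannot_free;

So the extra reference taken by the relocation path keeps the folio in the
cache even though its private state was already freed at point A.)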
> This exact race during relocation should not occur in the latest upstream
> code, but it's an example of a backport opportunity for this patch.
And mind explaining what is missing in the 6.16 kernel that causes the above
use-after-free?
>
> Signed-off-by: JP Kobryn <inwardvessel@...il.com>
> ---
>  fs/btrfs/extent_io.c |  6 ++++--
>  fs/btrfs/inode.c     | 18 ++++++++++++++++++
>  2 files changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 3df399dc8856..d83d3f9ae3af 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -928,8 +928,10 @@ void clear_folio_extent_mapped(struct folio *folio)
>  		return;
>  
>  	fs_info = folio_to_fs_info(folio);
> -	if (btrfs_is_subpage(fs_info, folio))
> -		return btrfs_detach_folio_state(fs_info, folio, BTRFS_SUBPAGE_DATA);
> +	if (btrfs_is_subpage(fs_info, folio)) {
> +		/* freeing of private subpage data is deferred to btrfs_free_folio */
> +		return;
> +	}
Another question is why only two filesystems (nfs, for dir inodes, and
orangefs) are utilizing the free_folio() callback.
Iomap does the same as btrfs and only calls ifs_free() from
release_folio() and invalidate_folio().
Thus it looks like the free_folio() callback is not the recommended way to
free the folio->private pointer.
Cc'ing the fsdevel list on whether the free_folio() callback should have
new callers.
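For comparison, the iomap pattern looks roughly like this (paraphrased and
simplified from fs/iomap/buffered-io.c, not the exact code):

    static void ifs_free(struct folio *folio)
    {
            struct iomap_folio_state *ifs = folio_detach_private(folio);

            if (!ifs)
                    return;
            kfree(ifs);
    }

    bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags)
    {
            /* Never free the state of a dirty or writeback folio. */
            if (folio_test_dirty(folio) || folio_test_writeback(folio))
                    return false;
            ifs_free(folio);
            return true;
    }

I.e. the private state is torn down under the folio lock in release_folio(),
not deferred to a free_folio() callback.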
>
>  	folio_detach_private(folio);
This means that for regular folios, we still remove the private flag of
the folio here.
It may be fine for most cases, as we will not touch folio->private
anyway, but this still looks like inconsistent behavior, especially
since the free_folio() callback has handling for both cases.
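To illustrate what I mean, a hypothetical consistent version would defer
both cases (just a sketch, not a tested change):

    void clear_folio_extent_mapped(struct folio *folio)
    {
            /*
             * Hypothetical: do nothing here and leave folio->private
             * attached for both the subpage state and the plain
             * EXTENT_FOLIO_PRIVATE marker; btrfs_free_folio() already
             * handles both cases and would detach and free them uniformly.
             */
    }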
Thanks,
Qu
>  }
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index b8abfe7439a3..7a832ee3b591 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -7565,6 +7565,23 @@ static bool btrfs_release_folio(struct folio *folio, gfp_t gfp_flags)
>  	return __btrfs_release_folio(folio, gfp_flags);
>  }
>  
> +/* frees subpage private data if present */
> +static void btrfs_free_folio(struct folio *folio)
> +{
> +	struct btrfs_folio_state *bfs;
> +
> +	if (!folio_test_private(folio))
> +		return;
> +
> +	bfs = folio_detach_private(folio);
> +	if (bfs == (void *)EXTENT_FOLIO_PRIVATE) {
> +		/* extent map flag is detached in btrfs_folio_release */
> +		return;
> +	}
> +
> +	btrfs_free_folio_state(bfs);
> +}
> +
>  #ifdef CONFIG_MIGRATION
>  static int btrfs_migrate_folio(struct address_space *mapping,
>  			       struct folio *dst, struct folio *src,
> @@ -10651,6 +10668,7 @@ static const struct address_space_operations btrfs_aops = {
>  	.invalidate_folio = btrfs_invalidate_folio,
>  	.launder_folio	= btrfs_launder_folio,
>  	.release_folio	= btrfs_release_folio,
> +	.free_folio	= btrfs_free_folio,
>  	.migrate_folio	= btrfs_migrate_folio,
>  	.dirty_folio	= filemap_dirty_folio,
>  	.error_remove_folio = generic_error_remove_folio,