linux-kernel - Re: [RFC PATCH] btrfs: defer freeing of subpage private state to free

Message-ID: <20260130063403.GB863940@zen.localdomain>
Date: Thu, 29 Jan 2026 22:34:03 -0800
From: Boris Burkov <boris@....io>
To: Qu Wenruo <quwenruo.btrfs@....com>
Cc: JP Kobryn <inwardvessel@...il.com>, clm@...com, dsterba@...e.com,
	linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org,
	kernel-team@...a.com,
	"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>
Subject: Re: [RFC PATCH] btrfs: defer freeing of subpage private state to
 free_folio

On Fri, Jan 30, 2026 at 01:46:59PM +1030, Qu Wenruo wrote:
> 
> 
> On 2026/1/30 09:38, JP Kobryn wrote:
> [...]
> > The patch also might have the advantage of being easy to backport to the
> > LTS trees. On that note, it's worth mentioning that we encountered a kernel
> > panic as a result of this sequence on a 6.16-based arm64 host (configured
> > with 64k pages so btrfs is in subpage mode). On our 6.16 kernel, the race
> > window is shown below between points A and B:
> > 
> > [mm] page cache reclaim path        [fs] relocation in subpage mode
> > shrink_folio_list()
> >    folio_trylock() /* lock acquired */
> >    filemap_release_folio()
> >      mapping->a_ops->release_folio()
> >        btrfs_release_folio()
> >          __btrfs_release_folio()
> >            clear_folio_extent_mapped()
> >              btrfs_detach_folio_state()
> >                bfs = folio_detach_private(folio)
> >                btrfs_free_folio_state(folio)
> >                  kfree(bfs) /* point A */
> > 
> >                                     prealloc_file_extent_cluster()
> >                                       filemap_lock_folio()
> 
> Could you explain which function is calling filemap_lock_folio()?
> 
> I guess it's filemap_invalidate_inode() -> filemap_fdatawrite_range() ->
> filemap_writeback() -> btrfs_writepages() -> extent_write_cache_pages().
> 

I think you may have missed it in the diagram, and some of the function
names may have shifted a bit between kernels, but it is relocation.

On current btrfs/for-next, I think it would be:

relocate_file_extent_cluster()
  relocate_one_folio()
    filemap_lock_folio()

> >                                         folio_try_get() /* inc refcount */
> >                                         folio_lock() /* wait for lock */
> 
> 
> Another question: since the folio has already been released in the mm
> path, it should not have the dirty flag set.
> 
> That means inside extent_write_cache_pages(), folio_test_dirty() should
> return false, and we should just unlock the folio without touching it
> any further.
> 
> Could you explain why we still continue the writeback of a non-dirty folio?
> 

I think this question is answered by the above as well: we aren't in
writeback, we are in relocation.

Thanks,
Boris

> > 
> >    __remove_mapping()
> >      if (!folio_ref_freeze(folio, refcount)) /* point B */
> >        goto cannot_free /* folio remains in cache */
> > 
> >    folio_unlock(folio) /* lock released */
> > 
> >                                     /* lock acquired */
> >                                     btrfs_subpage_clear_uptodate()
> 
> Could you provide more context on where the btrfs_subpage_clear_uptodate()
> call comes from?
> 
> >                                       bfs = folio->priv /* use-after-free */
> > 
> > This exact race during relocation should not occur in the latest upstream
> > code, but it's an example of a backport opportunity for this patch.
> 
> Could you also explain what is missing in the 6.16 kernel that causes the
> above use-after-free?
> 
> > 
> > Signed-off-by: JP Kobryn <inwardvessel@...il.com>
> > ---
> >   fs/btrfs/extent_io.c |  6 ++++--
> >   fs/btrfs/inode.c     | 18 ++++++++++++++++++
> >   2 files changed, 22 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index 3df399dc8856..d83d3f9ae3af 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -928,8 +928,10 @@ void clear_folio_extent_mapped(struct folio *folio)
> >   		return;
> >   	fs_info = folio_to_fs_info(folio);
> > -	if (btrfs_is_subpage(fs_info, folio))
> > -		return btrfs_detach_folio_state(fs_info, folio, BTRFS_SUBPAGE_DATA);
> > +	if (btrfs_is_subpage(fs_info, folio)) {
> > +		/* freeing of private subpage data is deferred to btrfs_free_folio */
> > +		return;
> > +	}
> 
> Another question is why only two filesystems (nfs for directory inodes,
> and orangefs) use the free_folio() callback.
> 
> Iomap does the same as btrfs and only calls ifs_free() in
> release_folio() and invalidate_folio().
> 
> Thus it looks like the free_folio() callback is not the recommended way
> to free the folio->private pointer.
> 
> Cc'ing the fsdevel list on whether the free_folio() callback should gain
> new users.
> 
> >   	folio_detach_private(folio);
> 
> This means that for regular folios we still clear the private flag of
> the folio.
> 
> It may be fine in most cases, as we will not touch folio->private anyway,
> but it still looks like inconsistent behavior, especially since the
> free_folio() callback handles both cases.
> 
> Thanks,
> Qu
> 
> >   }
> > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> > index b8abfe7439a3..7a832ee3b591 100644
> > --- a/fs/btrfs/inode.c
> > +++ b/fs/btrfs/inode.c
> > @@ -7565,6 +7565,23 @@ static bool btrfs_release_folio(struct folio *folio, gfp_t gfp_flags)
> >   	return __btrfs_release_folio(folio, gfp_flags);
> >   }
> > +/* frees subpage private data if present */
> > +static void btrfs_free_folio(struct folio *folio)
> > +{
> > +	struct btrfs_folio_state *bfs;
> > +
> > +	if (!folio_test_private(folio))
> > +		return;
> > +
> > +	bfs = folio_detach_private(folio);
> > +	if (bfs == (void *)EXTENT_FOLIO_PRIVATE) {
> > +		/* extent map flag is detached in btrfs_release_folio */
> > +		return;
> > +	}
> > +
> > +	btrfs_free_folio_state(bfs);
> > +}
> > +
> >   #ifdef CONFIG_MIGRATION
> >   static int btrfs_migrate_folio(struct address_space *mapping,
> >   			     struct folio *dst, struct folio *src,
> > @@ -10651,6 +10668,7 @@ static const struct address_space_operations btrfs_aops = {
> >   	.invalidate_folio = btrfs_invalidate_folio,
> >   	.launder_folio	= btrfs_launder_folio,
> >   	.release_folio	= btrfs_release_folio,
> > +	.free_folio = btrfs_free_folio,
> >   	.migrate_folio	= btrfs_migrate_folio,
> >   	.dirty_folio	= filemap_dirty_folio,
> >   	.error_remove_folio = generic_error_remove_folio,
>
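
For readers following the race diagram quoted above, the ordering problem
can be sketched as a small userspace toy model. This is only an
illustration under simplifying assumptions: it is not kernel code, every
name in it (toy_folio, toy_ref_freeze, simulate_reclaim, the outcome enum)
is hypothetical, and it ignores locking and the reference that attaching
private data itself holds. It only contrasts freeing the private state in
the release step (point A) with deferring it until after the refcount
freeze (point B) has succeeded.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace toy model, NOT kernel code. All names here are hypothetical
 * stand-ins for the steps in the diagram quoted above.
 */
enum reclaim_outcome {
	FOLIO_FREED,            /* removed from cache, private state freed */
	FOLIO_KEPT_STATE_OK,    /* still cached, private state intact */
	FOLIO_KEPT_STATE_GONE,  /* still cached, private state already freed */
};

struct toy_folio {
	int refcount;    /* 1 == only the page cache holds a reference */
	bool has_state;  /* stands in for the subpage private state */
};

/* Stand-in for folio_ref_freeze(): succeeds only at the expected count. */
static bool toy_ref_freeze(struct toy_folio *f, int expected)
{
	if (f->refcount != expected)
		return false;
	f->refcount = 0;
	return true;
}

static enum reclaim_outcome simulate_reclaim(bool defer_free, bool racing_ref)
{
	struct toy_folio f = { .refcount = 1, .has_state = true };

	/* release_folio step: the eager variant frees the state here (point A). */
	if (!defer_free)
		f.has_state = false;

	/* Relocation side takes its reference inside the race window. */
	if (racing_ref)
		f.refcount++;

	/* Point B: the freeze fails if anyone else holds a reference. */
	if (!toy_ref_freeze(&f, 1))
		return f.has_state ? FOLIO_KEPT_STATE_OK : FOLIO_KEPT_STATE_GONE;

	/* free_folio step: runs only after the folio has left the cache. */
	if (defer_free)
		f.has_state = false;

	assert(!f.has_state);
	return FOLIO_FREED;
}
```

In this model, simulate_reclaim(false, true) returns FOLIO_KEPT_STATE_GONE,
which corresponds to the folio staying in the cache after its private state
was freed, while simulate_reclaim(true, true) returns FOLIO_KEPT_STATE_OK.
Without the racing reference, both variants return FOLIO_FREED.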