Message-ID: <20260129230822.168034-1-inwardvessel@gmail.com>
Date: Thu, 29 Jan 2026 15:08:22 -0800
From: JP Kobryn <inwardvessel@...il.com>
To: boris@....io,
	clm@...com,
	dsterba@...e.com
Cc: linux-btrfs@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	kernel-team@...a.com
Subject: [RFC PATCH] btrfs: defer freeing of subpage private state to free_folio

During reclaim, file-mapped folio callbacks can be invoked through the
address_space_operations "aops" interface. In one specific case involving
btrfs, the release_folio() callback is invoked and the btrfs-specific
private data is freed. Afterward, continuing along the reclaim path, the
folio is removed from the page cache if no other references to it remain.
Because there is a window between the freeing of the private data and the
refcount check, another task can take a reference in parallel, in which
case the folio remains in the page cache with NULL private data. That task
then acquires the folio and must take precautions when accessing the
private field.
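
For reference, the refcount check on the reclaim side looks roughly like
the following (a simplified sketch of __remove_mapping() in mm/vmscan.c;
the exact expected-refcount computation varies by kernel version):

/* mm/vmscan.c, simplified sketch */
static int __remove_mapping(struct address_space *mapping, struct folio *folio,
			    bool reclaimed, struct mem_cgroup *target_memcg)
{
	...
	/* The refcount expected when only the page cache holds the folio. */
	refcount = 1 + folio_nr_pages(folio);
	if (!folio_ref_freeze(folio, refcount))
		goto cannot_free;	/* another task took a reference */
	...
}

When the freeze fails, the folio stays in the page cache even though
release_folio() has already freed its private data.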

Existing code is aware of this. In some places, the private field is
checked for NULL and the private data is recreated if absent, with
surrounding comments referring to this race. For example:

/*
 * We unlock the page after the io is completed and then re-lock it
 * above.  release_folio() could have come in between that and cleared
 * folio private, but left the page in the mapping.  Set the page mapped
 * here to make sure it's properly set for the subpage stuff.
 */
ret = set_folio_extent_mapped(folio);
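
For context, set_folio_extent_mapped() recreates the missing private data;
roughly (a sketch based on current upstream, where the subpage helper names
differ across versions, e.g. older kernels use btrfs_attach_subpage() here):

int set_folio_extent_mapped(struct folio *folio)
{
	struct btrfs_fs_info *fs_info;

	ASSERT(folio->mapping);

	/* Private data survived (or was already recreated): nothing to do. */
	if (folio_test_private(folio))
		return 0;

	fs_info = folio_to_fs_info(folio);

	if (btrfs_is_subpage(fs_info, folio))
		return btrfs_attach_folio_state(fs_info, folio, BTRFS_SUBPAGE_DATA);

	folio_attach_private(folio, (void *)EXTENT_FOLIO_PRIVATE);
	return 0;
}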

It's worth noting up front that the protections currently in place,
together with the points at which btrfs invokes filemap_invalidate_inode(),
may already be sufficient. The purpose of this patch, though, is to ensure
the btrfs private subpage metadata lives as long as its folio, which avoids
losing subpage metadata and improves maintainability by ruling out any
possible use-after-free in present or future code. Currently the private
data is freed in the btrfs-specific release_folio() aops callback; this
proposed change instead defers the freeing to the free_folio() callback.
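
Unlike release_folio(), free_folio() is only called once the folio is
truly on its way out of the page cache, i.e. after reclaim has
successfully frozen the refcount, so no other task can still find it via
the mapping. Roughly (a simplified sketch of mm/filemap.c):

/* mm/filemap.c, simplified sketch */
void filemap_free_folio(struct address_space *mapping, struct folio *folio)
{
	void (*free_folio)(struct folio *);
	int refs = 1;

	free_folio = mapping->a_ops->free_folio;
	if (free_folio)
		free_folio(folio);

	if (folio_test_large(folio))
		refs = folio_nr_pages(folio);
	folio_put_refs(folio, refs);
}

Freeing the btrfs private state at this point ties its lifetime to the
folio itself, closing the window described above.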

The patch might also have the advantage of being easy to backport to the
LTS trees. On that note, it's worth mentioning that we encountered a kernel
panic as a result of this sequence on a 6.16-based arm64 host (configured
with 64k pages, so btrfs runs in subpage mode). On our 6.16 kernel, the
race window lies between points A and B below:

[mm] page cache reclaim path        [fs] relocation in subpage mode
shrink_folio_list()
  folio_trylock() /* lock acquired */
  filemap_release_folio()
    mapping->a_ops->release_folio()
      btrfs_release_folio()
        __btrfs_release_folio()
          clear_folio_extent_mapped()
            btrfs_detach_folio_state()
              bfs = folio_detach_private(folio)
              btrfs_free_folio_state(folio)
                kfree(bfs) /* point A */

                                   prealloc_file_extent_cluster()
                                     filemap_lock_folio()
                                       folio_try_get() /* inc refcount */
                                       folio_lock() /* wait for lock */

  __remove_mapping()
    if (!folio_ref_freeze(folio, refcount)) /* point B */
      goto cannot_free /* folio remains in cache */

  folio_unlock(folio) /* lock released */

                                   /* lock acquired */
                                   btrfs_subpage_clear_uptodate()
                                     bfs = folio->private /* use-after-free */

This exact race during relocation should not occur in the latest upstream
code, but it illustrates the kind of problem that makes this patch a
backport candidate.

Signed-off-by: JP Kobryn <inwardvessel@...il.com>
---
 fs/btrfs/extent_io.c |  6 ++++--
 fs/btrfs/inode.c     | 18 ++++++++++++++++++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3df399dc8856..d83d3f9ae3af 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -928,8 +928,10 @@ void clear_folio_extent_mapped(struct folio *folio)
 		return;
 
 	fs_info = folio_to_fs_info(folio);
-	if (btrfs_is_subpage(fs_info, folio))
-		return btrfs_detach_folio_state(fs_info, folio, BTRFS_SUBPAGE_DATA);
+	if (btrfs_is_subpage(fs_info, folio)) {
+		/* freeing of private subpage data is deferred to btrfs_free_folio */
+		return;
+	}
 
 	folio_detach_private(folio);
 }
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b8abfe7439a3..7a832ee3b591 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -7565,6 +7565,23 @@ static bool btrfs_release_folio(struct folio *folio, gfp_t gfp_flags)
 	return __btrfs_release_folio(folio, gfp_flags);
 }
 
+/* frees subpage private data if present */
+static void btrfs_free_folio(struct folio *folio)
+{
+	struct btrfs_folio_state *bfs;
+
+	if (!folio_test_private(folio))
+		return;
+
+	bfs = folio_detach_private(folio);
+	if (bfs == (void *)EXTENT_FOLIO_PRIVATE) {
+		/* extent map flag is detached in btrfs_release_folio */
+		return;
+	}
+
+	btrfs_free_folio_state(bfs);
+}
+
 #ifdef CONFIG_MIGRATION
 static int btrfs_migrate_folio(struct address_space *mapping,
 			     struct folio *dst, struct folio *src,
@@ -10651,6 +10668,7 @@ static const struct address_space_operations btrfs_aops = {
 	.invalidate_folio = btrfs_invalidate_folio,
 	.launder_folio	= btrfs_launder_folio,
 	.release_folio	= btrfs_release_folio,
+	.free_folio = btrfs_free_folio,
 	.migrate_folio	= btrfs_migrate_folio,
 	.dirty_folio	= filemap_dirty_folio,
 	.error_remove_folio = generic_error_remove_folio,
-- 
2.47.3

