[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20251202005146.3723457-1-loemra.dev@gmail.com>
Date: Mon, 1 Dec 2025 16:51:41 -0800
From: Leo Martins <loemra.dev@...il.com>
To: kernel test robot <oliver.sang@...el.com>
Cc: oe-lkp@...ts.linux.dev,
lkp@...el.com,
linux-kernel@...r.kernel.org,
David Sterba <dsterba@...e.com>,
linux-btrfs@...r.kernel.org
Subject: Re: [linus:master] [btrfs] e8513c012d: addition_on#;use-after-free
On Wed, 26 Nov 2025 22:23:21 +0800 kernel test robot <oliver.sang@...el.com> wrote:
>
>
> Hello,
>
> kernel test robot noticed "addition_on#;use-after-free" on:
>
> commit: e8513c012de75fd65e2df5499572bc6ef3f6e409 ("btrfs: implement ref_tracker for delayed_nodes")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> [test failed on linus/master 30f09200cc4aefbd8385b01e41bde2e4565a6f0e]
> [test failed on linux-next/master 92fd6e84175befa1775e5c0ab682938eca27c0b2]
>
> in testcase: blogbench
> version: blogbench-x86_64-1.2-1_20251009
> with following parameters:
>
> disk: 1SSD
> fs: btrfs
> cpufreq_governor: performance
>
>
>
> config: x86_64-rhel-9.4
> compiler: gcc-14
> test machine: 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz (Cascade Lake) with 176G memory
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@...el.com>
> | Closes: https://lore.kernel.org/oe-lkp/202511262228.6dda231e-lkp@intel.com
>
>
> [ 92.359770][ T1319] ------------[ cut here ]------------
> [ 92.365399][ T2115] 1513 283338 42318 194834 39930 140043 83712
> [ 92.365683][ T1319] refcount_t: addition on 0; use-after-free.
> [ 92.371060][ T2115]
> [ 92.381336][ T1319] WARNING: CPU: 29 PID: 1319 at lib/refcount.c:25 refcount_warn_saturate (lib/refcount.c:25 (discriminator 1))
> [ 92.389049][ T2115] 2174 291211 39936 202785 36246 141919 76774
> [ 92.389356][ T1319] Modules linked in:
> [ 92.398571][ T2115]
> [ 92.419599][ T1319] dm_mod intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common skx_edac skx_edac_common nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp btrfs kvm_intel blake2b_generic xor kvm raid6_pq snd_pcm irqbypass ipmi_ssif snd_timer ghash_clmulni_intel ahci ast rapl snd intel_cstate libahci acpi_power_meter nvme drm_client_lib binfmt_misc drm_shmem_helper mei_me soundcore intel_uncore ioatdma ipmi_si i2c_i801 pcspkr nvme_core acpi_ipmi libata mei drm_kms_helper i2c_smbus lpc_ich intel_pch_thermal dca ipmi_devintf wmi ipmi_msghandler acpi_pad joydev drm fuse nfnetlink
> [ 92.477068][ T1319] CPU: 29 UID: 0 PID: 1319 Comm: kworker/u770:33 Tainted: G S 6.17.0-rc7-00022-ge8513c012de7 #1 VOLUNTARY
> [ 92.490808][ T1319] Tainted: [S]=CPU_OUT_OF_SPEC
> [ 92.495958][ T1319] Hardware name: Intel Corporation ............/S9200WKBRD2, BIOS SE5C620.86B.0D.01.0552.060220191912 06/02/2019
> [ 92.508765][ T1319] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
> [ 92.516545][ T1319] RIP: 0010:refcount_warn_saturate (lib/refcount.c:25 (discriminator 1))
> [ 92.523057][ T1319] Code: 95 95 ff 0f 0b e9 4b 45 90 00 80 3d d3 40 98 01 00 0f 85 5e ff ff ff 48 c7 c7 88 fe ad 82 c6 05 bf 40 98 01 01 e8 9b 95 95 ff <0f> 0b c3 cc cc cc cc 48 c7 c7 e0 fe ad 82 c6 05 a3 40 98 01 01 e8
> All code
> ========
> 0: 95 xchg %eax,%ebp
> 1: 95 xchg %eax,%ebp
> 2: ff 0f decl (%rdi)
> 4: 0b e9 or %ecx,%ebp
> 6: 4b rex.WXB
> 7: 45 90 rex.RB xchg %eax,%r8d
> 9: 00 80 3d d3 40 98 add %al,-0x67bf2cc3(%rax)
> f: 01 00 add %eax,(%rax)
> 11: 0f 85 5e ff ff ff jne 0xffffffffffffff75
> 17: 48 c7 c7 88 fe ad 82 mov $0xffffffff82adfe88,%rdi
> 1e: c6 05 bf 40 98 01 01 movb $0x1,0x19840bf(%rip) # 0x19840e4
> 25: e8 9b 95 95 ff call 0xffffffffff9595c5
> 2a:* 0f 0b ud2 <-- trapping instruction
> 2c: c3 ret
> 2d: cc int3
> 2e: cc int3
> 2f: cc int3
> 30: cc int3
> 31: 48 c7 c7 e0 fe ad 82 mov $0xffffffff82adfee0,%rdi
> 38: c6 05 a3 40 98 01 01 movb $0x1,0x19840a3(%rip) # 0x19840e2
> 3f: e8 .byte 0xe8
>
> Code starting with the faulting instruction
> ===========================================
> 0: 0f 0b ud2
> 2: c3 ret
> 3: cc int3
> 4: cc int3
> 5: cc int3
> 6: cc int3
> 7: 48 c7 c7 e0 fe ad 82 mov $0xffffffff82adfee0,%rdi
> e: c6 05 a3 40 98 01 01 movb $0x1,0x19840a3(%rip) # 0x19840b8
> 15: e8 .byte 0xe8
> [ 92.543676][ T1319] RSP: 0018:ffffc9000fcabca0 EFLAGS: 00010282
> [ 92.550187][ T1319] RAX: 0000000000000000 RBX: ffff888e3d2c8d68 RCX: 0000000000000000
> [ 92.558625][ T1319] RDX: ffff88984f36a3c0 RSI: 0000000000000001 RDI: ffff88984f35c200
> [ 92.567066][ T1319] RBP: ffff8881069c6368 R08: 00000000000008fd R09: 000000000000001d
> [ 92.575508][ T1319] R10: 5b5d393933353633 R11: 205d353131325420 R12: ffff888cd91ced20
> [ 92.583960][ T1319] R13: 0000000000081087 R14: ffff888ed98c2ba8 R15: ffff8881069c6000
> [ 92.592397][ T1319] FS: 0000000000000000(0000) GS:ffff8898cb53c000(0000) knlGS:0000000000000000
> [ 92.601843][ T1319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 92.608920][ T1319] CR2: 0000000000000800 CR3: 0000002c7de24001 CR4: 00000000007726f0
> [ 92.617398][ T1319] PKRU: 55555554
> [ 92.621442][ T1319] Call Trace:
> [ 92.625216][ T1319] <TASK>
> [ 92.628645][ T1319] btrfs_get_delayed_node+0xda/0x1b0 btrfs
> [ 92.636027][ T1319] btrfs_get_or_create_delayed_node+0x12a/0x1b0 btrfs
> [ 92.644356][ T1319] btrfs_delayed_update_inode (fs/btrfs/delayed-inode.c:1957) btrfs
> [ 92.651387][ T1319] ? btrfs_update_root_times (fs/btrfs/root-tree.c:488) btrfs
> [ 92.658201][ T1319] btrfs_update_inode (fs/btrfs/inode.c:4174) btrfs
> [ 92.664431][ T1319] btrfs_finish_one_ordered (fs/btrfs/inode.c:4188 fs/btrfs/inode.c:3226) btrfs
> [ 92.671351][ T1319] btrfs_work_helper (fs/btrfs/async-thread.c:313) btrfs
> [ 92.677606][ T1319] process_one_work (arch/x86/include/asm/jump_label.h:36 include/trace/events/workqueue.h:110 kernel/workqueue.c:3241)
> [ 92.682972][ T1319] worker_thread (kernel/workqueue.c:3313 (discriminator 2) kernel/workqueue.c:3400 (discriminator 2))
> [ 92.688065][ T1319] ? __pfx_worker_thread (kernel/workqueue.c:3346)
> [ 92.693679][ T1319] kthread (kernel/kthread.c:463)
> [ 92.698145][ T1319] ? __pfx_kthread (kernel/kthread.c:412)
> [ 92.703205][ T1319] ret_from_fork (arch/x86/kernel/process.c:148)
> [ 92.708102][ T1319] ? __pfx_kthread (kernel/kthread.c:412)
> [ 92.713169][ T1319] ret_from_fork_asm (arch/x86/entry/entry_64.S:258)
> [ 92.718420][ T1319] </TASK>
> [ 92.721904][ T1319] ---[ end trace 0000000000000000 ]---
> [ 92.772617][ T2117] [ perf record: Woken up 232 times to write data ]
> [ 92.772626][ T2117]
> [ 97.041836][ T2115] 2696 307116 30435 210930 28018 143529 49252
> [ 97.041848][ T2115]
> [ 107.042097][ T2115] 3311 309408 37927 214378 34672 141593 71871
> [ 107.042111][ T2115]
> [ 107.492140][ T2117] Warning:
> [ 107.492150][ T2117]
> [ 107.504083][ T2117] 174 out of order events recorded.
> [ 107.504089][ T2117]
> [ 107.572544][ T2117] [ perf record: Captured and wrote 680.421 MB /tmp/lkp/perf-sched.data (2685986 samples) ]
> [ 107.572552][ T2117]
> [ 116.735824][ C3] perf: interrupt took too long (2518 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
> [ 116.747160][ C3] perf: interrupt took too long (3163 > 3147), lowering kernel.perf_event_max_sample_rate to 63000
> [ 116.760698][ T2117] Events enabled
> [ 116.760708][ T2117]
> [ 116.763935][ C3] perf: interrupt took too long (3955 > 3953), lowering kernel.perf_event_max_sample_rate to 50000
> [ 116.796549][ C67] perf: interrupt took too long (4953 > 4943), lowering kernel.perf_event_max_sample_rate to 40000
> [ 116.866088][ C131] perf: interrupt took too long (6210 > 6191), lowering kernel.perf_event_max_sample_rate to 32000
> [ 117.042049][ T2115] 3836 310809 30767 213835 28646 137339 48846
> [ 117.042057][ T2115]
> [ 117.739156][ C25] perf: interrupt took too long (7763 > 7762), lowering kernel.perf_event_max_sample_rate to 25000
> [ 119.838008][ T2117] [ perf record: Woken up 399 times to write data ]
> [ 119.838018][ T2117]
> [ 124.613970][ T2117] Warning:
> [ 124.613980][ T2117]
> [ 124.625324][ T2117] Processed 1601669 events and lost 6 chunks!
> [ 124.625332][ T2117]
> [ 124.635344][ T2117]
> [ 124.635349][ T2117]
> [ 124.642513][ T2117] Check IO/CPU overload!
> [ 124.642518][ T2117]
> [ 124.649994][ T2117]
> [ 124.649999][ T2117]
> [ 124.656077][ T2117] Warning:
> [ 124.656082][ T2117]
> [ 124.664342][ T2117] 35 out of order events recorded.
> [ 124.664347][ T2117]
> [ 124.678444][ T2117] [ perf record: Captured and wrote 252.317 MB /tmp/lkp/perf_c2c.data (1412807 samples) ]
> [ 124.678450][ T2117]
> [ 127.042332][ T2115] 4407 303071 35022 208647 31346 136630 47202
> [ 127.042344][ T2115]
> [ 137.042250][ T2115] 4943 299584 31810 208626 30129 128390 56528
> [ 137.042263][ T2115]
> [ 147.042441][ T2115] 5385 300009 27874 208644 24778 132677 46221
> [ 147.042453][ T2115]
> [ 157.042433][ T2115] 5910 307972 30431 216088 28665 137389 57696
> [ 157.042445][ T2115]
> [ 167.042628][ T2115] 6460 304663 35022 213949 31758 139786 62225
> [ 167.042640][ T2115]
> [ 177.042573][ T2115] 6967 307323 30606 213465 26983 141516 63168
> [ 177.042587][ T2115]
> [ 187.042733][ T2115] 7407 307075 26787 213546 24994 142923 41523
> [ 187.042745][ T2115]
> [ 197.042791][ T2115] 7927 308105 34190 214356 29643 140592 52914
> [ 197.042803][ T2115]
> [ 207.042918][ T2115] 8549 301948 38038 210474 34706 141120 67811
> [ 207.042931][ T2115]
> [ 217.042636][ T2115] 8955 297050 25135 208501 22518 135881 44260
> [ 217.042648][ T2115]
> [ 227.038865][ T2115] 9471 304296 32936 210610 29938 137844 62809
> [ 227.038870][ T2115]
> [ 237.038945][ T2115] 9964 305929 31628 213754 28012 141671 62429
> [ 237.038951][ T2115]
> [ 247.039008][ T2115] 10440 307726 30685 214830 28062 140111 62249
> [ 247.039015][ T2115]
> [ 257.039183][ T2115] 10802 292789 23345 205882 20689 138434 48453
> [ 257.039188][ T2115]
> [ 267.043497][ T2115] 11273 306881 30725 212403 27270 143693 60639
> [ 267.043508][ T2115]
> [ 277.043510][ T2115] 11797 302806 33279 211710 30071 142609 64344
> [ 277.043519][ T2115]
> [ 287.039432][ T2115] 12205 264441 24852 186434 24016 129256 56311
> [ 287.039437][ T2115]
> [ 297.043412][ T2115] 12673 310406 27331 217050 25960 146991 54571
> [ 297.043421][ T2115]
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20251126/202511262228.6dda231e-lkp@intel.com
>
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
Hello,
I believe I have identified the root cause of the warning.
However, I'm having some troubles running the reproducer as I
haven't setup lkp-tests yet. Could you test the patch below
against your reproducer to see if it fixes the issue?
---8<---
[PATCH] btrfs: fix use-after-free in btrfs_get_or_create_delayed_node
Previously, btrfs_get_or_create_delayed_node sets the delayed_node's
refcount before acquiring the root->delayed_nodes lock.
Commit e8513c012de7 ("btrfs: implement ref_tracker for delayed_nodes")
moves refcount_set inside the critical section which means
there is no longer a memory barrier between setting the refcount and
setting btrfs_inode->delayed_node = node.
This allows btrfs_get_or_create_delayed_node to set
btrfs_inode->delayed_node before setting the refcount.
A different thread is then able to read and increase the refcount
of btrfs_inode->delayed_node leading to a refcounting bug and
a use-after-free warning.
The fix is to move refcount_set back to where it was to take
advantage of the implicit memory barrier provided by lock
acquisition.
Fixes: e8513c012de7 ("btrfs: implement ref_tracker for delayed_nodes")
Reported-by: kernel test robot <oliver.sang@...el.com>
Closes: https://lore.kernel.org/oe-lkp/202511262228.6dda231e-lkp@intel.com
Signed-off-by: Leo Martins <loemra.dev@...il.com>
---
fs/btrfs/delayed-inode.c | 34 ++++++++++++++++++----------------
1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index 364814642a91..f61f10000e33 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -152,37 +152,39 @@ static struct btrfs_delayed_node *btrfs_get_or_create_delayed_node(
return ERR_PTR(-ENOMEM);
btrfs_init_delayed_node(node, root, ino);
+ /* Cached in the inode and can be accessed. */
+ refcount_set(&node->refs, 2);
+ btrfs_delayed_node_ref_tracker_alloc(node, tracker, GFP_ATOMIC);
+ btrfs_delayed_node_ref_tracker_alloc(node, &node->inode_cache_tracker, GFP_ATOMIC);
+
/* Allocate and reserve the slot, from now it can return a NULL from xa_load(). */
ret = xa_reserve(&root->delayed_nodes, ino, GFP_NOFS);
- if (ret == -ENOMEM) {
- btrfs_delayed_node_ref_tracker_dir_exit(node);
- kmem_cache_free(delayed_node_cache, node);
- return ERR_PTR(-ENOMEM);
- }
+ if (ret == -ENOMEM)
+ goto cleanup;
+
xa_lock(&root->delayed_nodes);
ptr = xa_load(&root->delayed_nodes, ino);
if (ptr) {
/* Somebody inserted it, go back and read it. */
xa_unlock(&root->delayed_nodes);
- btrfs_delayed_node_ref_tracker_dir_exit(node);
- kmem_cache_free(delayed_node_cache, node);
- node = NULL;
- goto again;
+ goto cleanup;
}
ptr = __xa_store(&root->delayed_nodes, ino, node, GFP_ATOMIC);
ASSERT(xa_err(ptr) != -EINVAL);
ASSERT(xa_err(ptr) != -ENOMEM);
ASSERT(ptr == NULL);
-
- /* Cached in the inode and can be accessed. */
- refcount_set(&node->refs, 2);
- btrfs_delayed_node_ref_tracker_alloc(node, tracker, GFP_ATOMIC);
- btrfs_delayed_node_ref_tracker_alloc(node, &node->inode_cache_tracker, GFP_ATOMIC);
-
- btrfs_inode->delayed_node = node;
+ WRITE_ONCE(btrfs_inode->delayed_node, node);
xa_unlock(&root->delayed_nodes);
return node;
+cleanup:
+ btrfs_delayed_node_ref_tracker_free(node, tracker);
+ btrfs_delayed_node_ref_tracker_free(node, &node->inode_cache_tracker);
+ btrfs_delayed_node_ref_tracker_dir_exit(node);
+ kmem_cache_free(delayed_node_cache, node);
+ if (ret)
+ return ERR_PTR(ret);
+ goto again;
}
/*
--
2.47.3
Powered by blists - more mailing lists