linux-kernel - Re: [syzbot] [xfs?] KASAN: slab-use-after-free Write in xlog_cil

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aK0MwSIIv4Ca-RXw@dread.disaster.area>
Date: Tue, 26 Aug 2025 11:24:17 +1000
From: Dave Chinner <david@...morbit.com>
To: syzbot <syzbot+4e6ee73c0ae4b6e8753f@...kaller.appspotmail.com>
Cc: cem@...nel.org, cmaiolino@...hat.com, dchinner@...hat.com,
	djwong@...nel.org, linux-kernel@...r.kernel.org,
	linux-xfs@...r.kernel.org, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [xfs?] KASAN: slab-use-after-free Write in
 xlog_cil_committed

On Thu, Aug 21, 2025 at 05:20:37AM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    931e46dcbc7e Add linux-next specific files for 20250814
> git tree:       linux-next
> console+strace: https://syzkaller.appspot.com/x/log.txt?x=14621234580000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=a286bd75352e92fa
> dashboard link: https://syzkaller.appspot.com/bug?extid=4e6ee73c0ae4b6e8753f
> compiler:       Debian clang version 20.1.7 (++20250616065708+6146a88f6049-1~exp1~20250616065826.132), Debian LLD 20.1.7
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=1128faf0580000
> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12621234580000
> 
> Downloadable assets:
> disk image: https://storage.googleapis.com/syzbot-assets/fb896162d550/disk-931e46dc.raw.xz
> vmlinux: https://storage.googleapis.com/syzbot-assets/45f6f857b82c/vmlinux-931e46dc.xz
> kernel image: https://storage.googleapis.com/syzbot-assets/0f16e70143e1/bzImage-931e46dc.xz
> mounted in repro: https://storage.googleapis.com/syzbot-assets/d03c38510d66/mount_0.gz
>   fsck result: failed (log: https://syzkaller.appspot.com/x/fsck.log?x=15e82442580000)
> 
> The issue was bisected to:
> 
> commit d2fe5c4c8d25999862d615f616aea7befdd62799
> Author: Dave Chinner <dchinner@...hat.com>
> Date:   Wed Jun 25 22:48:58 2025 +0000
> 
>     xfs: rearrange code in xfs_buf_item.c

I don't think this is the cause of the issue. It certainly exposed
the problem being tripped over and will need fixing, but there's
some underlying whackiness going on here....

> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=14bca6f0580000
> final oops:     https://syzkaller.appspot.com/x/report.txt?x=16bca6f0580000
> console output: https://syzkaller.appspot.com/x/log.txt?x=12bca6f0580000
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+4e6ee73c0ae4b6e8753f@...kaller.appspotmail.com
> Fixes: d2fe5c4c8d25 ("xfs: rearrange code in xfs_buf_item.c")
> 
> BUG: KASAN: slab-use-after-free in instrument_atomic_write include/linux/instrumented.h:82 [inline]
> BUG: KASAN: slab-use-after-free in set_bit include/asm-generic/bitops/instrumented-atomic.h:28 [inline]
> BUG: KASAN: slab-use-after-free in xlog_cil_ail_insert fs/xfs/xfs_log_cil.c:791 [inline]
> BUG: KASAN: slab-use-after-free in xlog_cil_committed+0x45e/0x1040 fs/xfs/xfs_log_cil.c:897
> Write of size 8 at addr ffff8880750cbc10 by task kworker/1:0H/25

Just before this:

[   93.113540][ T6320] loop4: detected capacity change from 32768 to 64
[   93.121392][ T6320] XFS (loop4): Metadata corruption detected at xfs_inobt_verify+0x9e/0x1f0, xfs_finobt block 0x8
[   93.136248][ T6320] XFS (loop4): Unmount and run xfs_repair
[   93.144727][ T6320] XFS (loop4): First 128 bytes of corrupted metadata buffer:
[   93.152601][ T6320] 00000000: 41 42 33 42 00 00 00 02 ff ff ff ff ff ff ff ff  AB3B............
[   93.162774][ T6320] 00000010: 00 00 00 00 00 00 00 08 00 00 00 01 00 00 00 10  ................
[   93.171779][ T6320] 00000020: c4 96 e0 5e 54 0d 4c 72 b5 91 04 d7 9d 8b 4e eb  ...^T.Lr......N.
[   93.182192][ T6320] 00000030: 00 00 00 00 c8 fc 31 e4 00 00 04 4e 00 00 00 02  ......1....N....
[   93.191959][ T6320] 00000040: 00 00 04 60 00 00 0b a0 00 00 00 00 00 00 00 00  ...`............
[   93.211058][ T6320] 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[   93.228793][ T6320] 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[   93.241303][ T6320] 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[   93.255636][ T6320] XFS (loop4): metadata I/O error in "xfs_btree_read_buf_block+0x290/0x470" at daddr 0x8 len 8 error 117
[   93.308796][ T5995] syz-executor: attempt to access beyond end of device
[   93.308796][ T5995] loop4: rw=432129, sector=96, nr_sectors = 16 limit=64
[   93.308796][ T5995] syz-executor: attempt to access beyond end of device
[   93.308796][ T5995] loop4: rw=432129, sector=96, nr_sectors = 16 limit=64
[   93.310419][ T3859] XFS (loop4): Metadata corruption detected at xfs_btree_lookup_get_block+0x3c5/0x500, xfs_bnobt block 0x8
[   93.330994][   T56] XFS (loop4): log I/O error -5
[   93.337979][ T3859] XFS (loop4): Unmount and run xfs_repair
[   93.344019][   T56] XFS (loop4): Filesystem has been shut down due to log error (0x2).
[   93.353457][   T56] XFS (loop4): Please unmount the filesystem and rectify the problem(s).
[   93.364934][ T5995] XFS (loop4): Unmounting Filesystem c496e05e-540d-4c72-b591-04d79d8b4eeb

So, syszbot has resized the block device under the filesystem to
almost all IO now returns -EIO. Then the journal aborts and trips
this UAF:

> CPU: 1 UID: 0 PID: 25 Comm: kworker/1:0H Not tainted 6.17.0-rc1-next-20250814-syzkaller #0 PREEMPT(full) 
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/12/2025
> Workqueue: xfs-log/loop0 xlog_ioend_work
> Call Trace:
>  <TASK>
>  dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
>  print_address_description mm/kasan/report.c:378 [inline]
>  print_report+0xca/0x240 mm/kasan/report.c:482
>  kasan_report+0x118/0x150 mm/kasan/report.c:595
>  check_region_inline mm/kasan/generic.c:-1 [inline]
>  kasan_check_range+0x2b0/0x2c0 mm/kasan/generic.c:200
>  instrument_atomic_write include/linux/instrumented.h:82 [inline]
>  set_bit include/asm-generic/bitops/instrumented-atomic.h:28 [inline]
>  xlog_cil_ail_insert fs/xfs/xfs_log_cil.c:791 [inline]
>  xlog_cil_committed+0x45e/0x1040 fs/xfs/xfs_log_cil.c:897
>  xlog_cil_process_committed+0x15c/0x1b0 fs/xfs/xfs_log_cil.c:927
>  xlog_state_shutdown_callbacks+0x269/0x360 fs/xfs/xfs_log.c:488
>  xlog_force_shutdown+0x332/0x400 fs/xfs/xfs_log.c:3520
>  xlog_ioend_work+0xaf/0x100 fs/xfs/xfs_log.c:1245
>  process_one_work kernel/workqueue.c:3236 [inline]
>  process_scheduled_works+0xae1/0x17b0 kernel/workqueue.c:3319
>  worker_thread+0x8a0/0xda0 kernel/workqueue.c:3400
>  kthread+0x711/0x8a0 kernel/kthread.c:463
>  ret_from_fork+0x3f9/0x770 arch/x86/kernel/process.c:148
>  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
>  </TASK>

Where the journal completion has tried to set the XFS_LI_ABORTED bit
on the log item before running ->iop_committed/->iop_unpin on it.
It should have a reference to the BLI at this point - it's not
dropped until ->iop_unpin() is run.

So, we have a BLI reference counting problem.

> Allocated by task 5867:
>  kasan_save_stack mm/kasan/common.c:56 [inline]
>  kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
>  unpoison_slab_object mm/kasan/common.c:339 [inline]
>  __kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:365
>  kasan_slab_alloc include/linux/kasan.h:250 [inline]
>  slab_post_alloc_hook mm/slub.c:4180 [inline]
>  slab_alloc_node mm/slub.c:4229 [inline]
>  kmem_cache_alloc_noprof+0x1c1/0x3c0 mm/slub.c:4236
>  xfs_buf_item_init+0x66/0x670 fs/xfs/xfs_buf_item.c:878
>  _xfs_trans_bjoin+0x46/0x110 fs/xfs/xfs_trans_buf.c:75
>  xfs_trans_read_buf_map+0x28f/0x8e0 fs/xfs/xfs_trans_buf.c:325
>  xfs_trans_read_buf fs/xfs/xfs_trans.h:210 [inline]

Here's where the BLI was allocated, after a free space btree buffer
read during a transaction. The buffer will have already been read
from disk, so it's supposedly XBF_DONE, contains valid data and the
BLI has a single reference.

The same task then -reads the same buffer again-, and issues IO on
it again. Not sure why this happens - it should have been found in
the transaction itself and simply reused. (that's whackiness #1)

> Freed by task 5876:
>  kasan_save_stack mm/kasan/common.c:56 [inline]
>  kasan_save_track+0x3e/0x80 mm/kasan/common.c:77
>  __kasan_save_free_info+0x46/0x50 mm/kasan/generic.c:587
>  kasan_save_free_info mm/kasan/kasan.h:406 [inline]
>  poison_slab_object mm/kasan/common.c:252 [inline]
>  __kasan_slab_free+0x5b/0x80 mm/kasan/common.c:284
>  kasan_slab_free include/linux/kasan.h:233 [inline]
>  slab_free_hook mm/slub.c:2417 [inline]
>  slab_free mm/slub.c:4680 [inline]
>  kmem_cache_free+0x18f/0x400 mm/slub.c:4782
>  __xfs_buf_ioend+0x29c/0x6f0 fs/xfs/xfs_buf.c:1202
>  xfs_buf_iowait+0x167/0x480 fs/xfs/xfs_buf.c:1384
>  _xfs_buf_read fs/xfs/xfs_buf.c:646 [inline]
>  xfs_buf_read_map+0x335/0xa50 fs/xfs/xfs_buf.c:712
>  xfs_trans_read_buf_map+0x1d7/0x8e0 fs/xfs/xfs_trans_buf.c:304
>  xfs_trans_read_buf fs/xfs/xfs_trans.h:210 [inline]

And then frees the BLI during IO completion.

Say what? This code path should not be possible. (whackiness #2)

i.e. the stack trace indicates a -write IO completion- is running
on a buffer read IO, on a buffer that has supposedly already been
read, initialised and joined to the transaction.

_xfs_buf_read() does:

	bp->b_flags &= ~(XBF_WRITE | XBF_ASYNC | XBF_READ_AHEAD | XBF_DONE);
        bp->b_flags |= XBF_READ;
        xfs_buf_submit(bp);
        return xfs_buf_iowait(bp);

This is all done with the buffer locked, so nothing else should be
touching it at this point.

The code path that frees the buf item on IO completion does this:

>  __xfs_buf_ioend+0x29c/0x6f0 fs/xfs/xfs_buf.c:1202

1166 static bool
1167 __xfs_buf_ioend(
1168         struct xfs_buf  *bp)
1169 {
1170         trace_xfs_buf_iodone(bp, _RET_IP_);
1171
1172         if (bp->b_flags & XBF_READ) {
.....
1182         } else {
.....
1196                 /*
1197                  * Note that for things like remote attribute buffers, there may
1198                  * not be a buffer log item here, so processing the buffer log
1199                  * item must remain optional.
1200                  */
1201                 if (bp->b_log_item)
1202   >>>>>>>>>             xfs_buf_item_done(bp);
1203

Which clearly points out that at IO completion XBF_READ is *not set*
and so it has run write IO completion.  As a result, the BLI
attached to the buffer gets freed.

Finally, if the buffer is in the CIL and being committed, it should
be pinned, which prevents write IO from being issued. i.e.
xfs_buf_submit() does:

	if (bp->b_flags & XBF_WRITE)
                xfs_buf_wait_unpin(bp);

So we cannot have issued the IO with XBF_WRITE set....

IOWs, this failure has been triggered by something corrupting buffer
state whilst it is under IO, which then triggers a premature free of
the BLI, which then triggers a UAF when the journal tries to access
it.

The bisected commit removed this check from xfs_buf_item_relse():

-       if (atomic_read(&bip->bli_refcount))
-               return;

and that is what has exposed this whacky "write IO completion on a
read IO" behaviour.

This above check tries to takes into account the fact the AIL
doesn't hold a reference to the BLI, and so write IO is done without
references to the BLI. Hence at IO completion, it was used to check
to see if new references had been taken to the BLI whilst the IO was
in flight.

I removed this check and replaced it with an ASSERT because it
should not be possible for the AIL to do writeback of a buffer when
the BLI is referenced. Nor should it be possible to take a new BLI
reference whilst the buffer is being written. i.e.:

- the transaction modifying the buffer holds it locked (and so
  writeback cannot be started), it pins the BLI before unlocking the
  buffer and passes the BLI reference to the pin.
- items in the CIL or being committed by journal IO are pinned so
  cannot be written back. The BLI ref associated with the pin is
  dropped before the unpin wakeups are run.
- writeback is done holding the buffer lock whilst doing IO and
  waiting on an unpin wakeup, so no new BLI references can be taken
  once writeback on the buffer has been started.

Hence it should not possible for write IO to occur whilst the BLI is
referenced, and therefore it should not be possible to run write IO
completion and call xfs_buf_item_done() to free the BLI while it is
actively referenced.

Did I miss something here?

As such, I think that the issue we need to understand here is why a
write IO completion is being run from read IO context. That's the
root cause of the UAF, not the commit the bisect landed on....

-Dave.
-- 
Dave Chinner
david@...morbit.com