linux-kernel - Re: [PATCH] ocfs2: fix stale extent map cache during COW operations


Message-ID: <6541511c-8865-4903-b26c-2e2abf78ad99@suse.com>
Date: Sat, 11 Oct 2025 12:50:12 +0800
From: Heming Zhao <heming.zhao@...e.com>
To: Deepanshu Kartikey <kartikey406@...il.com>, joseph.qi@...ux.alibaba.com,
 mark@...heh.com, jlbec@...lplan.org
Cc: ocfs2-devel@...ts.linux.dev, linux-kernel@...r.kernel.org,
 syzbot+6fdd8fa3380730a4b22c@...kaller.appspotmail.com
Subject: Re: [PATCH] ocfs2: fix stale extent map cache during COW operations

Hi Deepanshu and Joseph,

On 10/9/25 22:29, Deepanshu Kartikey wrote:
> 
> Hi Joseph,
> 
> Thank you for the review. You are absolutely right - the cache clearing at the end of ocfs2_refcount_cow_hunk() should handle the COW path correctly.
> 
> After further investigation with the syzbot reproducer and extensive debugging, I found the real issue is in the FITRIM/move_extents code path. The bug occurs when:
> 
> 1. copy_file_range() creates a reflinked extent with flags=0x2 (OCFS2_EXT_REFCOUNTED)
> 2. ioctl(FITRIM) is called, which triggers ocfs2_move_extents()
> 3. In __ocfs2_move_extents_range(), the while loop:
>     - Calls ocfs2_get_clusters() which reads extent with flags=0x2 and caches it
>     - Then calls ocfs2_move_extent() or ocfs2_defrag_extent()
>     - Both eventually call __ocfs2_move_extent() which contains:
>         replace_rec.e_flags = ext_flags & ~OCFS2_EXT_REFCOUNTED;
>     - This clears the refcount flag and writes to disk with flags=0x0
> 4. However, the extent map cache is NOT cleared after the move operation
> 5. Cache still contains stale flags=0x2 while disk has flags=0x0
> 6. Later, when write() triggers COW, ocfs2_refcount_cal_cow_clusters() reads:
>     - From cache: flags=0x2 (stale)
>     - From disk extent tree: flags=0x0 (correct)
> 7. The mismatch triggers: BUG_ON(!(rec->e_flags & OCFS2_EXT_REFCOUNTED))
> 
> The proper fix should be in __ocfs2_move_extents_range() to clear the extent cache after each move/defrag operation completes. I will send a v2 patch with this fix.
> 
> Thanks,
> Deepanshu
> 

Let's look at the syzbot page [1].
The following analysis is based on the C reproducer from "2025/10/03 12:11" [2].
(BTW, syzbot never calls __ocfs2_move_extents_range().)

The test code mainly involves 9 steps:
1. create the image data and mount it
2. open() the file (0x200000000080ul) once, returning fd r[0]
3. open() the file (0x200000000280ul) twice, returning fds r[1] and r[2]
4. call ioctl F_SETFL 0 on r[1]
5. write "0x0000000000000000" to r[2], len: 0xfea0ul // cleans the file data
6. call r[3] = dup(r[1])
7. do copy_file_range(), copying from r[1] to r[3], len=0xd8c2ul
                 // sets OCFS2_EXT_REFCOUNTED and populates the extent cache.
                 // see ocfs2_remap_file_range() => ocfs2_reflink_remap_extent()
8. trim r[0]
9. write on r[1] // crash.

The root cause is in step <9>, which calls ocfs2_refcount_cow():
- The input parameter di_bh is created by the caller via
   ocfs2_prepare_inode_for_write() => ocfs2_inode_lock_for_extent_tree() =>
   ocfs2_inode_lock_update(), which reads file data from disk.
   The extent does not have the OCFS2_EXT_REFCOUNTED flag, because r[1] and
   r[2] point to the same file and step <5> cleaned up the file data.
- ocfs2_refcount_cow() then calls ocfs2_get_clusters() to retrieve the extent
   from the cache, which does contain OCFS2_EXT_REFCOUNTED (set up by step <7>).
- This difference leads it to call ocfs2_refcount_cow_hunk(), which
   triggers a BUG_ON().
   I suspect step <7> needs some time to write back the COW data, and syzbot
   starts step <9> too quickly, before the write-back job starts.
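
To make the mismatch above concrete, here is a small user-space sketch. It is only a toy model, not ocfs2 code: struct toy_extent, toy_lookup_flags() and toy_cow_would_bug() are hypothetical names, and only the 0x2 flag value is taken from this thread. It shows a lookup that prefers the cache deciding "refcounted" while the on-disk record says otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same value as the flags=0x2 (OCFS2_EXT_REFCOUNTED) quoted in this thread. */
#define TOY_EXT_REFCOUNTED 0x2u

/* Hypothetical stand-in for one extent: on-disk state vs. cached state. */
struct toy_extent {
	unsigned int disk_flags;   /* flags in the on-disk extent record */
	unsigned int cache_flags;  /* flags remembered by the extent map cache */
	bool cache_valid;          /* false once the cache has been truncated */
};

/* Models a cached lookup: prefer the cache, fall back to disk. */
static unsigned int toy_lookup_flags(const struct toy_extent *e)
{
	assert(e != NULL);
	return e->cache_valid ? e->cache_flags : e->disk_flags;
}

/*
 * Returns true when the lookup says "refcounted" (so the COW path is taken)
 * but the on-disk record disagrees. In the real code this is the situation
 * in which BUG_ON(!(rec->e_flags & OCFS2_EXT_REFCOUNTED)) fires.
 */
static bool toy_cow_would_bug(const struct toy_extent *e)
{
	bool lookup_refcounted = (toy_lookup_flags(e) & TOY_EXT_REFCOUNTED) != 0;
	bool disk_refcounted = (e->disk_flags & TOY_EXT_REFCOUNTED) != 0;

	return lookup_refcounted && !disk_refcounted;
}
```

With disk_flags=0x0, cache_flags=0x2 and a valid cache, toy_cow_would_bug() reports the mismatch. Once cache_valid is false, the lookup falls back to disk and the mismatch disappears.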

How to fix:
The v1 patch is reasonable, but the commit log needs to be revised.

For Joseph's question (copied here):
> At the end of ocfs2_refcount_cow_hunk(), it has:
> 
> 	/*
> 	 * truncate the extent map here since no matter whether we meet with
> 	 * any error during the action, we shouldn't trust cached extent map
> 	 * any more.
> 	 */
> 	ocfs2_extent_map_trunc(inode, cow_start);
> 
> It seems the cached extent record has already been forgotten. So how
> does the above step 3 happen?

My answer:
The crash only happens on the first call to ocfs2_refcount_cow_hunk().
ocfs2_extent_map_trunc() does its cleanup at the end of that function, but
the bad extent record is already in place before ocfs2_refcount_cow_hunk()
is called.
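
The ordering argument can be sketched in user space as below. Again this is a toy model under my own assumptions, not the actual patch or the ocfs2 implementation: toy_inode, toy_cache_trunc(), toy_cow_hunk() and toy_write() are hypothetical names. A return value of -1 marks the point where the kernel would hit the BUG_ON(), so the truncation at the end of the hunk is never reached on that first call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_EXT_REFCOUNTED 0x2u

/* Hypothetical stand-in for an inode with one extent and its cached copy. */
struct toy_inode {
	unsigned int disk_flags;   /* flags in the on-disk extent record */
	unsigned int cache_flags;  /* flags held in the extent map cache */
	bool cache_valid;          /* false after the cache is truncated */
};

/* Toy analogue of truncating the extent map: forget the cached state. */
static void toy_cache_trunc(struct toy_inode *ino)
{
	assert(ino != NULL);
	ino->cache_valid = false;
}

/*
 * Toy analogue of the hunk: check the on-disk record first, clean up last.
 * Returns -1 where the kernel would BUG; the cleanup is skipped in that case.
 */
static int toy_cow_hunk(struct toy_inode *ino)
{
	if (!(ino->disk_flags & TOY_EXT_REFCOUNTED))
		return -1;
	/* ... the COW work itself would happen here ... */
	toy_cache_trunc(ino);
	return 0;
}

/* Toy caller: decides from the (possibly stale) cache whether to COW. */
static int toy_write(struct toy_inode *ino)
{
	unsigned int flags = ino->cache_valid ? ino->cache_flags
					      : ino->disk_flags;

	if (!(flags & TOY_EXT_REFCOUNTED))
		return 0; /* nothing to COW */
	return toy_cow_hunk(ino);
}
```

In this model, a stale cache (cache_flags=0x2, disk_flags=0x0) makes toy_write() return -1 with the cache still valid, while dropping the cache before the lookup makes the same write return 0.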

[1] https://syzkaller.appspot.com/bug?extid=6fdd8fa3380730a4b22c
[2] https://syzkaller.appspot.com/text?tag=ReproC&x=163c9214580000

- Heming