linux-kernel - Re: [syzbot] [xfs?] WARNING in fsnotify_file_area

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAOQ4uxjf5H_vj-swF7wEvUkPobEuxs2q6jfO9jFsx4pqxtJMMg@mail.gmail.com>
Date: Tue, 4 Mar 2025 16:09:16 +0100
From: Amir Goldstein <amir73il@...il.com>
To: Jan Kara <jack@...e.cz>
Cc: syzbot <syzbot+7229071b47908b19d5b7@...kaller.appspotmail.com>, 
	akpm@...ux-foundation.org, axboe@...nel.dk, brauner@...nel.org, 
	cem@...nel.org, chandan.babu@...cle.com, djwong@...nel.org, 
	josef@...icpanda.com, linux-fsdevel@...r.kernel.org, 
	linux-kernel@...r.kernel.org, linux-mm@...ck.org, linux-xfs@...r.kernel.org, 
	syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [xfs?] WARNING in fsnotify_file_area_perm

On Tue, Mar 4, 2025 at 12:06 PM Jan Kara <jack@...e.cz> wrote:
>
> Josef, Amir,
>
> this is indeed an interesting case:
>
> On Sun 02-03-25 08:32:30, syzbot wrote:
> > syzbot has found a reproducer for the following issue on:
> ...
> > ------------[ cut here ]------------
> > WARNING: CPU: 1 PID: 6440 at ./include/linux/fsnotify.h:145 fsnotify_file_area_perm+0x20c/0x25c include/linux/fsnotify.h:145
> > Modules linked in:
> > CPU: 1 UID: 0 PID: 6440 Comm: syz-executor370 Not tainted 6.14.0-rc4-syzkaller-ge056da87c780 #0
> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 12/27/2024
> > pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > pc : fsnotify_file_area_perm+0x20c/0x25c include/linux/fsnotify.h:145
> > lr : fsnotify_file_area_perm+0x20c/0x25c include/linux/fsnotify.h:145
> > sp : ffff8000a42569d0
> > x29: ffff8000a42569d0 x28: ffff0000dcec1b48 x27: ffff0000d68a1708
> > x26: ffff0000d68a16c0 x25: dfff800000000000 x24: 0000000000008000
> > x23: 0000000000000001 x22: ffff8000a4256b00 x21: 0000000000001000
> > x20: 0000000000000010 x19: ffff0000d68a16c0 x18: ffff8000a42566e0
> > x17: 000000000000e388 x16: ffff800080466c24 x15: 0000000000000001
> > x14: 1fffe0001b31513c x13: 0000000000000000 x12: 0000000000000000
> > x11: 0000000000000001 x10: 0000000000ff0100 x9 : 0000000000000000
> > x8 : ffff0000c6d98000 x7 : 0000000000000000 x6 : 0000000000000000
> > x5 : 0000000000000020 x4 : 0000000000000000 x3 : 0000000000001000
> > x2 : ffff8000a4256b00 x1 : 0000000000000001 x0 : 0000000000000000
> > Call trace:
> >  fsnotify_file_area_perm+0x20c/0x25c include/linux/fsnotify.h:145 (P)
> >  filemap_fault+0x12b0/0x1518 mm/filemap.c:3509
> >  xfs_filemap_fault+0xc4/0x194 fs/xfs/xfs_file.c:1543
> >  __do_fault+0xf8/0x498 mm/memory.c:4988
> >  do_read_fault mm/memory.c:5403 [inline]
> >  do_fault mm/memory.c:5537 [inline]
> >  do_pte_missing mm/memory.c:4058 [inline]
> >  handle_pte_fault+0x3504/0x57b0 mm/memory.c:5900
> >  __handle_mm_fault mm/memory.c:6043 [inline]
> >  handle_mm_fault+0xfa8/0x188c mm/memory.c:6212
> >  do_page_fault+0x570/0x10a8 arch/arm64/mm/fault.c:690
> >  do_translation_fault+0xc4/0x114 arch/arm64/mm/fault.c:783
> >  do_mem_abort+0x74/0x200 arch/arm64/mm/fault.c:919
> >  el1_abort+0x3c/0x5c arch/arm64/kernel/entry-common.c:432
> >  el1h_64_sync_handler+0x60/0xcc arch/arm64/kernel/entry-common.c:510
> >  el1h_64_sync+0x6c/0x70 arch/arm64/kernel/entry.S:595
> >  __uaccess_mask_ptr arch/arm64/include/asm/uaccess.h:169 [inline] (P)
> >  fault_in_readable+0x168/0x310 mm/gup.c:2234 (P)
> >  fault_in_iov_iter_readable+0x1dc/0x22c lib/iov_iter.c:94
> >  iomap_write_iter fs/iomap/buffered-io.c:950 [inline]
> >  iomap_file_buffered_write+0x490/0xd54 fs/iomap/buffered-io.c:1039
> >  xfs_file_buffered_write+0x2dc/0xac8 fs/xfs/xfs_file.c:792
> >  xfs_file_write_iter+0x2c4/0x6ac fs/xfs/xfs_file.c:881
> >  new_sync_write fs/read_write.c:586 [inline]
> >  vfs_write+0x704/0xa9c fs/read_write.c:679
>
> The backtrace actually explains it all. We had a buffered write whose
> buffer was mmapped file on a filesystem with an HSM mark. Now the prefaulting
> of the buffer happens already (quite deep) under the filesystem freeze
> protection (obtained in vfs_write()) which breaks assumptions of HSM code
> and introduces potential deadlock of HSM handler in userspace with filesystem
> freezing. So we need to think how to deal with this case...

Ouch. It's like the splice mess all over again.
Except we do not really care to make this use case work with HSM
in the sense that we do not care to have to fill in the mmaped file content
in this corner case - we just need to let HSM fail the access if content is
not available.

If you remember, in one of my very early version of pre-content events,
the pre-content event (or maybe it was FAN_ACCESS_PERM itself)
carried a flag (I think it was called FAN_PRE_VFS) to communicate to
HSM service if it was safe to write to fs in the context of event handling.

At the moment, I cannot think of any elegant way out of this use case
except annotating the event from fault_in_readable() as "unsafe-for-write".
This will relax the debugging code assertion and notify the HSM service
(via an event flag) that it can ALLOW/DENY, but it cannot fill the file.
Maybe we can reuse the FAN_ACCESS_PERM event to communicate
this case to HSM service.

WDYT?

Thanks,
Amir.