Message-ID: <ff46166e-6795-4cab-bfef-d0724200bc62@bsbernd.com>
Date: Sun, 11 Jan 2026 16:32:20 +0100
From: Bernd Schubert <bernd@...ernd.com>
To: Thorsten Leemhuis <linux@...mhuis.info>,
Miklos Szeredi <miklos@...redi.hu>
Cc: Linux kernel regressions list <regressions@...ts.linux.dev>,
LKML <linux-kernel@...r.kernel.org>,
Linux-fsdevel <linux-fsdevel@...r.kernel.org>, NeilBrown <neil@...wn.name>,
Christian Brauner <brauner@...nel.org>
Subject: Re: [REGRESSION] fuse: xdg-document-portal gets stuck and causes
suspend to fail in mainline
On 1/11/26 12:37, Thorsten Leemhuis wrote:
> Lo! I can reliably get xdg-document-portal stuck on latest -mainline
> (and -next, too; 6.18.4 works fine) through the Signal flatpak, which
> then causes suspend to fail:
>
> """
>> [ 194.439381] PM: suspend entry (s2idle)
>> [ 194.454708] Filesystems sync: 0.015 seconds
>> [ 194.696767] Freezing user space processes
>> [ 214.700978] Freezing user space processes failed after 20.004 seconds (1 tasks refusing to freeze, wq_busy=0):
>> [ 214.701143] task:xdg-document-po state:D stack:0 pid:2651 tgid:2651 ppid:1939 task_flags:0x400000 flags:0x00080002
>> [ 214.701151] Call Trace:
>> [ 214.701154] <TASK>
>> [ 214.701167] __schedule+0x2b8/0x5e0
>> [ 214.701181] schedule+0x27/0x80
>> [ 214.701188] request_wait_answer+0xce/0x260 [fuse]
>> [ 214.701202] ? __pfx_autoremove_wake_function+0x10/0x10
>> [ 214.701212] __fuse_simple_request+0x120/0x340 [fuse]
>> [ 214.701219] fuse_lookup_name+0xc3/0x210 [fuse]
>> [ 214.701235] fuse_lookup+0x99/0x1c0 [fuse]
>> [ 214.701242] ? srso_alias_return_thunk+0x5/0xfbef5
>> [ 214.701247] ? fuse_dentry_init+0x23/0x50 [fuse]
>> [ 214.701257] lookup_one_qstr_excl+0xa8/0xf0
Introduced by c9ba789dad15 ("VFS: introduce start_creating_noperm() and
start_removing_noperm()")?
Why is the new code doing a lookup on an entry that is about to be
invalidated?
In order to handle this, at least one fuse server process needs to be
available to answer the lookup, but even then the lookup doesn't make
sense for this specific case.
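To illustrate why that lookup is a problem, here is a minimal user-space
sketch in C. It is a toy model, not kernel or libfuse code: struct
model_server, model_request_answered() and model_notify_inval_entry() are
made-up names, and the only thing modelled is whether a worker is free to
answer a request that the kernel issues while another worker is still
inside write().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model, not kernel code: a fuse server with a fixed number
 * of worker threads that read kernel requests and write replies.
 */
struct model_server {
	int workers;	/* total worker threads */
	int busy;	/* workers currently blocked in a syscall */
};

enum model_result { MODEL_DONE, MODEL_STUCK };

/* A kernel-issued request (e.g. a lookup) is answered only if some
 * worker is free to read it and send the reply. */
static bool model_request_answered(const struct model_server *s)
{
	return s->workers - s->busy > 0;
}

/*
 * One worker writes an entry-invalidation notification. While that write
 * is in progress the worker cannot answer requests. If the kernel side
 * issues a lookup from within the write, the write can only return once
 * another worker has answered it.
 */
static enum model_result model_notify_inval_entry(struct model_server *s,
						  bool kernel_does_lookup)
{
	/* The caller must be a worker that is currently free. */
	assert(s->busy < s->workers);

	s->busy++;			/* caller enters write() */
	if (kernel_does_lookup && !model_request_answered(s))
		return MODEL_STUCK;	/* nobody left to reply, caller stays blocked */

	s->busy--;			/* write() returns */
	return MODEL_DONE;
}
```

In this model a server with a single worker gets MODEL_STUCK as soon as
the notification path performs a lookup, while a server with a second idle
worker, or a notification path without the lookup, gets MODEL_DONE.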
We could do something like this:
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 4b6b3d2758ff..7edbace7eddc 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1599,6 +1599,15 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
 	if (!dir)
 		goto put_parent;
+	/* Check dcache first - if not cached, nothing to invalidate */
+	name->hash = full_name_hash(dir, name->name, name->len);
+	entry = d_lookup(dir, name);
+	if (!entry) {
+		err = 0;
+		dput(dir);
+		goto put_parent;
+	}
+
 	entry = start_removing_noperm(dir, name);
 	dput(dir);
 	if (IS_ERR(entry))
But let's assume the dentry exists - start_removing_noperm() will now
trigger a revalidate and get the same issue. From my point of view the
above commit should be reverted for fuse.
>> [ 214.701264] start_removing_noperm+0x59/0x80
>> [ 214.701268] ? d_find_alias+0x82/0xd0
>> [ 214.701273] fuse_reverse_inval_entry+0x7d/0x1f0 [fuse]
>> [ 214.701280] ? fuse_copy_do+0x5f/0xa0 [fuse]
>> [ 214.701287] fuse_notify+0x4a1/0x750 [fuse]
>> [ 214.701295] ? iov_iter_get_pages2+0x1d/0x40
>> [ 214.701301] ? srso_alias_return_thunk+0x5/0xfbef5
>> [ 214.701305] fuse_dev_do_write+0x2e4/0x440 [fuse]
>> [ 214.701313] fuse_dev_write+0x6b/0xa0 [fuse]
>> [ 214.701320] do_iter_readv_writev+0x161/0x260
>> [ 214.701327] vfs_writev+0x168/0x3c0
>> [ 214.701334] ? ksys_write+0xcd/0xf0
>> [ 214.701338] ? do_writev+0x7f/0x110
>> [ 214.701341] do_writev+0x7f/0x110
>> [ 214.701344] do_syscall_64+0x7e/0x6b0
>> [ 214.701350] ? srso_alias_return_thunk+0x5/0xfbef5
>> [ 214.701352] ? __handle_mm_fault+0x445/0x690
>> [ 214.701359] ? srso_alias_return_thunk+0x5/0xfbef5
>> [ 214.701363] ? srso_alias_return_thunk+0x5/0xfbef5
>> [ 214.701365] ? count_memcg_events+0xd6/0x210
>> [ 214.701371] ? srso_alias_return_thunk+0x5/0xfbef5
>> [ 214.701373] ? handle_mm_fault+0x212/0x340
>> [ 214.701377] ? srso_alias_return_thunk+0x5/0xfbef5
>> [ 214.701379] ? do_user_addr_fault+0x2b4/0x7b0
>> [ 214.701387] ? srso_alias_return_thunk+0x5/0xfbef5
>> [ 214.701389] ? irqentry_exit+0x6d/0x540
>> [ 214.701393] ? srso_alias_return_thunk+0x5/0xfbef5
>> [ 214.701395] ? exc_page_fault+0x7e/0x1a0
>> [ 214.701398] entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> [ 214.701402] RIP: 0033:0x7f3c144f9982
>> [ 214.701467] RSP: 002b:00007fff80e2f388 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
>> [ 214.701470] RAX: ffffffffffffffda RBX: 00007f3bec000cf0 RCX: 00007f3c144f9982
>> [ 214.701472] RDX: 0000000000000003 RSI: 00007fff80e2f460 RDI: 0000000000000007
>> [ 214.701474] RBP: 00007fff80e2f3b0 R08: 0000000000000000 R09: 0000000000000000
>> [ 214.701475] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>> [ 214.701477] R13: 00007f3bec000cf0 R14: 00007f3c14bb8280 R15: 00007f3be8001200
>> [ 214.701481] </TASK>
> """
>
> Killing the mentioned process using "kill -9" doesn't help. I can
> reliably trigger this in -mainline and -next using the Signal flatpak on
> Fedora 43 by trying to send a picture (which gets xdg-document-portal
> involved). It works the first time, but trying again won't and will
> cause Signal to get stuck for a few seconds. Works fine in 6.18.4.
>
> Is this maybe known already or does anybody have an idea what's wrong?
> If not I guess I'll have to bisect this.
>
> Ciao, Thorsten
>
> #regzbot introduced: v6.18..
> #regzbot title: fuse: xdg-document-portal gets stuck and causes suspend
> to fail
>
>
Thanks,
Bernd