Message-id: <176819030053.16766.15730807505551833487@noble.neil.brown.name>
Date: Mon, 12 Jan 2026 14:58:20 +1100
From: NeilBrown <neilb@...mail.net>
To: "Bernd Schubert" <bernd@...ernd.com>
Cc: "Thorsten Leemhuis" <linux@...mhuis.info>,
"Miklos Szeredi" <miklos@...redi.hu>,
"Linux kernel regressions list" <regressions@...ts.linux.dev>,
"LKML" <linux-kernel@...r.kernel.org>,
"Linux-fsdevel" <linux-fsdevel@...r.kernel.org>,
"Christian Brauner" <brauner@...nel.org>
Subject: Re: [REGRESSION] fuse: xdg-document-portal gets stuck and causes
suspend to fail in mainline
On Mon, 12 Jan 2026, Bernd Schubert wrote:
>
> On 1/11/26 12:37, Thorsten Leemhuis wrote:
> > Lo! I can reliably get xdg-document-portal stuck on latest -mainline
> > (and -next, too; 6.18.4 works fine) through the Signal flatpak, which
> > then causes suspend to fail:
> >
> > """
> >> [ 194.439381] PM: suspend entry (s2idle)
> >> [ 194.454708] Filesystems sync: 0.015 seconds
> >> [ 194.696767] Freezing user space processes
> >> [ 214.700978] Freezing user space processes failed after 20.004 seconds (1 tasks refusing to freeze, wq_busy=0):
> >> [ 214.701143] task:xdg-document-po state:D stack:0 pid:2651 tgid:2651 ppid:1939 task_flags:0x400000 flags:0x00080002
> >> [ 214.701151] Call Trace:
> >> [ 214.701154] <TASK>
> >> [ 214.701167] __schedule+0x2b8/0x5e0
> >> [ 214.701181] schedule+0x27/0x80
> >> [ 214.701188] request_wait_answer+0xce/0x260 [fuse]
> >> [ 214.701202] ? __pfx_autoremove_wake_function+0x10/0x10
> >> [ 214.701212] __fuse_simple_request+0x120/0x340 [fuse]
> >> [ 214.701219] fuse_lookup_name+0xc3/0x210 [fuse]
> >> [ 214.701235] fuse_lookup+0x99/0x1c0 [fuse]
> >> [ 214.701242] ? srso_alias_return_thunk+0x5/0xfbef5
> >> [ 214.701247] ? fuse_dentry_init+0x23/0x50 [fuse]
> >> [ 214.701257] lookup_one_qstr_excl+0xa8/0xf0
>
> Introduced by c9ba789dad15 ("VFS: introduce start_creating_noperm() and
> start_removing_noperm()")?
>
> Why is the new code doing a lookup on an entry that is about to be
> invalidated?
>
>
> In order to handle this at least one fuse server process needs to be
> available, but for this specific case the lookup still doesn't make sense.
>
> We could do something like this
>
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 4b6b3d2758ff..7edbace7eddc 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -1599,6 +1599,15 @@ int fuse_reverse_inval_entry(struct fuse_conn *fc, u64 parent_nodeid,
> if (!dir)
> goto put_parent;
>
> + /* Check dcache first - if not cached, nothing to invalidate */
> + name->hash = full_name_hash(dir, name->name, name->len);
> + entry = d_lookup(dir, name);
> + if (!entry) {
> + err = 0;
> + dput(dir);
> + goto put_parent;
> + }
> +
> entry = start_removing_noperm(dir, name);
> dput(dir);
> if (IS_ERR(entry))
>
>
> But let's assume the dentry exists - start_removing_noperm() will now
> trigger a revalidate and get the same issue. From my point of view the
> above commit should be reverted for fuse.
>
>
> >> [ 214.701264] start_removing_noperm+0x59/0x80
> >> [ 214.701268] ? d_find_alias+0x82/0xd0
> >> [ 214.701273] fuse_reverse_inval_entry+0x7d/0x1f0 [fuse]
> >> [ 214.701280] ? fuse_copy_do+0x5f/0xa0 [fuse]
> >> [ 214.701287] fuse_notify+0x4a1/0x750 [fuse]
> >> [ 214.701295] ? iov_iter_get_pages2+0x1d/0x40
> >> [ 214.701301] ? srso_alias_return_thunk+0x5/0xfbef5
> >> [ 214.701305] fuse_dev_do_write+0x2e4/0x440 [fuse]
> >> [ 214.701313] fuse_dev_write+0x6b/0xa0 [fuse]
> >> [ 214.701320] do_iter_readv_writev+0x161/0x260
> >> [ 214.701327] vfs_writev+0x168/0x3c0
> >> [ 214.701334] ? ksys_write+0xcd/0xf0
> >> [ 214.701338] ? do_writev+0x7f/0x110
> >> [ 214.701341] do_writev+0x7f/0x110
> >> [ 214.701344] do_syscall_64+0x7e/0x6b0
> >> [ 214.701350] ? srso_alias_return_thunk+0x5/0xfbef5
> >> [ 214.701352] ? __handle_mm_fault+0x445/0x690
> >> [ 214.701359] ? srso_alias_return_thunk+0x5/0xfbef5
> >> [ 214.701363] ? srso_alias_return_thunk+0x5/0xfbef5
> >> [ 214.701365] ? count_memcg_events+0xd6/0x210
> >> [ 214.701371] ? srso_alias_return_thunk+0x5/0xfbef5
> >> [ 214.701373] ? handle_mm_fault+0x212/0x340
> >> [ 214.701377] ? srso_alias_return_thunk+0x5/0xfbef5
> >> [ 214.701379] ? do_user_addr_fault+0x2b4/0x7b0
> >> [ 214.701387] ? srso_alias_return_thunk+0x5/0xfbef5
> >> [ 214.701389] ? irqentry_exit+0x6d/0x540
> >> [ 214.701393] ? srso_alias_return_thunk+0x5/0xfbef5
> >> [ 214.701395] ? exc_page_fault+0x7e/0x1a0
> >> [ 214.701398] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> >> [ 214.701402] RIP: 0033:0x7f3c144f9982
> >> [ 214.701467] RSP: 002b:00007fff80e2f388 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
> >> [ 214.701470] RAX: ffffffffffffffda RBX: 00007f3bec000cf0 RCX: 00007f3c144f9982
> >> [ 214.701472] RDX: 0000000000000003 RSI: 00007fff80e2f460 RDI: 0000000000000007
> >> [ 214.701474] RBP: 00007fff80e2f3b0 R08: 0000000000000000 R09: 0000000000000000
> >> [ 214.701475] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> >> [ 214.701477] R13: 00007f3bec000cf0 R14: 00007f3c14bb8280 R15: 00007f3be8001200
> >> [ 214.701481] </TASK>
> > """
> >
> > Killing the mentioned process using "kill -9" doesn't help. I can
> > reliably trigger this in -mainline and -next using the Signal flatpak on
> > Fedora 43 by trying to send a picture (which gets xdg-document-portal
> > involved). It works the first time, but trying again won't and will
> > cause Signal to get stuck for a few seconds. Works fine in 6.18.4.
> >
> > Is this maybe known already or does anybody have an idea what's wrong?
> > If not I guess I'll have to bisect this.
> >
> > Ciao, Thorsten
> >
> > #regzbot introduced: v6.18..
> > #regzbot title: fuse: xdg-document-portal gets stuck and causes suspend
> > to fail
> >
> >
>
> Thanks,
> Bernd
>
I posted a fix
https://lore.kernel.org/all/176454037897.634289.3566631742434963788@noble.neil.brown.name/
a while ago. There was some talk in that thread of reverting the
breaking change instead. It seems nothing happened.
Christian: should I resend my patch?
Thanks,
NeilBrown