Message-ID: <20260112-textil-bepflanzen-c6225a477747@brauner>
Date: Mon, 12 Jan 2026 10:45:49 +0100
From: Christian Brauner <brauner@...nel.org>
To: NeilBrown <neil@...wn.name>
Cc: Bernd Schubert <bernd@...ernd.com>, 
	Thorsten Leemhuis <linux@...mhuis.info>, Miklos Szeredi <miklos@...redi.hu>, 
	Linux kernel regressions list <regressions@...ts.linux.dev>, LKML <linux-kernel@...r.kernel.org>, 
	Linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: [REGRESSION] fuse: xdg-document-portal gets stuck and causes
 suspend to fail in mainline

On Mon, Jan 12, 2026 at 02:58:20PM +1100, NeilBrown wrote:
> On Mon, 12 Jan 2026, Bernd Schubert wrote:
> > 
> > On 1/11/26 12:37, Thorsten Leemhuis wrote:
> > > Lo! I can reliably get xdg-document-portal stuck on latest -mainline
> > > > (and -next, too; 6.18.4 works fine) through the Signal flatpak, which
> > > then causes suspend to fail:
> > > 
> > > """
> > >> [  194.439381] PM: suspend entry (s2idle)
> > >> [  194.454708] Filesystems sync: 0.015 seconds
> > >> [  194.696767] Freezing user space processes
> > >> [  214.700978] Freezing user space processes failed after 20.004 seconds (1 tasks refusing to freeze, wq_busy=0):
> > >> [  214.701143] task:xdg-document-po state:D stack:0     pid:2651  tgid:2651  ppid:1939   task_flags:0x400000 flags:0x00080002
> > >> [  214.701151] Call Trace:
> > >> [  214.701154]  <TASK>
> > >> [  214.701167]  __schedule+0x2b8/0x5e0
> > >> [  214.701181]  schedule+0x27/0x80
> > >> [  214.701188]  request_wait_answer+0xce/0x260 [fuse]
> > >> [  214.701202]  ? __pfx_autoremove_wake_function+0x10/0x10
> > >> [  214.701212]  __fuse_simple_request+0x120/0x340 [fuse]
> > >> [  214.701219]  fuse_lookup_name+0xc3/0x210 [fuse]
> > >> [  214.701235]  fuse_lookup+0x99/0x1c0 [fuse]
> > >> [  214.701242]  ? srso_alias_return_thunk+0x5/0xfbef5
> > >> [  214.701247]  ? fuse_dentry_init+0x23/0x50 [fuse]
> > >> [  214.701257]  lookup_one_qstr_excl+0xa8/0xf0
> > 
> > Introduced by c9ba789dad15 ("VFS: introduce start_creating_noperm() and
> > start_removing_noperm()")?
> > 
> > Why is the new code doing a lookup on an entry that is about to be
> > invalidated?
> > 
> > 
> > In order to handle this at least one fuse server process needs to be
> > available, but for this specific case the lookup still doesn't make sense.
> > 
> > We could do something like this
> > 
> > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > index 4b6b3d2758ff..7edbace7eddc 100644
> > --- a/fs/fuse/dir.c
> > +++ b/fs/fuse/dir.c
> > @@ -1599,6 +1599,15 @@ int fuse_reverse_inval_entry(struct fuse_conn
> > *fc, u64 parent_nodeid,
> >         if (!dir)
> >                 goto put_parent;
> > 
> > +       /* Check dcache first - if not cached, nothing to invalidate */
> > +       name->hash = full_name_hash(dir, name->name, name->len);
> > +       entry = d_lookup(dir, name);
> > +       if (!entry) {
> > +               err = 0;
> > +               dput(dir);
> > +               goto put_parent;
> > +       }
> > +
> >         entry = start_removing_noperm(dir, name);
> >         dput(dir);
> >         if (IS_ERR(entry))
> > 
> > 
> > But let's assume the dentry exists - start_removing_noperm() will now
> > trigger a revalidate and get the same issue. From my point of view the
> > above commit should be reverted for fuse.
> > 
> > 
> > >> [  214.701264]  start_removing_noperm+0x59/0x80
> > >> [  214.701268]  ? d_find_alias+0x82/0xd0
> > >> [  214.701273]  fuse_reverse_inval_entry+0x7d/0x1f0 [fuse]
> > >> [  214.701280]  ? fuse_copy_do+0x5f/0xa0 [fuse]
> > >> [  214.701287]  fuse_notify+0x4a1/0x750 [fuse]
> > >> [  214.701295]  ? iov_iter_get_pages2+0x1d/0x40
> > >> [  214.701301]  ? srso_alias_return_thunk+0x5/0xfbef5
> > >> [  214.701305]  fuse_dev_do_write+0x2e4/0x440 [fuse]
> > >> [  214.701313]  fuse_dev_write+0x6b/0xa0 [fuse]
> > >> [  214.701320]  do_iter_readv_writev+0x161/0x260
> > >> [  214.701327]  vfs_writev+0x168/0x3c0
> > >> [  214.701334]  ? ksys_write+0xcd/0xf0
> > >> [  214.701338]  ? do_writev+0x7f/0x110
> > >> [  214.701341]  do_writev+0x7f/0x110
> > >> [  214.701344]  do_syscall_64+0x7e/0x6b0
> > >> [  214.701350]  ? srso_alias_return_thunk+0x5/0xfbef5
> > >> [  214.701352]  ? __handle_mm_fault+0x445/0x690
> > >> [  214.701359]  ? srso_alias_return_thunk+0x5/0xfbef5
> > >> [  214.701363]  ? srso_alias_return_thunk+0x5/0xfbef5
> > >> [  214.701365]  ? count_memcg_events+0xd6/0x210
> > >> [  214.701371]  ? srso_alias_return_thunk+0x5/0xfbef5
> > >> [  214.701373]  ? handle_mm_fault+0x212/0x340
> > >> [  214.701377]  ? srso_alias_return_thunk+0x5/0xfbef5
> > >> [  214.701379]  ? do_user_addr_fault+0x2b4/0x7b0
> > >> [  214.701387]  ? srso_alias_return_thunk+0x5/0xfbef5
> > >> [  214.701389]  ? irqentry_exit+0x6d/0x540
> > >> [  214.701393]  ? srso_alias_return_thunk+0x5/0xfbef5
> > >> [  214.701395]  ? exc_page_fault+0x7e/0x1a0
> > >> [  214.701398]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > >> [  214.701402] RIP: 0033:0x7f3c144f9982
> > >> [  214.701467] RSP: 002b:00007fff80e2f388 EFLAGS: 00000246 ORIG_RAX: 0000000000000014
> > >> [  214.701470] RAX: ffffffffffffffda RBX: 00007f3bec000cf0 RCX: 00007f3c144f9982
> > >> [  214.701472] RDX: 0000000000000003 RSI: 00007fff80e2f460 RDI: 0000000000000007
> > >> [  214.701474] RBP: 00007fff80e2f3b0 R08: 0000000000000000 R09: 0000000000000000
> > >> [  214.701475] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> > >> [  214.701477] R13: 00007f3bec000cf0 R14: 00007f3c14bb8280 R15: 00007f3be8001200
> > >> [  214.701481]  </TASK>
> > > """
> > > 
> > > Killing the mentioned process using "kill -9" doesn't help. I can
> > > reliably trigger this in -mainline and -next using the Signal flatpak on
> > > Fedora 43 by trying to send a picture (which gets xdg-document-portal
> > > involved). It works the first time, but trying again won't and will
> > > cause Signal to get stuck for a few seconds. Works fine in 6.18.4.
> > > 
> > > Is this maybe known already or does anybody have an idea what's wrong?
> > > If not I guess I'll have to bisect this.
> > > 
> > > Ciao, Thorsten
> > > 
> > > #regzbot introduced: v6.18..
> > > #regzbot title: fuse: xdg-document-portal gets stuck and causes suspend
> > > to fail
> > > 
> > > 
> > 
> > Thanks,
> > Bernd
> > 
> 
> I posted a fix
> 
>   https://lore.kernel.org/all/176454037897.634289.3566631742434963788@noble.neil.brown.name/
> 
> a while ago.  There was some talk in that thread of reverting the
> breaking change instead.  It seems nothing happened.

I pinged a bunch of times but nobody ever responded.
So then let's just apply your patch. I picked it up.
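
For anyone skimming the trace: the task that wrote the notification
(fuse_dev_write -> fuse_notify -> fuse_reverse_inval_entry) is the same
task that ends up sleeping in request_wait_answer(), waiting for a
lookup reply from the fuse server. Below is a toy userspace model of
that situation, only a sketch and not kernel code. The function name
and both parameters are made up for illustration, and it assumes that
a server thread cannot answer requests while it is blocked in its own
write().

```c
#include <stdbool.h>

/*
 * Toy userspace model, NOT kernel code. All names are hypothetical.
 *
 * server_threads:   number of threads the fuse server has for answering
 *                   requests (assumed >= 1, one of them sent the notify).
 * needs_round_trip: whether handling the notification makes the kernel
 *                   send a request (e.g. a lookup) back to that server.
 */
static bool toy_notify_would_hang(int server_threads, bool needs_round_trip)
{
	/*
	 * Assumption: the thread that issued the notification stays
	 * blocked in write() until the kernel is done processing it.
	 */
	int free_threads = server_threads - 1;

	/* Invalidation from the dcache alone: nothing to wait for. */
	if (!needs_round_trip)
		return false;

	/* A round trip only completes if another thread can answer. */
	return free_threads < 1;
}
```

In this model a server with one thread hangs as soon as the notify
path needs a round trip, and an invalidation that stays within the
dcache never hangs, whatever the number of threads.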
