linux-kernel - Re: Why do very few filesystems have umount helpers

Message-ID: <ZqhV+PvVKESa3UVw@dread.disaster.area>
Date: Tue, 30 Jul 2024 12:54:48 +1000
From: Dave Chinner <david@...morbit.com>
To: Steve French <smfrench@...il.com>
Cc: Christian Brauner <brauner@...nel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	CIFS <linux-cifs@...r.kernel.org>
Subject: Re: Why do very few filesystems have umount helpers

On Mon, Jul 29, 2024 at 04:50:27PM -0500, Steve French wrote:
> On Mon, Jul 29, 2024 at 4:50 AM Christian Brauner <brauner@...nel.org> wrote:
> On Mon, Jul 29, 2024 at 12:33 PM Steve French <smfrench@...il.com> wrote:
> > > The first step should be to identify what exactly keeps your mount busy
> > > in generic/044 and generic/043.
> >
> > That is a little tricky to debug (AFAIK no easy way to tell exactly which
> > reference is preventing the VFS from proceeding with the umount and
> > calling kill_sb).  My best guess is something related to deferred close
> > (cached network file handles) that had a brief refcount on
> > something being checked by umount, but when I experimented with
> > deferred close settings that did not seem to affect the problem so
> > looking for other possible causes.
> >
> > I just did a quick experiment by adding a 1 second wait inside umount
> > and confirmed that that does fix it for those two tests when mounted to Samba,
> > but not clear why the slight delay in umount helps as there is no pending
> > network traffic at that point.
> 
> I did some more experimentation and it looks like the umount problem
> with those two xfstests to Samba is related to IOC_SHUTDOWN.
> If I return EOPNOTSUPP on IOC_SHUTDOWN
> then the 1 second delay in umount is not necessary - so something that
> happens after IOC_SHUTDOWN races with umount (thus the 1 second delay
> that I tried as a quick experiment fixes it indirectly) in this
> testcase (although apparently this race between IOC_SHUTDOWN and
> umount is not an issue to some other servers but is reproducible to
> Samba and ksmbd, at least in some easy to setup configurations).

So you've likely got a race condition where something takes longer
when the shutdown flag is set than when the filesystem is operating
normally. There's not a lot in the CIFS code that pays attention to
the shutdown flag - almost all of the checks abort front end
(syscall) operations before they are started.

The only back end check appears to be in cifs_issue_write(). Perhaps
that is failing to wake the request queue when it is being failed
with -EIO on a shutdown, and so it takes some time for something
else to wake it up and empty it and complete the pending writes
before the fs can be unmounted...
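
To make that hypothesis concrete, here is a toy user-space model of a
missed wakeup on an error path. It is only a sketch and not the CIFS
code: struct req_queue, complete_request(), unrelated_wakeup() and
simulate_umount() are all made-up names, and the "tick" stands in for
whatever unrelated event eventually wakes the queue.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a write request queue; not a kernel structure. */
struct req_queue {
	int pending;        /* writes still outstanding */
	bool waiter_asleep; /* umount is sleeping until pending == 0 */
};

/* Wake the sleeping umount, but only once the queue has drained. */
static void wake_queue(struct req_queue *q)
{
	if (q->pending == 0)
		q->waiter_asleep = false;
}

/*
 * Complete one request. With wake_on_error == false, the failure path
 * drops the request but forgets to wake the queue: the suspected bug.
 */
static void complete_request(struct req_queue *q, int err, bool wake_on_error)
{
	assert(q->pending > 0);
	q->pending--;
	if (err == 0 || wake_on_error)
		wake_queue(q);
}

/* Models "something else" waking the queue some time later. */
static void unrelated_wakeup(struct req_queue *q)
{
	wake_queue(q);
}

/*
 * Fail two pending writes with -EIO, as a shutdown would, and return
 * how many unrelated wakeups umount had to wait for before proceeding.
 */
static int simulate_umount(bool wake_on_error)
{
	struct req_queue q = { .pending = 2, .waiter_asleep = true };
	int ticks = 0;

	complete_request(&q, -EIO, wake_on_error);
	complete_request(&q, -EIO, wake_on_error);

	while (q.waiter_asleep) {
		unrelated_wakeup(&q);
		ticks++;
	}
	return ticks;
}
```

In this model simulate_umount(true) proceeds immediately, while
simulate_umount(false) has to wait for one unrelated wakeup even though
no work is pending. That would be consistent with a short delay in
umount papering over the problem, but whether the real code behaves
this way still needs to be confirmed.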

-Dave.
-- 
Dave Chinner
david@...morbit.com