[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <7e4d9eb5-6dde-4c59-8ee3-358233f082d0@virtuozzo.com>
Date: Fri, 3 Oct 2025 13:03:46 +0800
From: Pavel Tikhomirov <ptikhomirov@...tuozzo.com>
To: Al Viro <viro@...iv.linux.org.uk>,
Bhavik Sachdev <b.sachdev1904@...il.com>
Cc: Christian Brauner <brauner@...nel.org>, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, Aleksa Sarai <cyphar@...har.com>,
Jan Kara <jack@...e.cz>, John Garry <john.g.garry@...cle.com>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
"Darrick J . Wong" <djwong@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
Ingo Molnar <mingo@...nel.org>, Andrei Vagin <avagin@...il.com>,
Alexander Mikhalitsyn <alexander@...alicyn.com>
Subject: Re: [PATCH 2/4] fs/namespace: add umounted mounts to umount_mnt_ns
On 10/3/25 00:34, Al Viro wrote:
> On Thu, Oct 02, 2025 at 06:18:38PM +0530, Bhavik Sachdev wrote:
>
>> @@ -1438,6 +1440,18 @@ static void mntput_no_expire(struct mount *mnt)
>> mnt->mnt.mnt_flags |= MNT_DOOMED;
>> rcu_read_unlock();
>>
>> + if (mnt_ns_attached(mnt)) {
>> + struct mnt_namespace *ns;
>> +
>> + move_from_ns(mnt);
>> + ns = mnt->mnt_ns;
>> + if (ns) {
>> + ns->nr_mounts--;
>> + __touch_mnt_namespace(ns);
>> + }
>> + mnt->mnt_ns = NULL;
>> + }
>
> Sorry, no. You are introducing very special locking for one namespace's rbtree.
> Not gonna fly.
>
> NAKed-by: Al Viro <viro@...iv.linux.org.uk>
Thank you for looking into it.
Sorry, we didn't have any intent to break locking, we would like to
improve/rework the patch if we can.
1) I see that I missed that __touch_mnt_namespace requires vfsmount lock
(according to it's comment) and we don't have it in this code path, so
that is one problem.
I also see that it sends wake up for mounts_poll() waiters, but since
no-one can join umount_mnt_ns, there is no point in sending wakeup as
no-one can poll on /proc/mounts for this namespace. So we can remove the
use of __touch_mnt_namespace seemingly safely and remove this incorrect
locking case.
2) Another thing is, previously when we were at this point in code we
were already out of namespace rbtree strictly before the reference from
namespace was put (namespace_unlock()->mntput()). So no-one could've
lookup-ed us, but after this change one can lookup us from umount_mnt_ns
rbtree while we are in mntput_no_expire().
This one is a hard one, in this implementation at the minimum we can end
up using the mount after it was freed due to this.
Previously mount lookup was protected by namespace_sem, and now when I
use move_from_ns out of namespace_sem this protection is broken.
One stupid idea to fix it is to leave one reference to mount from
detatched mntns, and have an asynchronous mechanism which detects last
reference (in mntput_no_expire) and (under namespace_sem) first
disconnects mount from umount_mnt_ns and only then calls final mntput.
We will think more on this one, maybe we will come up with something
smarter.
3) We had an alternative approach to make unmounted mounts mountinfo
visible for the user, by allowing fd-based statmount()
https://github.com/bsach64/linux/commit/ac0c03d44fb1e6f0745aec81079fca075e75b354
But we also recognize a problem with it that it would require getting
mountinfo from fd which is not root dentry of the mount, but any dentry
(as we (CRIU) don't really have an option to choose which fd will be
given to us).
I share this in case, maybe, you like it more than adding umount_mnt_ns.
--
Best regards, Pavel Tikhomirov
Senior Software Developer, Virtuozzo.
Powered by blists - more mailing lists