Message-ID: <9bc83901-3819-4cf1-a1ba-cc2f52f53504@redhat.com>
Date: Fri, 6 Feb 2026 14:16:13 -0500
From: Waiman Long <llong@...hat.com>
To: Al Viro <viro@...iv.linux.org.uk>, Waiman Long <llong@...hat.com>
Cc: Paul Moore <paul@...l-moore.com>, Eric Paris <eparis@...hat.com>,
Christian Brauner <brauner@...nel.org>, linux-kernel@...r.kernel.org,
audit@...r.kernel.org, Richard Guy Briggs <rgb@...hat.com>,
Ricardo Robaina <rrobaina@...hat.com>
Subject: Re: [PATCH v2] audit: Avoid excessive dput/dget in audit_context
setup and reset paths
On 2/6/26 12:22 AM, Al Viro wrote:
> On Thu, Feb 05, 2026 at 11:11:51PM -0500, Waiman Long wrote:
>
>> __latent_entropy
>> struct mnt_namespace *copy_mnt_ns(u64 flags, struct mnt_namespace *ns,
>>                 struct user_namespace *user_ns, struct fs_struct *new_fs)
>> {
>>         :
>>         if (new_fs) {
>>                 if (&p->mnt == new_fs->root.mnt) {
>>                         new_fs->root.mnt = mntget(&q->mnt);
>>                         rootmnt = &p->mnt;
>>                 }
>>                 if (&p->mnt == new_fs->pwd.mnt) {
>>                         new_fs->pwd.mnt = mntget(&q->mnt);
>>                         pwdmnt = &p->mnt;
>>                 }
>>         }
>>
>> It is replacing the fs->pwd.mnt with a new one while pwd_refs is 1. I can
>> make this work with the new fs_struct field. I do have one question though.
>> Do we need to acquire write_seqlock(&new_fs->seq) if we are changing root or
>> pwd here or if the new_fs are in such a state that it will never change when
>> this copying operation is in progress?
> In all cases when we get to that point, new_fs is always a freshly
> created private copy of current->fs, not reachable from anywhere
> other than stack frames of the callers, but the proof is not pretty.
> copy_mnt_ns() is called only by create_new_namespaces() and it gets to
> copying anything if and only if CLONE_NEWNS is in the flags. So far,
> so good. The call in create_new_namespaces() is
> new_nsp->mnt_ns = copy_mnt_ns(flags, tsk->nsproxy->mnt_ns, user_ns, new_fs);
Thanks for the detailed explanation. After further investigation as to
why pwd_refs is set, I found that the code path leading to this
situation is the unshare syscall.
__x64_sys_unshare()
  => ksys_unshare()
     => unshare_fs(unshare_flags, &new_fs)
     => unshare_nsproxy_namespaces(unshare_flags, &new_nsproxy,
                                   new_cred, new_fs);
        => create_new_namespaces(unshare_flags, current, user_ns,
                                 new_fs ? new_fs : current->fs);
Here, CLONE_FS isn't set in unshare_flags. So new_fs is NULL and
current->fs is passed down to create_new_namespaces(). That is why
pwd_refs can be set in this case. So it looks like the comment in
copy_mnt_ns() saying that the fs_struct is private is no longer true,
at least in this case. So changing fs_struct without taking the lock
can lead to unexpected results.
Should we add locking to make it safe?
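
As an illustration of the aliasing described above, here is a minimal
userspace sketch. It is a toy model, not kernel code: struct model_fs,
model_pick_fs() and model_replace_pwd() are hypothetical names made up
for this example, and an int stands in for the vfsmount pointer. It only
shows that when new_fs is NULL, the `new_fs ? new_fs : current->fs`
selection hands the callee the live structure, so an unlocked store
there lands in the structure the task is actually using.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct fs_struct; only the field the model needs. */
struct model_fs {
	int pwd_mnt;	/* stand-in for pwd.mnt */
};

/*
 * Models the argument selection from the call chain:
 *	new_fs ? new_fs : current->fs
 * Returns the private copy if one exists, else the live structure.
 */
static struct model_fs *model_pick_fs(struct model_fs *new_fs,
				      struct model_fs *current_fs)
{
	return new_fs ? new_fs : current_fs;
}

/*
 * Models the pwd.mnt replacement in copy_mnt_ns(). No locking is done
 * here, mirroring the assumption that the caller passed a private copy.
 */
static void model_replace_pwd(struct model_fs *fs, int new_mnt)
{
	assert(fs != NULL);
	fs->pwd_mnt = new_mnt;
}
```

Calling model_pick_fs(NULL, &live) returns &live, and a following
model_replace_pwd() modifies live directly. With a non-NULL private copy,
the same store would leave live untouched.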
Cheers,
Longman