linux-kernel - Re: [PATCH v2] audit: Avoid excessive dput/dget in audit

Message-ID: <5cb07c57-9dca-4086-af88-f866f765c7fb@redhat.com>
Date: Fri, 6 Feb 2026 15:04:53 -0500
From: Waiman Long <llong@...hat.com>
To: Waiman Long <llong@...hat.com>, Al Viro <viro@...iv.linux.org.uk>
Cc: Paul Moore <paul@...l-moore.com>, Eric Paris <eparis@...hat.com>,
 Christian Brauner <brauner@...nel.org>, linux-kernel@...r.kernel.org,
 audit@...r.kernel.org, Richard Guy Briggs <rgb@...hat.com>,
 Ricardo Robaina <rrobaina@...hat.com>
Subject: Re: [PATCH v2] audit: Avoid excessive dput/dget in audit_context
 setup and reset paths

On 2/6/26 2:16 PM, Waiman Long wrote:
> On 2/6/26 12:22 AM, Al Viro wrote:
>> On Thu, Feb 05, 2026 at 11:11:51PM -0500, Waiman Long wrote:
>>
>>> __latent_entropy
>>> struct mnt_namespace *copy_mnt_ns(u64 flags, struct mnt_namespace *ns,
>>>                  struct user_namespace *user_ns, struct fs_struct *new_fs)
>>> {
>>>    :
>>>                  if (new_fs) {
>>>                          if (&p->mnt == new_fs->root.mnt) {
>>>                                  new_fs->root.mnt = mntget(&q->mnt);
>>>                                  rootmnt = &p->mnt;
>>>                          }
>>>                          if (&p->mnt == new_fs->pwd.mnt) {
>>>                                  new_fs->pwd.mnt = mntget(&q->mnt);
>>>                                  pwdmnt = &p->mnt;
>>>                          }
>>>                  }
>>>
>>> It is replacing the fs->pwd.mnt with a new one while pwd_refs is 1. I can
>>> make this work with the new fs_struct field. I do have one question though.
>>> Do we need to acquire write_seqlock(&new_fs->seq) if we are changing root or
>>> pwd here, or is new_fs in such a state that it will never change while
>>> this copying operation is in progress?
>> In all cases when we get to that point, new_fs is always a freshly
>> created private copy of current->fs, not reachable from anywhere
>> other than stack frames of the callers, but the proof is not pretty.
>> copy_mnt_ns() is called only by create_new_namespaces() and it gets to
>> copying anything if and only if CLONE_NEWNS is in the flags.  So far,
>> so good.  The call in create_new_namespaces() is
>>     new_nsp->mnt_ns = copy_mnt_ns(flags, tsk->nsproxy->mnt_ns,
>>                                   user_ns, new_fs);
>
> Thanks for the detailed explanation. After further investigation as to
> why pwd_refs is set, I found out that the code path leading to this
> situation is the unshare syscall.
>
> __x64_sys_unshare()
>  => ksys_unshare()
>   => unshare_fs(unshare_flags, &new_fs)
>   => unshare_nsproxy_namespaces(unshare_flags, &new_nsproxy,
>                                          new_cred, new_fs);
>    => create_new_namespaces(unshare_flags, current, user_ns,
>                                          new_fs ? new_fs : current->fs);
>
> Here, CLONE_FS isn't set in unshare_flags. So new_fs is NULL and
> current->fs is passed down to create_new_namespaces(). That is why
> pwd_refs can be set in this case. So it looks like the comment in
> copy_mnt_ns() saying that the fs_struct is private is no longer true,
> at least in this case. So changing fs_struct without taking the lock
> can lead to unexpected results.
>
> Should we add locking to make it safe? 

I guess if private means fs->users == 1, the condition could still be true.
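
To illustrate that reading, here is a minimal userspace sketch, not kernel
code. Everything with a model_/MODEL_ prefix is a hypothetical stand-in, and
the "skip the copy when users == 1" rule is an assumption made for the
sketch. It only shows that when no new fs_struct is handed back, the
`new_fs ? new_fs : current->fs` expression from the call chain above passes
the task's own fs_struct down, and that such an fs_struct can still be
"private" in the users == 1 sense.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for fs_struct; only the sharer count is modeled. */
struct model_fs {
	int users;
};

#define MODEL_CLONE_FS 0x1u	/* hypothetical flag value, not the kernel's */

/*
 * Model of the "do we need a private copy?" decision (an assumption of
 * this sketch). Returns a freshly allocated copy only when the flag is set
 * and the fs is shared; otherwise returns NULL. The caller frees the copy.
 */
static struct model_fs *model_unshare_fs(unsigned int flags,
					 struct model_fs *cur)
{
	struct model_fs *copy;

	assert(cur != NULL);
	if (!(flags & MODEL_CLONE_FS) || cur->users == 1)
		return NULL;

	copy = malloc(sizeof(*copy));
	if (copy)
		copy->users = 1;	/* a fresh copy has a single user */
	return copy;
}

/* Mirrors the `new_fs ? new_fs : current->fs` expression in the call chain. */
static struct model_fs *model_fs_to_pass(struct model_fs *new_fs,
					 struct model_fs *cur)
{
	return new_fs ? new_fs : cur;
}

/* "Private" in the sense of fs->users == 1. */
static int model_fs_is_private(const struct model_fs *fs)
{
	return fs->users == 1;
}
```

In this model, a task whose fs_struct has users == 1 gets no copy, so its
own fs_struct is what gets passed on, yet model_fs_is_private() is still
true for it. Whether the kernel can reach copy_mnt_ns() with a shared,
uncopied fs_struct is the open question here; the sketch does not settle it.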

Cheers,
Longman